RegreSQL: Regression Testing for PostgreSQL Queries

32 minutes ago by jitl

I don’t understand how you can test Postgres performance in CI or on a developer laptop. Until your tables are large and varied enough, Postgres can ignore indexes and prefer full table scans because it’s faster. Plans depend on statistics so your test data generator better output rows with the same distributions as you get in prod. Unless you have a large and representative load of concurrent queries, Postgres and filesystem caching can optimize around a single query shape, masking issues that will shit the bed in prod.
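One way to act on the point about distributions is to sample test rows from measured production frequencies instead of generating them uniformly. A minimal sketch, assuming a hypothetical per-tenant row share (the tenant names and weights below are made up for illustration, not taken from any real system):

```python
import random
from collections import Counter

# Hypothetical share of rows per tenant, as if measured in production.
PROD_TENANT_SHARE = {"tenant_big": 0.90, "tenant_mid": 0.09, "tenant_small": 0.01}


def generate_rows(count, seed=42):
    """Return (id, tenant) rows whose tenant skew follows PROD_TENANT_SHARE."""
    rng = random.Random(seed)  # seeded so the fixture is reproducible
    tenants = list(PROD_TENANT_SHARE)
    weights = list(PROD_TENANT_SHARE.values())
    picks = rng.choices(tenants, weights=weights, k=count)
    return list(enumerate(picks))


rows = generate_rows(10_000)
counts = Counter(tenant for _, tenant in rows)
```

This only reproduces the skew of a single column. Correlations between columns, table sizes, and concurrent load would still differ from prod.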

Example: I wrote a recursive query that provided a huge reduction in queries issued from our app for a traversal. Worked beautifully in local, and halved p95 in our dogfood environment. Yippee! In prod? The query always timed out after 60 seconds, even though it had the same query plan as dogfood env and staging. Sad trombone noises.

For query semantics regression testing, we just write tests in our normal test framework and run them in CI like any other test. Test data setup works like any other test. To prevent cross-test leakage we wrap each test in a BEGIN..ROLLBACK, and transform inner use of transactions into savepoints in our db client layer. I’d like to add libeatmydata to speed things up further but haven’t done so yet.
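The BEGIN..ROLLBACK wrapping with inner transactions turned into savepoints could look roughly like this. It is a sketch, not the commenter's actual client layer: `TestDb` and `isolated_test` are hypothetical names, and Python's stdlib `sqlite3` stands in for a Postgres client only so the example runs without a server (SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT are also valid in PostgreSQL).

```python
import sqlite3
from contextlib import contextmanager


class TestDb:
    """Hypothetical client wrapper: the test owns the outer transaction,
    so any transaction opened by code under test becomes a savepoint."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0

    @contextmanager
    def transaction(self):
        name = f"sp_{self.depth}"
        self.conn.execute(f"SAVEPOINT {name}")
        self.depth += 1
        try:
            yield
        except Exception:
            # Undo only this "inner transaction", then drop the savepoint.
            self.conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
            self.conn.execute(f"RELEASE SAVEPOINT {name}")
            raise
        else:
            self.conn.execute(f"RELEASE SAVEPOINT {name}")
        finally:
            self.depth -= 1


@contextmanager
def isolated_test(conn):
    """Wrap one test in BEGIN..ROLLBACK so nothing leaks into the next test."""
    conn.execute("BEGIN")
    try:
        yield TestDb(conn)
    finally:
        conn.execute("ROLLBACK")


# isolation_level=None leaves transaction control to the explicit statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT)")

with isolated_test(conn) as db:
    with db.transaction():
        conn.execute("INSERT INTO users VALUES ('alice')")
    try:
        with db.transaction():
            conn.execute("INSERT INTO users VALUES ('bob')")
            raise RuntimeError("application code fails")
    except RuntimeError:
        pass
    rows_inside = conn.execute("SELECT name FROM users").fetchall()

rows_after = conn.execute("SELECT name FROM users").fetchall()
```

Inside the test only the committed inner transaction is visible, and after the outer ROLLBACK the table is empty again.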

7 hours ago by jumski

Looks really well thought out and I will be testing it for sure!

I'm wondering how I would be able to regression-test functions in my project (pgflow [0]) - it tracks a graph of state machines modeled in a few tables. State is mutated only by calling a few exposed SQL functions (the task queue worker does it).

Given I can't enforce everything I need with check constraints and I try to avoid triggers if possible, I opted for using only the exposed SQL API [1] for setting up state in my pgTAP tests.

It is imperative and harder to maintain, like the scripts you described in the article, but really my only option, as I want maximum confidence.

Does RegreSQL support some kind of init scripts, or would I need to wire it up myself and just run RegreSQL after the proper state is set? I would lose the "run once and get report on everything" benefit then :-(

[0] https://pgflow.dev/ [1] https://github.com/pgflow-dev/pgflow/blob/main/pkgs/core/sup...

7 hours ago by radimm

At this point it supports initialization through the fixture system (inline SQL or SQL files). At the moment fixtures run in a fixed order, which might lead to some limitations, but I'm already thinking about pre/post test setup hooks and full schema handling as well (for full schema reloads).

Plus I have a whole set of other requirements where RegreSQL suddenly seems to be a good solution.

And without sounding cliche - Thank you for the comment! This is exactly why I forced myself to go public and get this level of feedback.

6 hours ago by jumski

No cliche at all - I'm in the same boat, showing my stuff online was way out of my comfort zone!

I have been postponing proper, dedicated performance testing for some time and would really love to up my game in that regard.

I'm very happy with the pgTAP approach of running stuff in a transaction and rolling it back after the test - how does this work in RegreSQL?

Would love to provide feedback and test the hooks once you start working on them. I'm mostly interested in performance testing, and my use case would be to run the tests on CI and compare the results to previous metrics stored somewhere, in order to fail CI when performance regressions are introduced.
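The CI gate described here (compare against stored metrics, fail on regression) might be sketched as follows. This is not how RegreSQL does it; `find_regressions`, `ci_exit_code`, the metric names and the 20% tolerance are all assumptions for illustration.

```python
import json


def find_regressions(baseline, current, tolerance=0.20):
    """Return {query: (old, new)} for metrics that grew by more than `tolerance`."""
    regressions = {}
    for name, old_value in baseline.items():
        new_value = current.get(name)
        if new_value is None:
            continue  # query removed or renamed; a real gate should report this too
        if new_value > old_value * (1 + tolerance):
            regressions[name] = (old_value, new_value)
    return regressions


def ci_exit_code(baseline_json, current_json, tolerance=0.20):
    """Print each regression and return the process exit code for the CI step."""
    regressions = find_regressions(
        json.loads(baseline_json), json.loads(current_json), tolerance
    )
    for name, (old, new) in sorted(regressions.items()):
        print(f"REGRESSION {name}: {old} -> {new}")
    return 1 if regressions else 0


# Hypothetical stored baseline and current run (one metric value per query).
baseline_json = '{"get_user": 10.0, "list_orders": 50.0}'
current_json = '{"get_user": 10.5, "list_orders": 75.0}'
exit_code = ci_exit_code(baseline_json, current_json)
```

Wall-clock timings on shared CI runners tend to be noisy, so a more deterministic metric (for example planner cost or buffer counts from EXPLAIN) may make a steadier baseline than milliseconds.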

Happy to connect, got contact info in my profile.

5 hours ago by radimm

For now only fixtures support transactions as a cleanup option, but that's a good point: the tested queries might also modify the data.

I will definitely reach out. Just give me a bit of time to mentally recover from the exposure; I also have some meetups where I promised to give presentations, and they will consume a lot of my spare time.

7 hours ago by mickeyp

IMO, you should not avoid triggers if they help enforce invariants in your database. Preventing invalid state is what they are especially good at.

You can instruct postgres to raise exceptions using the same error code that constraints use: that way your clients do not need to know the difference.
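In PostgreSQL this is done in PL/pgSQL with `RAISE EXCEPTION ... USING ERRCODE = 'check_violation'` (or another constraint error code). The runnable sketch below shows the same idea with Python's stdlib `sqlite3` as a stand-in, where a trigger's `RAISE(ABORT, ...)` surfaces to the client as the same exception class as a CHECK constraint failure. The `accounts` and `transfers` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE transfers (
    account_id INTEGER NOT NULL,
    amount INTEGER NOT NULL
);
-- Cross-table rule that a plain CHECK constraint cannot express.
CREATE TRIGGER transfers_within_balance
BEFORE INSERT ON transfers
BEGIN
    SELECT RAISE(ABORT, 'transfer exceeds balance')
    WHERE NEW.amount > (SELECT balance FROM accounts WHERE id = NEW.account_id);
END;
INSERT INTO accounts VALUES (1, 100);
""")


def error_kind(sql):
    """Run a statement and return the exception class name, or None on success."""
    try:
        conn.execute(sql)
    except sqlite3.Error as exc:
        return type(exc).__name__
    return None


constraint_error = error_kind("INSERT INTO accounts VALUES (2, -5)")   # CHECK fails
trigger_error = error_kind("INSERT INTO transfers VALUES (1, 500)")    # trigger fires
no_error = error_kind("INSERT INTO transfers VALUES (1, 50)")          # allowed
```

Client code that already handles constraint violations then covers the trigger-enforced rule without a separate error path.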

5 hours ago by jumski

Good point! For the SQL functions I mentioned, I'm comfortable without triggers - all mutations go through functions (no direct table access), and only start_flow is user-facing.

That said, there ARE other places that would benefit from triggers (aggregate counts). I've avoided them because they're hot paths and I was worried about perf impact - relying on pgTAP coverage instead.

Your defense-in-depth argument is solid though. I should revisit this and benchmark whether the safety is worth the perf cost. Something like RegreSQL would come in handy.

6 hours ago by tinodb

Nice! However I would actually advocate for fixtures in application code. I’ve seen too much drift otherwise. And creating “scale” is also easy, just add a for loop :). No programming in yaml needed. As an added benefit you can use the same fixtures for your end to end tests!

So it would be nice if RegreSQL would support fixture hooks for those who like this route.
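Fixtures in application code with "a for loop" for scale could look like the sketch below. The `users` schema and the helper names are hypothetical, and stdlib `sqlite3` stands in for the real database so the example is self-contained.

```python
import sqlite3


def make_users(count):
    """Yield deterministic user rows; the same generator can feed unit and end-to-end tests."""
    for i in range(count):
        role = "admin" if i % 100 == 0 else "member"
        yield (i, f"user{i}@example.com", role)


def load_fixtures(conn, count):
    """Create the (hypothetical) users table and bulk-insert `count` rows."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", make_users(count))
    conn.commit()


conn = sqlite3.connect(":memory:")
load_fixtures(conn, 10_000)
total = conn.execute("SELECT count(*) FROM users").fetchone()[0]
admins = conn.execute("SELECT count(*) FROM users WHERE role = 'admin'").fetchone()[0]
```

Row-by-row generation like this is fine for thousands of rows; reaching production-sized tables would likely need bulk loading or a snapshot instead.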

12 minutes ago by jitl

To get prod scale out of a for loop I’m gonna need a few hours of iterations :-(

6 hours ago by radimm

It's not an unreasonable view - noted, will add it to my list. Thank you!

3 hours ago by aranw

How well will this work with something like sqlc [0]? sqlc has some custom syntax around the sql files specific to the library

[0] https://sqlc.dev/

3 hours ago by radimm

For now the syntax is not fully compatible - but my goal is to add https://github.com/boringSQL/queries (the library behind the SQL file parsing) to better align with it. It's definitely on my radar.

6 hours ago by StarlaAtNight

Wonder if the YAML fixtures drew inspiration from dbt’s unit tests: https://docs.getdbt.com/docs/build/unit-tests#unit-testing-a...

8 hours ago by WilcoKruijer

It's pretty terrible how poorly developers test their database queries. This looks like a great step in the right direction. I think how the ORM story in RegreSQL develops is crucial. The SQLAlchemy integration looks interesting, but at the same time super specific. There are a million ways to generate SQL statements and ORMs are just one of them. A question that comes to mind is how will you handle interactive transactions? I'd say most complexity in queries comes from the back-and-forth between database and server. Is that out of scope?

Would also be fun if you could support PGLite [0], that's what I've been using to write "unit" tests connected to a "real" database.

[0] https://pglite.dev/

3 minutes ago by sgarland

> It's pretty terrible how poorly developers test their database queries.

Yes. This becomes especially obvious when you rewrite ORM garbage for something complicated, and are told that they can’t accept it, because they’re not sure how to test it.

8 hours ago by jci

My go-to for this lately has been ephemeralpg [0] and pgTAP [1]. It’s been pretty great.

[0] https://github.com/eradman/ephemeralpg [1] https://github.com/theory/pgtap

8 hours ago by radimm

OP here - I do agree some of the problems that come with SQL/ORM queries are pretty horrendous, and that's exactly where I would like RegreSQL to go. For now I can't promise a particular direction, but comments like this are the reason why I pushed myself to release it and not keep it as my own playground. Thank you!

7 hours ago by mrasong

Just found out about pglite, this library is insanely cool. You can even run Postgres right in the browser.

6 hours ago by andy_ppp

This looks great! Any plans for MySQL support (or a similar project)? The legacy system I'm working on could really do with this!

4 hours ago by radimm

I'm obviously biased. Adding MySQL support is not that difficult, but maintenance is, and ultimately PostgreSQL is the better way forward (half joking) :)

With the current feature set it's something I have already considered, but I'm still undecided.

an hour ago by esafak

Are there interfaces that we can use to implement support for other databases?

4 hours ago by ForHackernews

PgTAP is only mentioned offhandedly at the end of this article, but it's an excellent mature tool for unit-testing your database: https://pgtap.org/

4 hours ago by radimm

OP here - I'm going to follow up with a separate article on pgTAP. But the goals of the two tools are slightly different in my mind.
