Reactive queries for PostgreSQL

Over the new year’s break I implemented one idea I had for reactive queries for PostgreSQL. You can take an arbitrary SELECT query, using multiple joins, data transformations, and even custom functions, and besides the initial set of results also get real-time updates about any changes to those results. Using PostgreSQL’s to_json function and jsonb data type, you can effectively simulate MongoDB, with the added benefits of transactions and joins. And with this package you can now easily make a Meteor publication to push data to the client, reactively.
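For the curious, here is a rough sketch of the kind of query this enables. The schema (posts, users, comments) is entirely made up for illustration; the point is that to_json lets one relational query return MongoDB-style nested documents:

```sql
-- Hypothetical schema: posts(id, title, author_id), users(id, ...),
-- comments(id, post_id, ...). Each result row is a single JSON document.
SELECT to_json(post) AS document
FROM (
  SELECT p.id,
         p.title,
         -- Embed the author as a nested object, like a document database would.
         to_json(u.*) AS author,
         -- Aggregate this post's comments into a nested array.
         (SELECT coalesce(json_agg(c.*), '[]'::json)
          FROM comments AS c
          WHERE c.post_id = p.id) AS comments
  FROM posts AS p
  JOIN users AS u ON u.id = p.author_id
) AS post;
```

A reactive query over something like this would then also push updates whenever a row in posts, users, or comments changes in a way that affects the results.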

Any feedback welcome. :slight_smile:


Very cool! Unfortunately I doubt I’ll have time to play with this anytime soon but it is awesome to see new tools and techniques for live data. Thanks for making this!

Nice package @mitar …reminds me of https://hasura.io/

Interesting. Do you know how Hasura does reactive queries? I could not find relevant code or documentation about this. Everything I found is just about the client side and how to use it.

I think they are using GraphQL subscriptions for reactivity. I heard about it on this React Native Radio episode.

Yes, this is how you get it from the server side to the client side. But how does it work on the query side? How does it know what has changed in the PostgreSQL database?

It looks like it has to do something with this and event triggers.

Event triggers and DDL are about changes to table schema/structure, not data. To my knowledge.

@mitar this is simply amazing! We use Postgres heavily in our work, and have long been looking for something like this. I have a couple of questions - didn’t want to pollute the repo’s issues list so asking here.

  1. “Queries require an unique column which serves to identify rows and changes to them…”. Not a big issue, as having a unique key is good practice anyway, but out of curiosity, why not use the row OID?

  2. How would this work in the context of pooling? Say, transaction-level pooling, like the one possible with pgBouncer. I’m guessing it would be impossible and one should fall back to session pooling. But actually, is any form of pooling really possible when using this package?

Thanks!

True, I was confused by the terminology Hasura uses, but they do seem to be using regular triggers to capture CRUD. From then on, they seem to be using their metadata to match those events to subscriptions. This is seemingly trivial for Hasura because what they essentially do is parse and convert any complex GraphQL query into a single SQL query, and having done that, generate triggers to “listen” to events. They use this for calling webhooks too.

User-exposed OIDs are being removed in PostgreSQL 12, BTW.

And the unique index has to be on the results, not the source tables. So if you join two tables, do the joined results still have row OIDs? Also, what happens if you reorder the results? Do row OIDs follow? Or are they more like an index into the array?

Anyway, you can specify any column as the index column. Picking an OID column could probably work as well.

How would this work in the context of pooling? Say, transaction-level pooling, like the one possible with pgBouncer. I’m guessing it would be impossible and one should fall back to session pooling. But actually, is any form of pooling really possible when using this package?

This package does its own pooling internally. You can control how many connections you want it to use, and it multiplexes work over those. It does use temporary objects internally, and I am not sure how pgBouncer deals with that, i.e., how it assures that the same client can access its temporary objects between queries.
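To illustrate the pgBouncer concern: temporary objects (and LISTEN registrations) are scoped to a PostgreSQL session, so anything finer-grained than session pooling can break them. A sketch, with an illustrative table name:

```sql
-- Temporary tables live for the duration of a session.
CREATE TEMPORARY TABLE query_snapshot (id integer PRIMARY KEY, data jsonb);

-- With session pooling this works: the same backend serves both statements.
-- With transaction-level pooling (pgBouncer's "transaction" mode), the next
-- statement may be routed to a different backend, where query_snapshot
-- does not exist, and this would fail:
SELECT * FROM query_snapshot;
```

The same session-affinity problem applies to LISTEN/NOTIFY, which reactive packages typically rely on.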


So I would like to see how they generate those triggers. Do you know where this is in the code? Because you cannot have triggers on queries, this is the main tricky question (and what my package tries to solve): how do you know when and how query results changed? So I am curious how they do that.

I’ve never used PostgreSQL (from a developer’s perspective). I wonder how this differs from numtel:pg, which uses triggers to get reactive PG…?

I don’t know any Haskell, but I tried to follow the code, which led me to believe that they are basically requerying for each CRUD event that is relevant to a query, not necessarily tracking a cursor for the query itself.

numtel:pg uses pg-live-select and was a great inspiration for my package. You can see my (biased) comparison with other projects. In short, I believe my package is the fastest (lowest latency) and has what I believe is a better design (it does diffing in the database, close to the data).

But all those packages still do a full refresh (similar to Meteor’s polling strategy). We would need an oplog-like approach as well. I described ideas on how to achieve that here.
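To give a rough idea of what “diffing in the database” can look like, here is an illustrative sketch only (names and the watched query are made up; the package’s actual implementation is more involved). A snapshot of the previous results is kept server-side, and on each refresh the query is rerun and diffed against it:

```sql
-- Keep the previous results of the watched query (here: a toy query
-- over a hypothetical posts table) in a temporary table.
CREATE TEMPORARY TABLE snapshot AS
  SELECT q.id, to_jsonb(q) AS data
  FROM (SELECT id, title FROM posts) AS q;

-- On a change notification, rerun the query and emit only the differences,
-- keyed by the unique id column.
WITH current AS (
  SELECT q.id, to_jsonb(q) AS data
  FROM (SELECT id, title FROM posts) AS q
)
SELECT coalesce(c.id, s.id) AS id,
       CASE
         WHEN s.id IS NULL THEN 'added'
         WHEN c.id IS NULL THEN 'removed'
         ELSE 'changed'
       END AS op,
       c.data AS new_data
FROM current AS c
FULL OUTER JOIN snapshot AS s ON c.id = s.id
WHERE s.id IS NULL            -- row appeared
   OR c.id IS NULL            -- row disappeared
   OR c.data IS DISTINCT FROM s.data;  -- row changed
```

Doing the join and comparison inside the database means only the changed rows cross the wire, instead of the full result set on every refresh.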


Care to link to code?

So are they requerying for every CRUD event on any of the sources? This is similar to what I am doing as well, but in my package I do all that work in the database.

The question is whether they are doing anything smarter. For example, they could know in a smart way which changes to source tables influence a particular query’s results, and trigger an event only then.

Ah, I did not bookmark it at the time. But if it helps, GraphQL limits the root of a subscription to a single source (think of it as a single table); this might have made their job easier, as it already hints at which root table needs to be requeried.

What, GraphQL does not allow joins? I think it does. Or are you saying that Hasura’s implementation of GraphQL does not allow joins?

Yes, doing queries directly on a base table (and not a view) can simplify things. But then we already have that, and it is called MongoDB and the oplog.

No, what I mean is that the subscription’s root needs to come from a single source, which is by design part of the GraphQL spec. Think of it like this: Meteor’s pub/sub allows multiple unrelated cursors to be returned; GraphQL cannot do that.

I would not compare oplog to this, though. Perhaps change streams can be considered more analogous?

OK, but to that root you can join other sources, no? So if any data in those other sources change, you want that to propagate to you, no?

I would not compare oplog to this, though. Perhaps change streams can be considered more analogous?

Change streams and the oplog are more or less the same thing. What I am saying is that if you have a query on one base table only, then it is easy to observe changes to that table in PostgreSQL using triggers. But if the query involves multiple tables, then you have to know how changes to them influence the query results through relational algebra, which is trickier.
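The easy single-table case can be sketched with a trigger and LISTEN/NOTIFY (all names here are illustrative, not from any particular package):

```sql
-- A trigger function that announces any change to a table on a
-- notification channel, with the table name and operation as payload.
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'table_changes',
    json_build_object('table', TG_TABLE_NAME, 'op', TG_OP)::text
  );
  RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

-- Attach it to a hypothetical posts table.
CREATE TRIGGER posts_changed
  AFTER INSERT OR UPDATE OR DELETE ON posts
  FOR EACH STATEMENT EXECUTE FUNCTION notify_change();
```

A client that has run LISTEN table_changes can then rerun (or incrementally update) its query when a notification arrives. Once a query joins multiple tables, though, such a notification only says that a source changed; working out how that maps to changes in the joined results is exactly the hard part described above.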