I have always wondered why Meteor resolved to use the oplog to monitor mutations for subscriptions, when a solution programmed more generically on the webserver tier could more easily be extended to other databases. Was it absolutely necessary to use the oplog, or is it simply more performant?
I see that the Postgres implementation uses triggers and stored procedures (again, on the database tier). It appears some queries might even be impossible, or simply too costly, to support by maintaining enough state on the webserver tier to properly capture new subscription data from mutations.
Or perhaps the challenge was that, with webservers in a cluster, each server would have to propagate every update to subscribed queries to all the other webservers. Or perhaps it was something more generally related to keeping a bunch of webservers in sync; after all, each would have to store the definition of every subscription in order to filter mutations.
In general, I feel like not enough people truly understand the ramifications and challenges at stake with this endeavor. And I think it’s more than ever an important time to understand it.
HERE’S WHY:
GraphQL/Relay doesn’t provide subscriptions out of the gate. Coming from a Meteor perspective, if you are like me, at first I just assumed all queries were automatically kept up to date. That’s far from the case. GraphQL and Relay don’t currently even support subscriptions. When they eventually do, it will be up to you, on a query-by-query basis, to optimize your GraphQL types and resolvers so they are automatically kept up to date with the database. That’s a lot of work, and many of us won’t know where to start.
Reading between the lines, what becomes clear is this: Facebook found it impossible to scale an entire database generically! So, instead, they likely only keep a few things up to date, and probably use unique means to do so for each query/feature! This is the opposite of Meteor’s current behavior, where the entire database can generically be subscribed to!!
So with a future where we will likely have to build our own GraphQL types and resolutions on the server, and/or where Meteor will provide abstractions to make this easier, I think it’s very important we understand how we would go about this.
So, is it possible, solely on the webserver tier, to monitor queries and their datasets and determine when new data enters these subscribed datasets? Or do we always need the entire state of the database, and therefore need to maintain subscription definitions on the database tier? Or perhaps it’s just some queries that need access to the entire state.
How much of this can we build generically on the webserver tier irrespective of the database actually used? That way we can benefit from lots of code reuse between database implementations.
Here are some example queries to get your minds rolling:
Persons.find({ }, {sort: {updatedAt: -1}, limit: 10})
Persons.find({ age: { $gte: 21 }}, {sort: {updatedAt: -1}, limit: 10})
Persons.find({ $or: [ { age: { $gte: 21 } }, { sex: 'f' } ] }, {sort: {updatedAt: -1}, limit: 10})
It seems to me that in these examples, when a subscription is made for one of these queries, you perform the query against the database once and keep the 10 rows in webserver memory, then filter future inserts/updates for rows that sit nicely within that 10-row dataset.
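To make that concrete, here’s a minimal sketch of what the webserver-tier filtering could look like for the second query above. All names (`maybeAdmit`, `matchesSelector`) are hypothetical; a real implementation would use a proper Mongo selector matcher (e.g. Minimongo’s), not the hard-coded check below:

```javascript
// Hypothetical sketch: deciding on the webserver whether a freshly written
// document belongs in a subscribed window like
//   Persons.find({ age: { $gte: 21 } }, { sort: { updatedAt: -1 }, limit: 10 })
// `window` holds the (up to) 10 docs we already have in memory,
// sorted by updatedAt descending.

function matchesSelector(doc) {
  // Stand-in for a real Mongo selector matcher (e.g. Minimongo's Matcher).
  return doc.age >= 21;
}

function maybeAdmit(window, doc, limit = 10) {
  if (!matchesSelector(doc)) return window;          // fails the selector outright
  const oldest = window[window.length - 1];
  const windowFull = window.length >= limit;
  // A full window only admits docs newer than its oldest member.
  if (windowFull && doc.updatedAt <= oldest.updatedAt) return window;
  return window
    .filter(d => d._id !== doc._id)                  // replace on update
    .concat(doc)
    .sort((a, b) => b.updatedAt - a.updatedAt)
    .slice(0, limit);
}
```

Note this only handles documents entering the window; a real implementation also has to handle documents *leaving* it (an update that makes a row stop matching), which may force a re-query to backfill the 10th slot.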
What challenges do we have to accomplish this?
The biggest one I’m currently seeing is that updates don’t pass the entire state of the mutated row through the webserver; what is passed along is only, for example, that you want to increment the age to 22 for a given ID. That’s all the webserver sees, not the values of all fields for the given row. That means the webserver has to perform a database read for every mutation to truly see whether the mutated row matches a subscribed query.
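Here’s a runnable sketch of that re-read cost, using an in-memory `Map` as a stand-in for the database (all names are illustrative, not a real Meteor/Mongo API):

```javascript
// A partial modifier like { $inc: { age: 1 } } tells us nothing about the
// row's other fields, so after applying the write we must re-read the full
// row before any subscription selector can be tested against it.

// Tiny in-memory stand-in for the database, just to make the sketch runnable.
const store = new Map([['abc', { _id: 'abc', name: 'Ann', age: 21 }]]);

function applyAndReread(mutation) {
  const doc = store.get(mutation._id);
  for (const [field, amount] of Object.entries(mutation.$inc || {})) {
    doc[field] += amount;                 // apply the partial modifier
  }
  // The extra read per mutation the post describes -- here it's a Map
  // lookup, but against a real database it's a full round trip.
  return store.get(mutation._id);
}

function notifySubscriptions(subscriptions, fullDoc) {
  // Only with the full row in hand can each subscription's selector be tested.
  return subscriptions.filter(sub => sub.selector(fullDoc)).map(sub => sub.name);
}

const subs = [
  { name: 'adults', selector: d => d.age >= 21 },
  { name: 'teens',  selector: d => d.age < 20 },
];
const full = applyAndReread({ _id: 'abc', $inc: { age: 1 } });
const matched = notifySubscriptions(subs, full);  // -> ['adults']
```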
Perhaps the above example is automatically solved if you assume users on the client can only update rows already in a subscription’s dataset. That’s often true, but it’s not true when you make batch updates directly from the server (i.e., within a Meteor Method). In that case there are no rows already subscribed to. This right here seems to be the main reason a database-specific oplog approach was taken: it’s a lot more performant to execute all this additional querying on mutations within the database; you just have to send information about your subscription queries to the database and store it there to facilitate this decision-making. The same goes for the stored procedures and triggers used with Postgres.
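That alternative, registering subscription selectors with whatever sits next to the data (an oplog tailer, a trigger, etc.) so matching happens where the full row lives, could be sketched like this. Again, every name here (`registerSubscription`, `routeChange`, `fullDocument`) is hypothetical:

```javascript
// Hypothetical sketch: subscription selectors are registered with a
// DB-side consumer, which sees change events that already carry the full
// post-write document -- so no extra read is needed on the web tier.

const registeredQueries = new Map();   // queryId -> selector function

function registerSubscription(queryId, selector) {
  registeredQueries.set(queryId, selector);
}

function routeChange(event) {
  const hits = [];
  for (const [queryId, selector] of registeredQueries) {
    if (selector(event.fullDocument)) hits.push(queryId);
  }
  return hits;   // queryIds whose subscribers should be notified
}
```

The trade-off is exactly the one raised earlier in the post: the subscription definitions now live on (or next to) the database tier, so this code is no longer generic across databases.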
Lastly, I think if we can collectively get an idea of how to implement this stuff, we can come up with a simplified interface on top of GraphQL that application developers can use.
Basically, what I imagine is the ability to “drop down” and use GraphQL when you want to customize for performance and the like, while being able to begin by prototyping your app via a far simpler interface that probably looks identical to what you have now. Perhaps there is also a third option in the middle: a simplified GraphQL that isn’t fully automatic like what we have now. That might just mean specifying a schema, similar to SimpleSchema or what you use in the Mongoose ODM that’s popular with Mongo outside the Meteor world.
As far as subscriptions go, there should be a straightforward mechanism/switch to turn on subscription behavior for certain queries. Perhaps the current Mongo LiveQuery approach can still optionally be used behind the scenes (in conjunction with GraphQL), but you can turn it off and manually build more scalable subscriptions for the individual features/queries that really need it.
And as far as Relay, or whatever Relay-inspired API we decide to give React, we provide a more opinionated and consequently simplified interface. For the most part, the API should simply be a query (or set of queries) paired to a React component. But we can get to this later, once we have the pubsub stuff down.