☄️ Introducing pub-sub-lite: Lighter (Method-based) pub/sub for Meteor

alawi · June 16, 2020, 8:02pm

Got it @npvn, thanks a lot for this awesome and well-thought-out package, can’t wait to try it.

alawi · June 16, 2020, 8:18pm

Sorry, just to confirm my understanding: this change event is broadcast to the caller client only (not to other clients) and the changes are automatically merged into the caller client’s Minimongo. Is that correct?

rjdavid · June 17, 2020, 5:32am

Am I correct that this is a good mechanism for offline PWA? Methods caching the results through minimongo?

alawi · June 17, 2020, 6:03am

If I’m not mistaken, it caches the returned results, but it won’t act as a client-side cache while offline and auto-sync once back online.

MongoDB has a new solution for this called Realm, I think it is worth exploring.

However, I do think there is an opportunity to extend Methods to support this mechanism. I think it would make a great package.

erixtekila · June 17, 2020, 11:08am

Hi,

Would you mind elaborating a little on this, please?

Grapher has caching + denormalization bundled
grapher-react has a flag to turn reactivity on and off from the query

The only downside is Minimongo, which seems to be flushed regularly, to lighten client RAM usage I presume. Unfortunately, grapher doesn’t offer a way to turn this behavior off on a per-query basis.

npvn · June 17, 2020, 12:20pm

Yes, that’s correct. The goal of this feature (emitting mutation update messages) is to address an issue you may face after getting rid of a particular pub/sub: When adding/updating/removing documents via a Method, the client-side caller no longer knows what changes have happened on the server. Traditionally that knowledge can only come from 1) having those changes sent via a pub/sub or 2) writing manual logic. Enhanced Methods automate this so your mutation Method will continue to work as if it were backed by a pub/sub (but instead of all clients, only the Method caller will receive these updates).

The package’s caching layer only caches Method result data, with the goal of avoiding unnecessary repeated calls. For offline PWAs we will need a more comprehensive solution. MongoDB Realm looks promising - it’s great to see that the MongoDB ecosystem is going in the right direction. I’ll think about use cases for PWAs further. Thanks for your suggestions!

I think @csiszi meant the following problems, given the context of using grapher in a non-reactive setting:

The need for a lot of refactoring, especially on the front-end.
The lack of Minimongo merging, which means the data can’t be used elsewhere.
The lack of knowledge about what has actually changed on the server-side (without pub/sub or manually sending the changes with custom logic).

csiszi · June 17, 2020, 12:40pm

Imagine you have a todo app and the todo list lives in a <TodoList> component which uses a reactive grapher query to get the items. Every item has a switch to mark it as done. The switch uses a regular Meteor Method to update the completed field of that todoItem. Because of the reactive query, the change is sent down to <TodoList> without any additional logic (i.e. the name of the todo item is now crossed out).

If you change the reactive query to a static query, the <TodoList> component works at first, but when an item is clicked and the update method is called, the static query in <TodoList> doesn’t re-run so the change is not reflected in the UI. You have to manually get the updated todo item and merge it into the array retrieved using the static query or run the entire static query again.
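The manual merge described above could look roughly like the following. This is only a minimal sketch in plain JavaScript: `mergeUpdatedItem` and the item shape (`_id`, `name`, `completed`) are hypothetical names chosen for illustration, not part of grapher or pub-sub-lite.

```javascript
// Hypothetical helper: returns a new array in which the item matching
// `updated._id` is replaced by a shallow merge of the old and new fields.
function mergeUpdatedItem(items, updated) {
  return items.map((item) =>
    item._id === updated._id ? { ...item, ...updated } : item
  );
}

// Result of a static (non-reactive) query, fetched once.
const todos = [
  { _id: 'a1', name: 'Buy milk', completed: false },
  { _id: 'b2', name: 'Write docs', completed: false },
];

// After the update Method returns, merge the changed fields by hand.
const nextTodos = mergeUpdatedItem(todos, { _id: 'a1', completed: true });
console.log(nextTodos[0].completed); // true
```

Returning a new array (instead of mutating in place) is a deliberate choice here so that a React component holding the list in state would notice the change.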

To be honest, grapher also lacks another feature which would be awesome: using the data already in Minimongo without running the query on the server. Let’s say you have another component which only shows the name fields of the todos. It’s a sibling of <TodoList>, and if <TodoList> has a ready reactive query, there’s no need to ask the server for the names because we already have them in Minimongo.

erixtekila · June 17, 2020, 1:06pm

With grapher-react, one could just mark the request as non-reactive, with a boolean flag like so:

export default withQuery(
    props => {
        return getPostLists.clone();
    },
    { reactive: true /*|| false*/ },
)(PostList);

This is very handy and doesn’t need any client-side refactoring.

Do you mean that with your package, server-side changes are synced client-side without being fetched over a websocket? Is it constant HTTP polling?
That seems worse in terms of performance, doesn’t it?

erixtekila · June 17, 2020, 1:10pm

Sure, I’m really interested to hear how pub-sub-lite could handle this differently.
Long polling ?

npvn · June 17, 2020, 3:03pm

erixtekila:

With grapher-react, one could just mark the request as non-reactive, with a boolean flag like so:
export default withQuery(
    props => {
        return getPostLists.clone();
    },
    { reactive: true /*|| false*/ },
)(PostList);
This is very handy and doesn’t need any client-side refactoring.

Sorry, I should have been more specific. I was referring to what usually happens when you use pub/sub predominantly in your app and then decide to switch certain parts to Methods: it requires a lot of refactoring, especially on the client side, because of the differences in signature and behaviour between pub/sub and Methods. The pub-sub-lite package solves this issue because the helpers it provides (Meteor.publishLite and Meteor.subscribeLite) simulate the signature and behaviour of their native counterparts (Meteor.publish and Meteor.subscribe), so your existing rendering logic can mostly remain intact. If your app uses grapher-react (as in the code snippet you provided) then there is no use for pub-sub-lite.

There is no polling involved. pub-sub-lite’s enhanced Methods keep track of changes made during a server-side Method invocation by using MongoDB Change Streams, and then send those changes in the form of DDP messages to the client-side Method caller, where the changes will be automatically merged into Minimongo. This will resolve the issue mentioned by @csiszi in his example scenario above.

erixtekila · June 17, 2020, 4:22pm

Thanks for the in-depth explanation.
I’ll dig in to understand how the Method caller can be called back without listening on a DDP stream set up beforehand…?!

npvn · June 17, 2020, 5:57pm

It works like this: When a DDP client connects to the Meteor server, that connection is represented by a Session instance. All sessions (representing all DDP clients) are stored in Meteor.server.sessions. Each session carries a unique id, and this id is also attached to the invocation context (the familiar this) of each Method call. So we can use this attached id to trace back to the client session that called the Method:

const allConnectedClientSessions = Meteor.server.sessions;

Meteor.methods({
  myMethod() {
    /* 
      Note: This code is for illustrative purposes only. When using
      pub-sub-lite's enhanced Methods you just write your mutations
      normally without the need to know any of these details.
    */

    // `this` is the invocation context mentioned above
    const clientSessionId = this.connection.id;
    const clientSession = allConnectedClientSessions.get(clientSessionId);

    // Each session carries a `send` method, allowing server to send
    // messages to that particular client
    clientSession.send({
      msg: 'changed',
      collection: 'books',
      id: 'NzrGsj9ooJnQwbDfZ',
      fields: { numberOfBooksSold: 99 },
    });
  }
});

The changed message above is structured according to the DDP specification. When receiving this standardised message, the client will automatically update Minimongo to reflect the change.

You can find the relevant code in pub-sub-lite here.
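To make the client side of this more concrete, here is a rough sketch of what applying DDP data messages to a local cache amounts to. It is not Meteor's actual client code (which does considerably more); `cache` and `applyDdpMessage` are hypothetical names, a plain `Map` stands in for Minimongo, and the book title is made-up sample data. The message fields (`msg`, `collection`, `id`, `fields`, `cleared`) follow the DDP specification.

```javascript
// Minimal in-memory stand-in for Minimongo: collection name -> Map(id -> doc).
const cache = new Map();

// Hypothetical handler for the three DDP data messages.
function applyDdpMessage(message) {
  const { msg, collection, id, fields = {}, cleared = [] } = message;
  if (!cache.has(collection)) cache.set(collection, new Map());
  const docs = cache.get(collection);

  if (msg === 'added') {
    docs.set(id, { _id: id, ...fields });
  } else if (msg === 'changed') {
    const doc = docs.get(id);
    if (!doc) return; // unknown document: ignored in this sketch
    Object.assign(doc, fields);               // set or overwrite changed fields
    for (const key of cleared) delete doc[key]; // drop removed fields
  } else if (msg === 'removed') {
    docs.delete(id);
  }
}

applyDdpMessage({
  msg: 'added',
  collection: 'books',
  id: 'NzrGsj9ooJnQwbDfZ',
  fields: { title: 'Sample book', numberOfBooksSold: 98 },
});
applyDdpMessage({
  msg: 'changed',
  collection: 'books',
  id: 'NzrGsj9ooJnQwbDfZ',
  fields: { numberOfBooksSold: 99 },
});
console.log(cache.get('books').get('NzrGsj9ooJnQwbDfZ').numberOfBooksSold); // 99
```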

mikeTT · June 18, 2020, 4:28am

First, this sounds like a great performance boost for Meteor apps that scale. I’ve read through this thread and feel like it has terrific promise. This quote is the part of things where I’m unsure what the change would be using this vs true pub/sub.

Given a Meteor app that has a social component where many parties are connected clients seeing the same thing (like a project management app), I’m imagining a scenario of 10 connected clients where one of them triggers a change that propagates through pub/sub lite. If the attached id is traced back to the client session who called the Method, does that mean that the one client that triggered will see the change, while the other 9 that didn’t call the Method won’t see it?

Sorry if I’m missing things but just want to understand how this would work in an app with many connected clients all tuned into (essentially) one data source.

alawi · June 18, 2020, 4:32am

I think npvn answered this question here:

Only the caller mini-mongo is updated, not the other 9 clients.

However, perhaps the solution could be extended to update the other nine, then it’ll be another version of redis-oplog.

mikeTT · June 18, 2020, 4:38am

Thanks for pointing that out. I sort of missed that, and it makes more sense now.

npvn · June 18, 2020, 5:08am

Thanks @alawi. I’ve thought about the possibility of sending mutation update messages to all clients possessing the affected document(s). However, that would require the server to maintain a copy of each client’s data, in order to determine which documents (and which document fields) should be sent to which clients. In fact this is the basis of how Meteor pub/sub works, and this approach significantly reduces Meteor pub/sub’s scaling potential (the more clients subscribed, the more client data snapshots the server needs to keep).

For pub-sub-lite, I have an idea that can partially solve the issue: If the client calling a Method is logged in, we can find all client sessions belonging to the same user and push update messages to all of them. The implementation for this will be very straightforward, but its usefulness will be obviously quite limited, as we don’t support 1) anonymous clients and 2) different user accounts possessing the same document(s).
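A rough sketch of the "same user, all sessions" idea, under stated assumptions: the `sessions` Map below is a mock standing in for Meteor.server.sessions, the session objects only carry the properties this sketch needs (`id`, `userId`, `send`), and `sendToUserSessions` is a hypothetical helper, not something pub-sub-lite currently provides.

```javascript
// Mock session factory so the sketch runs without a Meteor server.
function makeSession(id, userId) {
  const sent = [];
  return { id, userId, sent, send: (message) => sent.push(message) };
}

// Stand-in for Meteor.server.sessions: session id -> session.
const sessions = new Map([
  ['s1', makeSession('s1', 'alice')],
  ['s2', makeSession('s2', 'alice')],
  ['s3', makeSession('s3', 'bob')],
  ['s4', makeSession('s4', null)], // anonymous client
]);

// Push one update message to every session of the calling user.
// Returns how many sessions received it.
function sendToUserSessions(allSessions, callerUserId, message) {
  if (!callerUserId) return 0; // anonymous callers are not covered
  let count = 0;
  for (const session of allSessions.values()) {
    if (session.userId === callerUserId) {
      session.send(message);
      count += 1;
    }
  }
  return count;
}

const delivered = sendToUserSessions(sessions, 'alice', {
  msg: 'changed',
  collection: 'books',
  id: 'NzrGsj9ooJnQwbDfZ',
  fields: { numberOfBooksSold: 99 },
});
console.log(delivered); // 2
```

The two limitations mentioned above show up directly: the anonymous session and the session of a different user never receive the message, even if they hold the same document.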

I’ll invest time this weekend to study the codebase of the awesome redis-oplog package to see if there’re lessons and insights we can apply to pub-sub-lite.

alawi · June 18, 2020, 5:15am

Yeah, that’s the Meteor server’s MergeBox, and it is the reason why Meteor pub/sub requires more RAM.

Why not broadcast only the changes to all clients and let each client manage its own data? If the data is detected to be out of sync (for example because a broadcast message was missed), the client can fetch everything again from the server.
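One way this could be sketched, assuming each broadcast carries an increasing sequence number (my assumption for illustration, it is not something Meteor or meteor-streamer provides out of the box). `createClient`, `onBroadcast` and `fetchAll` are hypothetical names:

```javascript
// Hypothetical client-side state: local documents plus the sequence
// number of the last broadcast that was applied.
function createClient(fetchAll) {
  const state = { docs: new Map(), lastSeq: 0, refetches: 0 };

  // Replace local data with a full snapshot from the server.
  function resync() {
    const snapshot = fetchAll(); // expected shape: { seq, docs: [...] }
    state.docs = new Map(snapshot.docs.map((doc) => [doc._id, doc]));
    state.lastSeq = snapshot.seq;
    state.refetches += 1;
  }

  function onBroadcast({ seq, id, fields }) {
    if (seq <= state.lastSeq) return; // duplicate or stale message
    if (seq !== state.lastSeq + 1) {
      resync(); // a message was missed: fetch everything again
      return;
    }
    const doc = state.docs.get(id) || { _id: id };
    state.docs.set(id, { ...doc, ...fields });
    state.lastSeq = seq;
  }

  return { state, onBroadcast, resync };
}

// Fake server snapshot so the sketch runs standalone.
const client = createClient(() => ({
  seq: 3,
  docs: [{ _id: 't1', completed: true }],
}));

client.onBroadcast({ seq: 1, id: 't1', fields: { completed: false } });
client.onBroadcast({ seq: 3, id: 't1', fields: { completed: true } }); // seq 2 was missed
console.log(client.state.refetches, client.state.lastSeq); // 1 3
```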

You can perhaps leverage meteor-streamer. I think Rocket.Chat uses that technique to scale to thousands of sessions.

What would be the limitation of this approach?

kschingiz · June 18, 2020, 9:35am

@npvn Hi, thanks for your amazing package, I see that your package relies on MongoDB Change Streams.
I have already worked with change streams and built 2 experimental packages with it:

publish change streams: https://github.com/kschingiz/meteor-publish-change-streams
publish aggregation based on change streams: https://github.com/kschingiz/publish-aggregations

These packages were never finished, because change streams were consuming lots of memory and opening 100+ change streams would lead to serious performance issues.
What about your package? Have you tested it with at least 1000 connections? What are the memory consumption and DB performance like?
Would be happy to know that change streams are fast nowadays, because I can finally finish my packages above.
Thanks.

npvn · June 18, 2020, 7:27pm

Yes, by disregarding the need to send exactly the data clients need down to the fields level, the server should no longer have to keep client data snapshots. We may end up “wasting” bandwidth (by sending more than what is really needed by clients), but bandwidth is usually not a bottleneck factor (compared to server resources). It seems that @mitar has explored this path with control-mergebox. I’ll see if there’re things we can learn from that package.

The performance potential of meteor-streamer really impressed me, and is definitely something we can consider instead of using Meteor’s built-in DDP sender.

But the most important thing we’re missing here is a way to track which clients are “interested” in which document(s), so we know who to send update messages to when changes happen. I have an idea: Up until now we haven’t fully exploited Meteor.publishLite and Meteor.subscribeLite. They have been merely used as an API for converting existing pub/sub to Methods, and in fact my original intention for them was just to target legacy code. But my thinking has shifted: We can make them first-class citizens, and leverage the arguments passed to them to construct a registry of clients and their documents. This will be similar to the way Meteor pub/sub records this information, except that we aim for a looser data transfer mechanism and don’t keep client data snapshots.
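To illustrate the kind of registry I have in mind, here is a very rough sketch. None of this exists in pub-sub-lite yet: `InterestRegistry` and its methods are hypothetical, and tracking interest per collection (rather than per document) is a simplifying assumption.

```javascript
// Hypothetical registry: collection name -> Set of interested session ids.
// No document snapshots are stored, only "who cares about what".
class InterestRegistry {
  constructor() {
    this.byCollection = new Map();
  }

  // Record that a session subscribed to something in this collection.
  register(sessionId, collectionName) {
    if (!this.byCollection.has(collectionName)) {
      this.byCollection.set(collectionName, new Set());
    }
    this.byCollection.get(collectionName).add(sessionId);
  }

  // Forget a session everywhere, e.g. when its connection closes.
  unregisterSession(sessionId) {
    for (const [name, ids] of this.byCollection) {
      ids.delete(sessionId);
      if (ids.size === 0) this.byCollection.delete(name);
    }
  }

  // Session ids that should receive update messages for this collection.
  interestedIn(collectionName) {
    return [...(this.byCollection.get(collectionName) || [])];
  }
}

const registry = new InterestRegistry();
registry.register('s1', 'books');
registry.register('s2', 'books');
registry.register('s2', 'authors');
registry.unregisterSession('s1');
console.log(registry.interestedIn('books')); // [ 's2' ]
```

The memory cost grows with the number of subscriptions, not with the amount of data each client holds, which is the difference from keeping per-client data snapshots.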

Many thanks for your ideas and suggestions @alawi. They really helped point me in the right direction!

@kschingiz Thanks for sharing your projects with me. I’ve taken a look and they’re really interesting experiments! Opening a large number of Change Streams indeed is a performance concern, because each stream will open a new connection to MongoDB. You will potentially face two kinds of bottleneck:

The number of streams is larger than the current MongoDB’s poolSize (the maximum number of connections a MongoDB client can make). When you exceed this limit, subsequent requests will need to wait and that significantly slows down response time. Unfortunately the default poolSize in MongoDB Node.js driver is only 5 (a value set for “legacy reasons”).
Even when you have enough poolSize, too many connections will put high load on both the Meteor server and the MongoDB server.

pub-sub-lite solves the first challenge by setting poolSize to 100 by default (a number inspired by the default value in the Python MongoDB driver) and allowing package users to customize this value. pub-sub-lite also tries to avoid the second challenge by limiting the number of streams opened for each collection to at most 1. If a Method invocation needs a stream for a particular collection, it will check if an existing stream has already been opened for that collection (by previous Methods) before attempting to open a new stream. Streams are closed as soon as possible once they’re no longer used by any Method invocations.

This approach ultimately keeps the theoretical maximum number of connections equal to the number of collections. In practice this number will be even lower, because the number of collections being mutated at the same time is usually small.

You can find the relevant code in pub-sub-lite here.
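The stream-sharing idea can be sketched roughly as a reference-counted registry. This is a simplified illustration and not the package's actual code: `createStreamRegistry`, `acquire` and `release` are hypothetical names, and `openStream` is an injected function (in a real app it might wrap something like `rawCollection().watch()`), faked here so the sketch runs without MongoDB.

```javascript
// Hypothetical registry that shares at most one stream per collection.
function createStreamRegistry(openStream) {
  const entries = new Map(); // collection name -> { stream, users }

  // Called when a Method invocation needs a stream for a collection.
  function acquire(collectionName) {
    let entry = entries.get(collectionName);
    if (!entry) {
      entry = { stream: openStream(collectionName), users: 0 };
      entries.set(collectionName, entry);
    }
    entry.users += 1;
    return entry.stream;
  }

  // Called when a Method invocation is done with the stream.
  function release(collectionName) {
    const entry = entries.get(collectionName);
    if (!entry) return;
    entry.users -= 1;
    if (entry.users === 0) {
      entry.stream.close(); // no invocation needs it any more
      entries.delete(collectionName);
    }
  }

  return { acquire, release, openCount: () => entries.size };
}

// Fake stream opener that just counts opens and closes.
let opened = 0;
let closed = 0;
const streams = createStreamRegistry(() => {
  opened += 1;
  return { close: () => { closed += 1; } };
});

const first = streams.acquire('books');  // opens a stream
const second = streams.acquire('books'); // reuses the same stream
streams.release('books');
streams.release('books');                // last user gone, stream closed
console.log(opened, closed, first === second); // 1 1 true
```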

alawi · June 18, 2020, 8:21pm

But keep in mind that this needs to work for a server cluster. Rocket.Chat uses the DB to sync the event emitters; others use Redis or another third-party pub/sub.

My pleasure! And thanks again for the package, I’ll surely be using it when refactoring.