I’ve been reviewing publication strategies in order to do some app optimization. I think I have a common use-case that could prompt a new publication strategy that would be highly optimal. I’m curious if anyone from Meteor (@nachocodoner) could sound off on how hard this would be.
The use-case is a game with a high number of simultaneous real-time clients, say 10,000, who all subscribe to the same game document. The publication returns 70 fields that can all change in real time, but not all at once (e.g. status, playersAlive, phase, powerUpsLeft, etc.). All 10,000 players use the same publication observer and, at least for the game subscription, receive the exact same client data and updates. There’s nothing unique, per player, about the game document, nor is it subscribed to by any other subscription, nor is it ever removed. We just need to follow real-time diffs on the fields.
It seems none of the current publication strategies optimize for this seemingly common use-case:
SERVER_MERGE: This is overkill as all clients have the same game document, that is only subscribed to once, not used in other subscriptions, and not removed. So the server doesn’t need to keep a copy of every client’s game document. Yet, this is the only publication strategy that sends field diffs down, which is what we want.
NO_MERGE_NO_HISTORY: This sounds good in that it only sends updates and doesn’t store each client’s data, but it sends the entire document each time, so 70 fields, when we just need single-field updates.
NO_MERGE: This is like NO_MERGE_NO_HISTORY but includes removed support, which we don’t need. Still sends all 70 fields.
NO_MERGE_MULTI: This tracks for multiple publications, which we don’t need, and still sends the entire document, so all 70 fields.
Would it be possible to add something like a SHARED_MERGE or SINGLE_MERGE strategy that, for a given publication observer, keeps one copy of the document that has been sent, diffs any changes against it, and then shares the result with all clients of that observer?
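To make the idea concrete, here is a minimal in-memory sketch of what such a strategy might do. It is not Meteor code and does not reflect Meteor internals: `SharedMergeBox`, `subscribe`, and `publish` are hypothetical names, and the diff is assumed to be a shallow per-field comparison.

```typescript
// Hypothetical sketch: one shared copy of the last-sent document per
// observer, no matter how many clients are subscribed.
type Fields = Record<string, unknown>;
type Send = (changed: Fields) => void;

class SharedMergeBox {
  private lastSent: Fields = {}; // the single shared copy
  private clients: Send[] = [];

  // A new client receives the full current document once (like "added").
  subscribe(send: Send): void {
    this.clients.push(send);
    send({ ...this.lastSent });
  }

  // Diff incoming fields against the shared copy (shallow comparison)
  // and fan out only the fields whose values actually changed.
  publish(incoming: Fields): Fields {
    const changed: Fields = {};
    for (const key of Object.keys(incoming)) {
      if (this.lastSent[key] !== incoming[key]) {
        changed[key] = incoming[key];
        this.lastSent[key] = incoming[key];
      }
    }
    if (Object.keys(changed).length > 0) {
      for (const send of this.clients) send(changed);
    }
    return changed;
  }
}

// Usage: the initial state is published before anyone subscribes.
const box = new SharedMergeBox();
box.publish({ status: "lobby", playersAlive: 100 });

const received: Fields[] = [];
box.subscribe((changed) => received.push(changed));

// A full re-write of the document with only one value different.
box.publish({ status: "lobby", playersAlive: 99 });
// received[0] is the full document; received[1] holds only playersAlive.
```

The point of the sketch is that memory grows with the number of documents rather than with documents times clients. It only works because every client sees identical data.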
It seems something like this would deliver the pub/sub magic of Meteor but greatly reduce the memory overhead at a large number of connections, albeit only in the right circumstances (which seem fairly common for Meteor apps).
Would just using NO_MERGE_NO_HISTORY still be way more optimal than SERVER_MERGE even with sending 70 fields to every client?
If it’s applicable, you can try splitting it into two collections: one with immutable fields (I guess you have some) which will use NO_MERGE_NO_HISTORY and the mutable ones that’ll stay on the default SERVER_MERGE. Alternatively, you could fetch the immutable fields using a method and only publish the ones that can change.
No one will be able to tell you more than “it depends”, so just measure it and see for yourself. There are a lot of moving parts here, including WebSocket compression (which would reduce the impact of the lack of diffs) and the publication multiplexer (which would result in 1+N copies of the published document instead of 2N, as well as just 1 database observer).
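As a rough back-of-the-envelope illustration of those copy counts, the sketch below plugs in the 10,000 clients from the original post. The 4 KiB document size is an assumption made up for the example, not a measurement, so treat the result as an order of magnitude at best.

```typescript
// Copy counts quoted above for N clients subscribed to the same cursor.
function copiesWithoutMultiplexer(clientCount: number): number {
  return 2 * clientCount;
}

function copiesWithMultiplexer(clientCount: number): number {
  return 1 + clientCount;
}

// Hypothetical document size, purely for illustration; measure your own.
const assumedDocBytes = 4 * 1024;
const clientCount = 10000;

const bytesWithout = copiesWithoutMultiplexer(clientCount) * assumedDocBytes;
const bytesWith = copiesWithMultiplexer(clientCount) * assumedDocBytes;

// Difference in MiB under the assumed document size.
const savedMiB = (bytesWithout - bytesWith) / (1024 * 1024);
```

Under these assumptions the multiplexer roughly halves the number of copies (10,001 instead of 20,000). The remaining N copies are the per-client ones that a non-merging strategy would avoid.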
Indeed, I just realized upon reviewing the docs that a publication strategy applies to the entire collection, not just a given publication. That changes things. Unless you did some kind of splitting like you mention, this would normally be problematic using anything besides SERVER_MERGE but in my use-case, I actually break my app into micro-service Meteor apps, so I could turn on a publication strategy safely for the entire game collection just on the player client’s Meteor app micro-service. If I had one big giant app (e.g. game admin, player client, etc.) I wouldn’t be able to use anything but SERVER_MERGE, unless I did some creative collection splitting.
Interestingly, I just did some testing of what DDP sends to the browser client with the various publication strategies and on NO_MERGE, NO_MERGE_NO_HISTORY, and NO_MERGE_MULTI, when editing a single field of a game document from one login, I am seeing only that single field come down to a second subscribed login from DDP in the websocket log in a changed event. From the docs (and Claude), I got the impression every change would send every field from the publication. Doesn’t seem like it in my tests, unless I’m doing something wrong.
Is the MERGE in all these publication strategies referring to the merge between multiple publications and shared fields changing? If so, then NO_MERGE_NO_HISTORY might be fine as it seems to be only sending down the updated fields.
Overall, there won’t be that much of a difference if you only have a single publication: both the collection and document views (the so-called “mergebox”) are mostly relevant when merging multiple publications (or whatever you send using the low-level API).
When you change a single field in the database, the oplog entry will contain just that one field. It will go through the watcher and result in a single-field changed message, that’s correct. But if, for any reason, a subsequent oplog entry contains a value the client already has (e.g., in a replace operation), SERVER_MERGE will not send anything, while the other strategies may.
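The difference can be modelled with a small sketch. This is a simplified illustration of the behaviour described above, not Meteor’s actual implementation: `sendWithMergeBox` and `sendWithoutMergeBox` are hypothetical names, and the example assumes the observer reports every field of a replaced document.

```typescript
type Doc = Record<string, unknown>;

// SERVER_MERGE-like: the server remembers what this client already has
// and drops any field whose value is unchanged.
function sendWithMergeBox(clientView: Doc, incoming: Doc): Doc {
  const out: Doc = {};
  for (const key of Object.keys(incoming)) {
    if (clientView[key] !== incoming[key]) {
      out[key] = incoming[key];
      clientView[key] = incoming[key];
    }
  }
  return out;
}

// NO_MERGE-like: no per-client view is kept, so whatever the observer
// reports is forwarded as-is.
function sendWithoutMergeBox(incoming: Doc): Doc {
  return { ...incoming };
}

// What the client already holds, and a replace that re-writes the same values.
const clientView: Doc = { phase: "day", playersAlive: 42 };
const replaceEntry: Doc = { phase: "day", playersAlive: 42 };

const merged = sendWithMergeBox(clientView, replaceEntry); // nothing to send
const passthrough = sendWithoutMergeBox(replaceEntry); // both fields forwarded
```

With an ordinary single-field update both functions would return the same one field. They only diverge when the incoming entry repeats values the client already has.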
Not storing a 70-field published document in memory for all 10,000 clients (in actuality it’s more like 2,000 per container)
DDP not sending a full 70-field document every time a single field changes.
It sounds like these are both true in NO_MERGE_NO_HISTORY. When would DDP actually send the full document besides on added? If changed always just sends the changed field(s) then this should work for me.
There’s a replaceOne MongoDB operation, for example (it’s common if you’re using Compass to edit documents). Certain aggregations can do that as well (e.g., $out; I’m not sure if that hasn’t changed into a drop + stream of inserts).
This is a really interesting proposal, @evolross. SHARED_MERGE would be a great addition to Meteor’s publication strategies. I can see it being applicable to dashboards, where many clients subscribe simultaneously and all receive the exact same data. Dashboards are a common use-case for any app.
@radekmie’s suggestion of splitting into two collections is clever and could work as a short-term workaround, but honestly it feels like a hacky architecture: useful to apply right now, but not something I’d recommend as a long-term practice. It adds unnecessary data modeling complexity just to work around a limitation in the publication layer, and that kind of workaround tends to become technical debt over time.
The current team workload is pretty full right now, but this is genuinely interesting work, and it comes at a good moment, since we are touching the core reactivity system in release 3.5.
If you’re up for it, we’d welcome a PR. This seems like something worth exploring further!
MERGE is one of the unique features of Meteor.js; however, in my opinion it’s very niche, as its main advantage is that it saves bandwidth at a significant memory and performance cost. SHARED_MERGE could be really interesting in these “shared by everyone state” scenarios like the dashboards or games you mentioned, where the mergebox could actually shine. It could be a good selling point against other frameworks.