What exactly does subscriptionHandle.stop() do on the server?

evolross · February 26, 2018, 11:06pm

I’m experimenting with large numbers of users (2,500+) all connecting nearly simultaneously and needing the same data/query via a publication (e.g. chatroom users getting a static chat history from the day before). I don’t want to use a Meteor Method, because 2,500+ clients shouldn’t each query Mongo for the same data.

So I want to leverage the fact that publications with an identical query share one observer, which caches the result for clients that subscribe later. I don’t even need reactivity, just a static data set (per my use case above). As I understand it, if I only implement the added callback in a publication, then I’m solely adding documents and not processing changed or removed. I was also thinking that on the client I could call subscriptionHandle.stop() immediately after fetching the data (perhaps in the callback of subscribe(...)), just to further reduce server overhead. I’m banking on the server having the observer cached for the identical query that every client runs (e.g. Chats.find(chatroomId, day)).

My question is: if every client immediately calls stop() after fetching the data (because the data is static, “continued reactivity” isn’t needed, and no new chats will be added), will the server keep a cached observer/query for this publication, and thus send the data from the cache without hitting Mongo?

I worry that if each client immediately calls stop() and the server doesn’t keep a publication’s observer cached for some length of time, then each client (or at least a lot of them) won’t find a cached observer on the server and will hit Mongo again. When the “last” client calls stop(), does the server tear down the cached query/observer?

So I’m wondering what subscriptionHandle.stop() actually does on the server in regards to caching.

robfallows · February 27, 2018, 10:02am

Actually, what you get is still a per-client cache (the “merge box”), even though the observer is reused.

You may find this issue worth reading:

github.com/meteor/meteor: “Optionally disable merge box in Meteor for publish functions” (opened by mitar 10:08PM, 10 Nov 2015 UTC; closed 07:10PM, 9 Jun 2017 UTC; labels: Type:Feature, Project:DDP, Severity:has-workaround)

[This segment](https://youtu.be/783BL__zIjY?t=1056) is a great example of why it would sometimes be beneficial for some publish functions to disable merge box. Currently, for publish functions there is a trade-off: do you want to minimize data sent over the wire (then you have to remember what each client has), or do you want to minimize data stored on the server (then you have to send the whole change every time and leave it to the client side to resolve the diff)? I think it would be great if this were officially supported in Meteor, and I think it can be done without many changes. Every `added`, `changed`, and `removed` would simply be sent as-is directly to the client or clients: no diffing, no computation, no memory storage. I think this would align better with the broader Meteor ecosystem than the current hack of using [meteor-streams](https://arunoda.github.io/meteor-streams/) (which sends two DDP messages, `added` and `removed`, to get merge box to forget the content). cc @rodrigok

In particular, @mitar’s control-mergebox package.
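To make the trade-off described in that issue concrete, here is a toy TypeScript model. It is not Meteor’s implementation, and `MergeBoxSession` and `PassThroughSession` are hypothetical names: the first remembers what one client has been sent and forwards only the fields that differ, the second stores nothing and forwards every call as-is.

```typescript
// Toy model of the merge-box trade-off; not Meteor's implementation.
type Fields = Record<string, unknown>;
type DdpMessage = { msg: "added" | "changed"; id: string; fields: Fields };

// Merge-box style: keeps a per-client copy of every document it has sent
// (server memory), so redundant updates cost no bandwidth.
class MergeBoxSession {
  readonly sent: DdpMessage[] = [];
  private readonly view = new Map<string, Fields>();

  added(id: string, fields: Fields): void {
    this.view.set(id, { ...fields });
    this.sent.push({ msg: "added", id, fields: { ...fields } });
  }

  changed(id: string, fields: Fields): void {
    const known = this.view.get(id);
    if (known === undefined) throw new Error(`unknown document ${id}`);
    // Diff against what this client already has.
    const diff: Fields = {};
    for (const [key, value] of Object.entries(fields)) {
      if (known[key] !== value) diff[key] = value;
    }
    if (Object.keys(diff).length === 0) return; // nothing new for this client
    Object.assign(known, diff);
    this.sent.push({ msg: "changed", id, fields: diff });
  }

  documentsHeldInMemory(): number {
    return this.view.size;
  }
}

// Pass-through style: remembers nothing and forwards every call as-is;
// the client is left to work out what actually changed.
class PassThroughSession {
  readonly sent: DdpMessage[] = [];

  added(id: string, fields: Fields): void {
    this.sent.push({ msg: "added", id, fields: { ...fields } });
  }

  changed(id: string, fields: Fields): void {
    this.sent.push({ msg: "changed", id, fields: { ...fields } });
  }

  documentsHeldInMemory(): number {
    return 0;
  }
}
```

Fed the same added/changed/changed sequence containing one redundant update, the merge-box session sends two messages and holds one document in memory, while the pass-through session sends three messages and holds none.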

TBH, if all you want is a static snapshot of the data, I’d look at using REST or GraphQL, which will be more performant than DDP (whether via methods or pub/sub). In a multi-server environment, I’d likely use a shared cache (Redis), so there’s only one in-memory copy of the data.
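A minimal cache-aside sketch of that suggestion, in TypeScript. `SnapshotCache`, the cache key, and the loader are hypothetical, and a `Map` stands in for Redis. Sharing the in-flight promise only de-duplicates concurrent requests within one process; a shared cache such as Redis is what would let other servers reuse the result once it has been stored.

```typescript
// Cache-aside with in-flight de-duplication. A Map stands in for a shared
// cache such as Redis; the loader is whatever actually queries the database.
class SnapshotCache<T> {
  private readonly entries = new Map<string, Promise<T>>();

  constructor(private readonly load: (key: string) => Promise<T>) {}

  get(key: string): Promise<T> {
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;

    // Cache the promise itself: requests arriving while the first load is
    // still running wait on it instead of starting another query.
    const pending = this.load(key).catch((err: unknown) => {
      this.entries.delete(key); // don't keep a failed load in the cache
      throw err;
    });
    this.entries.set(key, pending);
    return pending;
  }
}
```

If 2,500 requests for the same key arrive while the first load is still running, the loader runs once and every caller receives the same promise.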

robfallows · February 27, 2018, 10:17am

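I didn’t actually answer your question. When a client calls stop() on its subscription handle:

- All documents “belonging” to that client’s subscription on the server are removed (the merge box for that client has qualifying documents deleted).
- The server sends removed messages over DDP for those documents (unless another subscription on the same connection still publishes them), which tell the client to remove its copies, and then confirms the unsubscription with nosub.
- The client runs its subscription’s onStop callback.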
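The server-side half of that can be sketched as a toy model (a hypothetical `SessionView` class, loosely inspired by Meteor’s per-connection merge box, not its actual code). It assumes DDP’s message names: the client sends `unsub`, and the server answers with `removed` for each document no other subscription on that connection still publishes, followed by `nosub`.

```typescript
// Toy model of one connection's merge box handling a client's `unsub`.
// Loosely inspired by Meteor's per-connection bookkeeping; not its source code.
type ServerMessage =
  | { msg: "added"; collection: string; id: string }
  | { msg: "removed"; collection: string; id: string }
  | { msg: "nosub"; id: string };

interface DocEntry {
  collection: string;
  id: string;
  subs: Set<string>; // subscriptions on this connection that publish the doc
}

class SessionView {
  readonly outbox: ServerMessage[] = [];
  private readonly docs = new Map<string, DocEntry>();

  added(subId: string, collection: string, id: string): void {
    const key = JSON.stringify([collection, id]);
    let entry = this.docs.get(key);
    if (entry === undefined) {
      entry = { collection, id, subs: new Set<string>() };
      this.docs.set(key, entry);
      // In this model only the first subscription to publish a document sends it.
      this.outbox.push({ msg: "added", collection, id });
    }
    entry.subs.add(subId);
  }

  // Server side of a client's `unsub`: release this subscription's documents,
  // send `removed` only for documents no other subscription still holds,
  // then confirm with `nosub`.
  unsub(subId: string): void {
    for (const [key, entry] of Array.from(this.docs.entries())) {
      if (!entry.subs.delete(subId) || entry.subs.size > 0) continue;
      this.docs.delete(key);
      this.outbox.push({ msg: "removed", collection: entry.collection, id: entry.id });
    }
    this.outbox.push({ msg: "nosub", id: subId });
  }

  documentsHeld(): number {
    return this.docs.size;
  }
}
```

For example, if hypothetical subscriptions historySub and pinnedSub on the same connection both publish chat m2, unsubscribing historySub sends removed only for the documents it alone held; m2 stays on the client.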

evolross · February 27, 2018, 8:23pm

I think you get more than this. According to @glasser in a comment on this article, you get:

Caching (which is called “query de-duping”) is handled at a higher level in the Livequery driver, so it works the same for both oplog and poll-and-diff. If a single server process is asked to observe a Mongo query that it is already observing, it “de-dups” and only runs one underlying poll-and-diff or oplog-watching process.

I tested this using poll-and-diff and it’s true. Say Client-1 connects to a server and subscribes to a publication of five chat messages, and the observer polls every fifteen seconds. Seven seconds in, three new chat messages are added, and then two new clients (Client-2 and Client-3) connect. All three clients show only the original five chat messages until the one cached observer (and thus the data set on the server) updates by polling again. It’s obviously not querying Mongo for each new client that connects; if it were, the new clients would show the three new chat messages. So it’s more than a per-client cache.
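That experiment can be reproduced in a toy model, assuming a poll-and-diff observer that only re-runs its query on each polling interval. `ObserverRegistry` and `PollingObserver` are hypothetical names, not Meteor classes, and `JSON.stringify` is only a stand-in for a canonical de-dup key.

```typescript
// Toy model of query de-duplication with a poll-and-diff style observer.
// Hypothetical names; Meteor's real driver is considerably more involved.
type QueryDescription = { collection: string; selector: Record<string, unknown> };

class PollingObserver<T> {
  results: T[];

  constructor(private readonly runQuery: () => T[]) {
    this.results = runQuery(); // the first subscriber triggers the initial query
  }

  // Called on each polling interval; until then, every subscriber sees
  // the result set from the previous poll.
  poll(): void {
    this.results = this.runQuery();
  }
}

class ObserverRegistry<T> {
  databaseQueries = 0;
  private readonly observers = new Map<string, PollingObserver<T>>();

  constructor(private readonly database: (query: QueryDescription) => T[]) {}

  // Identical query descriptions map to one observer and one cached result set.
  subscribe(query: QueryDescription): T[] {
    const key = JSON.stringify(query);
    let observer = this.observers.get(key);
    if (observer === undefined) {
      observer = new PollingObserver<T>(() => {
        this.databaseQueries += 1;
        return this.database(query);
      });
      this.observers.set(key, observer);
    }
    return observer.results;
  }

  pollAll(): void {
    this.observers.forEach((observer) => observer.poll());
  }
}
```

Client-2 and Client-3 receive the five-message result from Client-1’s observer, and the three new messages only appear once the shared observer polls again.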

The issue you linked deals with turning off mergebox functionality, which is actually not what I want. I want to make use of the server caching that apparently comes along with the server mergebox. I want to use memory on the server rather than querying Mongo again for the exact same data.

Calling subscriptionHandle.stop() also runs the publication’s onStop callback on the server, which I’m assuming does what your first bullet describes:

All documents “belonging” to that client’s subscription on the server are removed (the merge box for that client has qualifying documents deleted).

I’m assuming this is true because, as an experiment, I first tried calling the publication’s this.stop() immediately after this.ready(), and it never sent any data.

My original question (and worry), and what I want to verify, is this: when the above quoted text occurs, does it also remove any cached observer/data? If so, calling subscriptionHandle.stop() immediately after fetching the data on the client would be bad, as it wouldn’t let all the other clients make use of the cached observer/data set… right?

I guess the question is this: per @glasser’s quote above, if a server is “already observing” a Mongo query, it “de-dups” it. So does calling subscriptionHandle.stop() cause the server to stop observing when no other clients are still observing the same query?

It seems like this functionality works like a reference count over client subscribe calls: as each client calls subscriptionHandle.stop(), its subscription is released, and when no subscriptions remain, the server observer is destroyed.
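That guess can be sketched as reference counting. In the toy model below (a hypothetical `ObserverPool`, not Meteor’s API; in Meteor itself this bookkeeping lives, as far as I know, in the mongo package’s ObserveMultiplexer), each subscribe takes a handle on the shared observer for a query, each stop() releases one, and the observer is torn down as soon as the last handle is released, with no grace period. Under that assumption, the subscribe-then-stop plan only shares work between subscriptions that overlap in time: if clients subscribe and stop one after another, every client starts a new observer, and therefore a new query.

```typescript
// Reference-counted shared observers: a toy model, not Meteor's implementation.
class ObserverPool {
  observersStarted = 0; // each start stands for a fresh query against Mongo
  private readonly live = new Map<string, { handles: number }>();

  // Returns a function that plays the role of subscriptionHandle.stop().
  subscribe(queryKey: string): () => void {
    let observer = this.live.get(queryKey);
    if (observer === undefined) {
      observer = { handles: 0 };
      this.live.set(queryKey, observer);
      this.observersStarted += 1;
    }
    observer.handles += 1;

    const shared = observer;
    let released = false;
    return () => {
      if (released) return; // a second stop() is a no-op
      released = true;
      shared.handles -= 1;
      if (shared.handles === 0) {
        this.live.delete(queryKey); // last handle released: tear down the observer
      }
    };
  }

  isObserving(queryKey: string): boolean {
    return this.live.has(queryKey);
  }
}
```

Two overlapping subscriptions start one observer between them; three clients that each subscribe and immediately stop, one after another, start three.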

evolross · February 28, 2018, 9:18pm

It looks like any “shared” data between clients subscribing to the same publication may be duplicated for each client on the server in memory because the server maintains a SessionCollectionView for each connection.

So 2,500 connections × (shared data) = a big waste.
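The multiplication can be made concrete with a rough estimate. The 200 KiB history size is hypothetical, and the model ignores per-document and per-field overhead; it assumes, per the post above, one copy of the result set in the shared observer plus one copy per connection in that connection’s merge box (SessionCollectionView).

```typescript
// Back-of-the-envelope memory model; ignores per-document and per-field overhead.
function estimatePublicationMemory(connections: number, resultSetBytes: number) {
  const sharedObserverCopy = resultSetBytes; // one copy, shared via query de-duping
  const mergeBoxCopies = connections * resultSetBytes; // one copy per connection's merge box
  return { sharedObserverCopy, mergeBoxCopies, total: sharedObserverCopy + mergeBoxCopies };
}

const exampleHistoryBytes = 200 * 1024; // hypothetical 200 KiB chat history
const estimate = estimatePublicationMemory(2500, exampleHistoryBytes);
const mergeBoxMiB = estimate.mergeBoxCopies / (1024 * 1024);
```

With those assumptions, 2,500 connections hold 512,000,000 bytes (about 488 MiB) in merge boxes, versus 204,800 bytes in the shared observer.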

https://medium.com/@omriklinger/i-think-it-actually-uses-500-times-the-amount-of-data-96702b5a5e53