How would you handle a publication with a lot of data, meaning hundreds of thousands of docs, each with several fields, being pushed to the front end? I would like to avoid pagination because I'm using a sort to show the most recent docs. On the web app I'm working on, we found a lot of memory leaks over time once the publication starts adding a lot of data, meaning that our code isn't properly optimized to handle these data bursts.
What would be a feasible way to address this type of issue?
Given that you're looking at hundreds of thousands of docs, the first question I have is: do you really need to push all that data to the client? It seems that you're displaying a very long list, and no user will be able to consume all of that. You'd run into cognitive limitations in addition to the technical ones. If you need to create the illusion for the user that they're viewing the whole database, then perhaps a virtual scroller with dynamic data loading would be an option. Only the data that's currently in the user's 'viewing range' would be published. Most of the sorting would then be done server side, inside the publication itself.
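To make the virtual scroller idea concrete: the window the user is looking at can be translated into a skip/limit pair for the publication. A minimal sketch, where `rowHeight`, `overscan`, the publication name and the `Cards` collection are my own illustrative assumptions, not something from this thread:

```javascript
// Translate the scroller position into a skip/limit window.
// rowHeight and overscan are illustrative assumptions.
function viewingRange(scrollTop, viewportHeight, rowHeight, overscan = 10) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const count = Math.ceil(viewportHeight / rowHeight) + 2 * overscan;
  return { skip: first, limit: count };
}

// Server side, the publication would then stay small and sorted, roughly:
// Meteor.publish('cards.window', function ({ skip, limit }) {
//   return Cards.find({}, { sort: { createdAt: -1 }, skip, limit });
// });
```

The client would re-subscribe (or update the subscription arguments) as the user scrolls, so the server only ever observes a small window of docs.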
If for some reason you do need to push all that data to the client, then do you absolutely need the out-of-the-box reactivity of publications, or could you use a method for this? At that scale, I think you'd run into problems with publications no matter what you do. An alternative would be to load the data with a method and fake reactivity manually with reasonably spaced-out polling (i.e. a method fetches new data from the server at a reasonable interval, and you manually reconcile the old and new data client side without relying on the minimongo magic).
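A sketch of what that manual client-side reconciliation could look like, assuming docs keyed by `_id` and sorted by a `createdAt` field (the method name and interval below are made up for illustration):

```javascript
// Merge a freshly fetched batch into the client-side copy without minimongo.
// Docs are keyed by _id; a newer version of a doc simply overwrites the old one.
function reconcile(oldDocs, newDocs) {
  const byId = new Map(oldDocs.map(doc => [doc._id, doc]));
  for (const doc of newDocs) byId.set(doc._id, doc); // insert or update
  // Keep the 'most recent first' sort the UI relies on.
  return [...byId.values()].sort((a, b) => b.createdAt - a.createdAt);
}

// Client side you would then poll at a reasonable interval, e.g.:
// setInterval(() => {
//   Meteor.call('cards.fetchRecent', (err, fresh) => {
//     if (!err) docs = reconcile(docs, fresh);
//   });
// }, 5000);
```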
Sorry, need to clarify a bit more. We are talking about a couple hundred cards per user (which adds up to the hundreds of thousands of docs that @mexin was writing about). Actually, it's mainly one collection (Cards), which displays important information in the form of cards to the user. On clicking these cards, specific actions are triggered on the backend.
We do have 3 different pubs for this collection, due to the fact that there are different types of cards. The first two are restricted to 100 and 200 cards respectively; the last one is unrestricted, as there is a maximum of maybe 5-6 different ones displayed there.
The problem is that when we communicate with one of our partner sites via API, we create a lot of cards, several per second. This goes on for about 3-4 minutes in total, until all the data is read and consumed and no more new cards are produced. So the problem only happens on the initial load.
It seems that the observer struggles because the number of cards is constantly changing during that window, while it handles the 3 queries (one per pub) and the conditions within them (including the limits).
A simple solution would be to throttle the reactivity of pub/sub and only push new Cards content every, let's say, 250 or 500 ms. But I have no idea how to do that.
The main problem, it seems, is that the GC has trouble cleaning up behind it, and big memory leaks (from 1 to over 8 MB) build up. This eventually forces a restart of our Galaxy backend server (we have separate servers for front- and backend).
Here are also two pictures that illustrate the memory leak problem.
The first one shows the GC managing its cleanup until 1.5 minutes into the data loading:
Right, now it makes sense to me. Great explanation.
Unfortunately, I cannot help you with throttling publications, as I have not needed to do this myself. Should such a need arise, I'd likely start by looking into managing the publication manually via this.added, this.ready, this.changed and this.removed, instead of just returning a cursor from it. Might be of help, perhaps, but not sure. The gist is here: https://guide.meteor.com/data-loading.html#custom-publication
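A rough sketch of what a throttled custom publication might look like: buffer the observer's change events and flush them to the client in batches on an interval. The `ChangeBuffer` helper below is plain JS; the commented part shows where I'd guess it plugs into Meteor.publish and observeChanges, an untested assumption on my part:

```javascript
// Buffer change events and flush them in batches every intervalMs.
class ChangeBuffer {
  constructor(flush, intervalMs = 500) {
    this.flush = flush;           // called with the pending batch
    this.intervalMs = intervalMs;
    this.pending = [];
    this.timer = null;
  }
  push(change) { this.pending.push(change); }
  flushNow() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.flush(batch);
  }
  start() { this.timer = setInterval(() => this.flushNow(), this.intervalMs); }
  stop() { clearInterval(this.timer); this.flushNow(); }
}

// Inside a custom publication it could be wired up roughly like this:
// Meteor.publish('cards.throttled', function () {
//   const buffer = new ChangeBuffer((batch) => {
//     for (const { kind, id, fields } of batch) {
//       if (kind === 'added') this.added('cards', id, fields);
//       else if (kind === 'changed') this.changed('cards', id, fields);
//       else this.removed('cards', id);
//     }
//   }, 500);
//   buffer.start();
//   const handle = Cards.find().observeChanges({
//     added: (id, fields) => buffer.push({ kind: 'added', id, fields }),
//     changed: (id, fields) => buffer.push({ kind: 'changed', id, fields }),
//     removed: (id) => buffer.push({ kind: 'removed', id }),
//   });
//   this.ready();
//   this.onStop(() => { handle.stop(); buffer.stop(); });
// });
```

Note this only spaces out the DDP messages to the client; the server-side observer itself still sees every change, so it may not fix the GC pressure on its own.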
Alternatively, if throttling doesn't pan out or is too complex, why not solve the problem with… CSS! The whole issue here seems to be the 3-4 minutes during which data is populated from outside sources. During this time there are a lot of changes to the data, so whatever the user sees on the page in those minutes will essentially be branded as 'incomplete' or 'subject to change' anyway. So perhaps, instead of displaying incomplete and jumping data, you could just show a loading icon and not fire up a subscription immediately (or subscribe to only a small subset of the data to keep the user's eyes busy while loading most of the data in the background). Once the loading is done, trigger the subscription. I'd assume that if there's no active subscription (or only a minimal one) during the loading process, the GC issues should also disappear, regardless of the 3rd-party sources hammering the DB.
In UX land, 3-4 minutes is an eternity, granted. But I'd say good visuals go a long way. For example, while the loading is in progress and the loading icon spinning, perhaps fire off a method every 10 seconds to the server to ask for the number of new cards received so far. Showing a dynamically changing text à la '240 cards received' under the loading icon should help relieve the waiting anxiety. Of course, this assumes a business app where the user will then inspect the loaded cards for a good hour or two after those couple of minutes. In social media land there is no way you'd survive a 3-4 minute loading time.
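The 'poll for a count while loading' idea could be sketched like this. The pure part below decides when the import burst appears to be over (the count unchanged for a few polls in a row); the commented part is my guess at the Meteor wiring, with `cards.count`, `cards.all` and `updateSpinnerText` all hypothetical names:

```javascript
// Returns an updater: feed it the latest card count on each poll;
// it returns true once the count has been stable for `stableAfter`
// consecutive polls, i.e. the 3-4 minute import burst seems finished.
function makeLoadTracker(stableAfter = 2) {
  let last = -1;
  let stable = 0;
  return function update(count) {
    stable = count === last ? stable + 1 : 0;
    last = count;
    return stable >= stableAfter;
  };
}

// Client side, roughly:
// const done = makeLoadTracker();
// const poll = setInterval(() => {
//   Meteor.call('cards.count', (err, count) => {
//     if (err) return;
//     updateSpinnerText(`${count} cards received`); // hypothetical UI helper
//     if (done(count)) {
//       clearInterval(poll);
//       Meteor.subscribe('cards.all'); // fire the real subscription only now
//     }
//   });
// }, 10000);
```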