Challenge of Large Collections | Meteor DDP | Pub Sub

crusifix · October 18, 2017, 6:09pm

USE CASE

We are currently in a situation where Q1) we need to serve 50,000+ documents (750,000 lines of pretty-printed JSON) reactively to the client, Q2) store them locally for offline use on both Cordova devices and in the browser, because the customer must be able to use the application and its data offline, and Q3) run and store methods in offline mode, then invoke the methods and subscriptions when the client reconnects to the server, even if the application or device was restarted while offline.

QUESTIONS

Q1

a) Is it even possible to serve this many documents via Meteor DDP and Pub/Sub?
b) What is the largest number of objects (or amount of data) you have been able to serve via Meteor Pub/Sub?
c) How much can Minimongo handle on the client side?
d) Are all subscriptions and their collection data stored on the server within the client session? Does that mean that if a client subscribes to 5 MB of JSON, that data is also stored within the client session on the server?
e) How many server resources are required to publish 1 MB of JSON? Has anybody tested this?

We have tried serving 20,000 documents (to give an idea, 300,000 lines of pretty-printed JSON, or around 5.5 MB) via Meteor Pub/Sub, and it takes 2 minutes for the subscription to finish. After that it works very well, but unsubscribing also takes 2 minutes (maybe due to how Pub/Sub works). All of this is on Galaxy against a MongoDB Atlas collection. We have also tried fetching the same data via Meteor.call, and it takes 2 seconds, but using Meteor.call would require some extra logic on the client side to maintain reactivity.

Q2

a) What is the best way to store collections offline? Are we still talking about GroundDB? What version is recommended?
b) What is the largest number of objects (or amount of data) you have been able to store in GroundDB?
c) What is your way to store collections in the device and browser?

Q3

a) It looks like GroundDB has deprecated resuming methods in version 2 (which is required for IndexedDB support), so it seems we need to build a “resume methods” package ourselves. Or is there some package that we are not aware of?
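To make the Q3 requirement concrete, here is a minimal sketch of what a “resume methods” layer might look like. It is not GroundDB's API: `OfflineMethodQueue`, `KeyValueStore`, and the `send` callback are hypothetical names, the store is assumed to behave like `localStorage` (`getItem`/`setItem`), and `send` stands in for whatever actually invokes the method on the server.

```typescript
// Hypothetical persistent store with a localStorage-like shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface QueuedCall { id: number; name: string; args: unknown[] }

class OfflineMethodQueue {
  private store: KeyValueStore;
  private key: string;

  constructor(store: KeyValueStore, key = "pendingMethods") {
    this.store = store;
    this.key = key;
  }

  private read(): QueuedCall[] {
    return JSON.parse(this.store.getItem(this.key) ?? "[]");
  }

  private write(calls: QueuedCall[]): void {
    this.store.setItem(this.key, JSON.stringify(calls));
  }

  // Persist the call immediately so it survives an app or device restart.
  enqueue(name: string, args: unknown[]): void {
    const calls = this.read();
    const id = calls.length ? calls[calls.length - 1].id + 1 : 1;
    this.write([...calls, { id, name, args }]);
  }

  pending(): QueuedCall[] {
    return this.read();
  }

  // Replay calls in order after reconnecting. A call is removed from the
  // store only after `send` resolves, so a crash mid-flush replays it later.
  async flush(send: (name: string, args: unknown[]) => Promise<unknown>): Promise<number> {
    let sent = 0;
    for (const call of this.read()) {
      await send(call.name, call.args);
      this.write(this.read().filter(c => c.id !== call.id));
      sent++;
    }
    return sent;
  }
}
```

Because a call is deleted only after it succeeds, this gives at-least-once delivery: a method may run twice if the app dies between sending and deleting, so the server-side methods would need to be idempotent.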

msavin · October 18, 2017, 7:58pm

I’m impressed that Meteor handled such a huge publication without problems.

I think you should be looking into using Meteor methods to retrieve data in batches, and then storing them locally. You could load the necessary data upfront, and then retrieve the rest in the background.

As you said, it’s extra work on the client, but it’ll give you the most control. I do not think pub/sub was intended to be used for such large collections, and if you ask anyone who has scaled Meteor, the trick is to either have small subscriptions or subscribe by document _id.

The good news is, minimongo is pretty smart when it comes to updating documents. You can create a local collection and upsert data into it without worrying about duplication, etc.
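Roughly, the batch-and-upsert idea could look like the sketch below. It is only an illustration: `fetchBatch` is a hypothetical stand-in for a paginated Meteor method call, and a plain `Map` keyed by `_id` stands in for the local collection, so none of this is Meteor's actual API.

```typescript
interface Doc { _id: string; [key: string]: unknown }

// Hypothetical: wraps a method call that returns one page of documents.
type FetchBatch = (skip: number, limit: number) => Promise<Doc[]>;

// Upsert by _id: writing the same document twice never duplicates it.
function upsertAll(store: Map<string, Doc>, docs: Doc[]): void {
  for (const doc of docs) {
    store.set(doc._id, doc);
  }
}

// Pull the data set in fixed-size pages until a short page signals the end.
// Returns the number of documents received.
async function loadInBatches(
  fetchBatch: FetchBatch,
  store: Map<string, Doc>,
  batchSize = 1000,
): Promise<number> {
  let skip = 0;
  for (;;) {
    const docs = await fetchBatch(skip, batchSize);
    upsertAll(store, docs);
    skip += docs.length;
    if (docs.length < batchSize) return skip;
  }
}
```

The first page could be awaited before rendering, with the remaining pages loading in the background, which matches the "necessary data upfront, rest later" suggestion.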

captainn · October 18, 2017, 8:15pm

You will probably need to do some optimization to get that to work. A couple of notes to consider:

A reactive publication/subscription holds a copy of every record in memory on the server, for each connected client, leading to concerns of scalability. A method does not do this.
A publication/subscription delivers each record of a collection individually, which is probably why it takes so much longer to transfer so many documents compared with a method.
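To illustrate the second point: on the wire, DDP sends one `added` message per published document followed by a `ready` message, while a method returns its whole payload in a single `result` message. The sketch below only builds those message shapes for comparison; the function names are made up for this example, and it ignores the server's own per-document bookkeeping.

```typescript
interface Doc { _id: string; [key: string]: unknown }

// Publication: one DDP `added` message per document, then a single `ready`.
function publicationMessages(collection: string, subId: string, docs: Doc[]): string[] {
  const added = docs.map(({ _id, ...fields }) =>
    JSON.stringify({ msg: "added", collection, id: _id, fields }));
  return [...added, JSON.stringify({ msg: "ready", subs: [subId] })];
}

// Method: the whole array travels inside one `result` message.
function methodMessages(methodId: string, docs: Doc[]): string[] {
  return [JSON.stringify({ msg: "result", id: methodId, result: docs })];
}
```

For 20,000 documents that is 20,001 messages for the subscription against one for the method, each of which the client has to parse and apply individually.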

I’d probably avoid using the built-in Minimongo reactivity, or use it creatively. For example, you could grab all your documents (how big is each document?) over a method, and have a sibling collection which contains only an id for reference and a timestamp for the last update, and subscribe to that. This would make the collection smaller in overall size. Another thing you could do is batch them - maybe in groups of 100 or 1000 - so each record contains, say, 100 references to the other documents. This would reduce the number of individual documents that must be sent to the client. When one of those references changes because its timestamp changes, the subscription would update that one batch document, and on the client side you could then refetch the document that changed yourself. It’s a bit of work, but it’s not too bad.
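A rough sketch of that index idea, with everything here being an assumed shape rather than anything Meteor provides: `IndexEntry` is the id plus last-update timestamp, `IndexBatch` is one published document holding a group of entries, and `staleIds` is what the client would run when a batch document changes.

```typescript
interface IndexEntry { id: string; updatedAt: number }
interface IndexBatch { _id: string; entries: IndexEntry[] }

// Server side: pack per-document index entries into batches of `size`.
function buildIndexBatches(entries: IndexEntry[], size = 100): IndexBatch[] {
  const batches: IndexBatch[] = [];
  for (let i = 0; i < entries.length; i += size) {
    batches.push({ _id: `batch-${i / size}`, entries: entries.slice(i, i + size) });
  }
  return batches;
}

// Client side: ids that are missing locally or older than the index says.
// `localTimestamps` maps a document id to the updatedAt of the cached copy.
function staleIds(batch: IndexBatch, localTimestamps: Map<string, number>): string[] {
  return batch.entries
    .filter(e => (localTimestamps.get(e.id) ?? -Infinity) < e.updatedAt)
    .map(e => e.id);
}
```

The ids returned by `staleIds` would then be passed to a method that returns just those documents, so only the small index collection is kept reactive.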

A solid offline publishing tool would be great. The problem is in reconciliation - how to make sure that multiple clients doing an edit can be merged in an appropriate way. I’ve even thought of using a git backend for certain types of documents. I haven’t implemented anything though.

vigorwebsolutions · October 18, 2017, 8:28pm

Regarding the initial pub/sub, have you given any thought to (or tried) redis oplog?

captainn · October 18, 2017, 9:30pm

Is there any better writeup of what that does than the readme?

vigorwebsolutions · October 18, 2017, 9:58pm

There is a short write-up, or you can comb through this thread or, notably, this post.

crusifix · October 19, 2017, 5:47am

REDIS OPLOG - SHORT RECAP

According to the cultofcoders:redis-oplog documentation:

However, changes that happened while Redis was down will not be visible. In future we will treat this scenario.

REDIS Q1) This looks like a problem, but maybe the whole data set can be fetched again on reconnect. REDIS Q2) I also wonder how this works if I have an admin panel from which I connect to several applications via DDP and want to use remote publications and methods.

According to diaconutheodor @ Meteor Scaling - Redis Oplog:

RedisOplog will eventually die, being replaced by ChangeNotifications

because MongoDB 3.6 will have a new change notification API called a “change stream” (see Use MongoDB change notifications instead of oplog with MongoDB 3.6). But before change streams can be considered, 32-bit support must be dropped (see Dropping Meteor 32-bit support).

Also from diaconutheodor @ Meteor Scaling - Redis Oplog:

Why RedisOplog and ChangeStreams are two different beasts ?
With RedisOplog, changes are sent out to Redis. With ChangeStreams, there’s no need to send changes anywhere. This is why the two, can’t be properly merged together.

RedisOplog Advantages

Redis is a beast in what it does, it’s very performant, with ChangeStreams, the CPU of the database will increase by a lot. But the DB can scale, so this is not a decisive factor.

Synthetic Mutations (emulate changes that aren’t saved in the database)

“Silent” Mutations (that do not trigger reactivity)

ChangeStreams Advantages

Reactivity at the database level means one less point of failure

Solves the problem of FLS queries

Easy to implement, given the changes processor is handled at Mongo level. Less prone to errors

Faster reactivity since you don’t need an additional system to talk changes.

Conclusion
RedisOplog is a temporary solution until ChangeStreams comes to the mongodb node driver. The story behind it was noble, I’ve invested a lot of time in it, we managed to solve an important issue of Meteor, but the sad(or happy) reality is that once Notifications API comes to life, that is it, that is the way to go. It’s the same thing they did with Blaze, it is a beautiful engine, it solves elegantly lots of problems, but React won, I am agnostic.

thoomasbro · March 2, 2018, 3:37pm

Very interesting Topic. Any update on how you solved your problem ?

maxtws · November 4, 2018, 6:06pm

@crusifix, did you find any solutions? How did you solve these problems?

doctorpangloss · November 7, 2018, 12:25am

Based on that requirement, they didn’t use Meteor. Nonetheless, if they wanted to achieve everything else without the disconnected execution requirement, they probably just stopped sending so much data, considering a human being can barely process 100 rows of tabular data at a time.

They were almost certainly using this to visualize a time series or map data in some kind of dashboard. Aggregation already reduces the data massively, and they probably just let the client update its “query.”