Large subscription - JavaScript heap out of memory

I am using Meteor 2.3.2 and, working for the first time with some large collections, got a "JavaScript heap out of memory" error on the server side (in both development and a production test).

To identify the problem, I created a test case with only one collection (2 million docs) and one subscription. The test showed that:

  • Storing 2 million docs without a subscription: no problem at all
  • A subscription of up to 10k docs: the subscription became ready
  • Larger subscriptions never became ready (even after 10 minutes of waiting on the client side)
  • A subscription of 50k docs: the server crashed (as mentioned above) at the point where process.memoryUsage().heapUsed was around 4 GB

Did I do something wrong, or are the limits above realistic for MongoDB / DDP?
My test code is below:

Server side

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const TEST_NUM_OF_DOCS = 2000000;
const TEST_NUM_OF_PARTITIONS = 200;

LC = new Mongo.Collection('lc');

if ( Meteor.isServer ) {
    // Empty the collection, then (re)create the index once it is empty.
    LC.remove({}, function(){
        // rawCollection().createIndex returns a promise, so a plain
        // try/catch would miss rejections; handle them with .catch().
        LC.rawCollection().createIndex({
            id: 1,
            partition: 1,
            timestamp: 1,
            message: 1,
        }, { unique: true }).catch(function(err){
            console.log("LC createIndex err:", err);
        });
    });

    // Insert the test documents, spread evenly over the partitions
    // (2,000,000 / 200 = 10,000 docs per partition).
    for ( let i=0; i<TEST_NUM_OF_DOCS; i++ ) {
        const item = {
            id: i,
            partition: i%TEST_NUM_OF_PARTITIONS,
            timestamp: Date.now(),
            message: "Test large collection"
        };
        LC.insert(item);
    }

    // Publish a single partition (10k docs). The publish_handlers map is
    // internal API, used here only to avoid re-publishing on hot reload.
    const publicationName = "lc10k";
    if ( !Meteor.server.publish_handlers.hasOwnProperty(publicationName) ) {
        console.log("Publishing", publicationName, "...");
        Meteor.publish(publicationName, function(){
            return LC.find({partition:1});
        });
    } else {
        console.log("Already published!!!");
    }
}

Client side

import { Meteor } from 'meteor/meteor';

console.log(new Date(), "start subscribing to LC (partially)");
Meteor.subscribe("lc10k", {
    onReady: function(){
        console.log(new Date(), "LC is ready, got", LC.find().count(), "documents");
    }
});

What I forgot to mention: it struck me that the memory usage of the subscriptions is far too large. The 2-million-doc collection uses around 650 MB of disk, while 4 GB of RAM is not enough for a 50k subscription. I must have done something wrong, but I could not figure out what. Any hints?
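For what it is worth, some back-of-the-envelope numbers based on the figures above: 650 MB / 2,000,000 docs is roughly 325 bytes per document on disk, so a 50k-doc subscription covers about 50,000 x 325 B, or roughly 16 MB of raw data, yet the heap grew past 4 GB. That is well over two orders of magnitude of overhead.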


Using pub/sub to load a massive number of records has never been a good idea.
Pub/sub is powerful, easy to use, and reactive, but it is not a silver bullet; there is always a trade-off. You should consider using a Meteor method to load the data, with some kind of pagination, as sketched below.
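A minimal sketch of that approach, assuming a made-up method name lc.fetchPage and the LC collection from the test code above:

import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';

// Server: return one page of documents, with no reactivity attached.
// 'lc.fetchPage', 'page' and 'pageSize' are illustrative names only.
Meteor.methods({
    'lc.fetchPage'(page, pageSize) {
        check(page, Number);
        check(pageSize, Number);
        return LC.find(
            { partition: 1 },
            { sort: { id: 1 }, skip: page * pageSize, limit: pageSize }
        ).fetch();
    }
});

// Client: pull one page at a time instead of subscribing to everything.
Meteor.call('lc.fetchPage', 0, 500, function (err, docs) {
    if (err) return console.log('lc.fetchPage err:', err);
    console.log(new Date(), 'got', docs.length, 'documents');
});

Each call ships a single page over DDP and the server keeps no per-client state between calls, which is where the memory savings come from.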


@minhna Thanks for your advice.

Now I am aware of the memory requirements of pub/sub, and of the alternative.

But I still wonder why subscribing to a large number of documents requires that much memory. I believe such a subscription leads to:

  • CPU consumption on the server side to monitor changes in the collection
  • some DDP traffic when changes need to be delivered to the client side

But why does it also occupy so much memory on the server side?

Could somebody with in-depth knowledge of DDP give me an idea?


The server needs to know which documents each client has. Depending on the oplog you use and exactly how you publish the docs, there can be up to 3 full copies of each document for the first use of a subscription, then 1 or 2 for each additional subscription to the same publication with the same arguments.

Take a look at the documentation around the mergebox: some of this behaviour is now tweakable, so that the server stores only the IDs rather than the full documents.
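For what it is worth, here is roughly what that tweak looks like with the publication-strategy API. It landed in Meteor releases newer than the 2.3.2 used above, so treat the exact names as something to verify against the current docs:

import { Meteor } from 'meteor/meteor';
import { DDPServer } from 'meteor/ddp-server';

// With NO_MERGE the server remembers only the _ids it has sent for this
// publication instead of full document copies, i.e. the "only store the
// IDs" behaviour mentioned above.
Meteor.server.setPublicationStrategy(
    'lc10k',
    DDPServer.publicationStrategies.NO_MERGE
);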


Thanks for the info. At this point I believe the state my app is in (JS heap out of memory) is more or less expected; that is, publications of 10k docs are not for my 8-core 3.5 GHz 125 GB server. And digging into the oplog internals feels like being told to leave Meteor/Mongo :joy:

Then I thought again about my use case:
1 - deliver data from the server to the client
2 - render using a reactive data source on the client
3 - no change observation is needed, because the data on the server never changes and the client never updates it

In other words, my use case is not a typical publish/subscribe use case, because the data never changes at all. I only happened to use a Mongo collection because of (1) and (2).

Is there an option to tell the publish/subscribe framework not to observe changes, so that all the overhead that comes with it goes away?


As @rjdavid says, you can now specify different merge strategies for exactly this reason. You can also fetch results with methods rather than pub/sub directly. Finally, there is GitHub - adtribute/pub-sub-lite: Lighter (Method-based) pub/sub for Meteor. I have not used it myself, but as I understand it, it uses methods behind the scenes and merges the data into Minimongo, so it feels like a subscription.
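On the earlier question about skipping change observation entirely: the low-level publish API can also send a one-off snapshot without ever creating an observer. A sketch, using the same LC collection ('lc10kSnapshot' is a made-up name for illustration):

// One-shot publication: push the current matching documents, mark the
// subscription ready, and never start an observer.
Meteor.publish('lc10kSnapshot', function () {
    LC.find({ partition: 1 }).forEach((doc) => {
        const { _id, ...fields } = doc;
        this.added('lc', _id, fields); // send each doc to the client once
    });
    this.ready(); // fires onReady on the client
    // Returning no cursor means Meteor starts no observeChanges, so
    // later writes to LC are never pushed to this subscription.
});

Note that with the default merge strategy the mergebox still keeps a server-side copy of everything sent, so combining this with one of the no-merge strategies above is what actually frees the memory.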

Bear in mind that, depending on the size of the objects you are sending, a 10k-object websocket payload has its own problems (blocking the CPU during gzip being one of them).


I agree. Pub/sub for a large amount of data is not recommended because it can cause slowness; I would choose a method with pagination instead.
