Large subscription - JavaScript heap out of memory

I am using Meteor 2.3.2 and, working for the first time with some large collections, got a "JavaScript heap out of memory" error on the server side (in both development and a production test).

To identify the problem, I created a test case with only 1 collection (2 million docs) and 1 subscription. In the test I could see that:

  • Storing the 2 million docs without any subscription: no problem at all.
  • A subscription of up to 10k docs: the subscription became ready.
  • Larger subscriptions never became ready (after up to 10 minutes of waiting on the client side).
  • A subscription of 50k docs: the server crashed (as mentioned above) at the point where process.memoryUsage().heapUsed was around 4 GB.
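For reference, the heap figure above can be observed with plain Node APIs; a minimal sketch (the MB rounding is my own choice):

```javascript
// Read the V8 heap usage of the current process and report it in MB.
// In a Meteor server you could run this inside Meteor.setInterval to
// watch the heap grow while a subscription is being filled.
const heapUsedMB = Math.round(process.memoryUsage().heapUsed / (1024 * 1024));
console.log("heapUsed:", heapUsedMB, "MB");
```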

Did I do something wrong, or are the limits above realistic for MongoDB / DDP?
My test code is below:

Server side

const TEST_NUM_OF_DOCS = 2000000;
const TEST_NUM_OF_PARTITIONS = 200;

LC = new Mongo.Collection('lc');

if ( Meteor.isServer ) {
    // Clear the collection, then (re)create the index once the remove has finished.
    LC.remove({}, function () {
        LC.rawCollection().createIndex({
            id: 1,
            partition: 1,
            timestamp: 1,
            message: 1,
        }, { unique: true }).catch(function (err) {
            // createIndex returns a promise; a synchronous try/catch around
            // it would not catch an asynchronous failure.
            console.log("LC createIndex err:", err);
        });
    });

    // Insert the test documents: 10k docs per partition.
    for ( let i = 0; i < TEST_NUM_OF_DOCS; i++ ) {
        LC.insert({
            id: i,
            partition: i % TEST_NUM_OF_PARTITIONS,
            timestamp: Date.now(),
            message: "Test large collection",
        });
    }

    const publicationName = "lc10k";
    // publish_handlers is an internal API mapping publication names to handlers;
    // it is used here only to avoid publishing twice on hot reload.
    if ( !Meteor.server.publish_handlers.hasOwnProperty(publicationName) ) {
        console.log("Publishing", publicationName, "...");
        Meteor.publish(publicationName, function () {
            // One partition = TEST_NUM_OF_DOCS / TEST_NUM_OF_PARTITIONS = 10k docs.
            return LC.find({ partition: 1 });
        });
    } else {
        console.log("Already published!!!");
    }
}

Client side

console.log(new Date(), "start subscribing to LC (partially)");
Meteor.subscribe("lc10k", {
    onReady: function () {
        console.log(new Date(), "LC is ready, got", LC.find().count(), "documents");
    },
});

What I forgot to mention: it struck me that the memory usage of the subscription is far too large. The 2 million-doc collection mentioned above uses around 650 MB of disk, yet 4 GB of RAM is not enough for a 50k-doc subscription. I must have done something wrong, but I could not figure out what. Any hints?


Using pub/sub to load a massive number of records has never been a good idea.
Pub/sub is powerful, easy to use, and reactive, but it is not a silver bullet; there is always a trade-off. You should consider using a Meteor method to load the data, with some kind of pagination.
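A minimal sketch of the method-plus-pagination approach, against the `LC` collection from the original post (the method name `lc.getPage` and the page size are my own choices, not something from this thread):

```javascript
// Server: a method that returns one page of documents as plain objects.
// Unlike a publication, nothing is tracked per client, so the server
// holds no per-subscription copy of the documents.
const PAGE_SIZE = 1000; // hypothetical page size

Meteor.methods({
    'lc.getPage'(page) {
        check(page, Match.Integer);
        return LC.find(
            { partition: 1 },
            { sort: { id: 1 }, skip: page * PAGE_SIZE, limit: PAGE_SIZE }
        ).fetch();
    },
});

// Client: pull pages on demand instead of subscribing to everything.
Meteor.call('lc.getPage', 0, function (err, docs) {
    if (err) {
        console.log('lc.getPage failed:', err);
    } else {
        console.log('got', docs.length, 'documents');
    }
});
```

Note that skip-based pagination gets slow for deep pages on a large collection; a range query on an indexed field (e.g. `{ id: { $gt: lastSeenId } }`) scales better. You also lose reactivity, which is the trade-off mentioned above.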


@minhna Thanks for your advice.

Now I am aware of the memory requirements of pub/sub, and of the alternative.

But I still wonder why subscribing to a large number of documents requires that much memory. I would expect such a subscription to cause:

  • CPU consumption on the server side to monitor changes in the collection
  • some DDP traffic when changes need to be delivered to the client side

But why does it occupy so much memory on the server side?

Could somebody with in-depth knowledge of DDP give me an idea?


The server needs to know which documents each client has. Depending on whether you use the oplog and exactly how you publish the docs, there can be up to 3 full copies of each document in server memory for the first use of a subscription, then 1 or 2 more for each additional subscription call to the same publication with the same arguments.

Take a look at the documentation surrounding the mergebox; some of this behaviour is now tweakable, so that the server stores only the document IDs rather than the full documents.
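For example, newer Meteor releases (2.4+, so the 2.3.2 in this thread would need an upgrade) let you pick a publication strategy per publication; a sketch based on the documented `setPublicationStrategy` API, using the `lc10k` publication from the original post:

```javascript
import { DDPServer } from 'meteor/ddp-server';

// Available strategies: SERVER_MERGE (the default, full copies in the
// mergebox), NO_MERGE_NO_HISTORY, and NO_MERGE. NO_MERGE keeps only the
// _ids of the published documents on the server, not full copies, which
// cuts the per-subscription memory cost.
Meteor.server.setPublicationStrategy(
    'lc10k',
    DDPServer.publicationStrategies.NO_MERGE
);
```

NO_MERGE is only safe when a document is published by a single publication at a time, since the server can no longer merge overlapping publications for you.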
