Subscribing to many documents vs. large documents: performance issue?

I have run into some trouble with my Meteor application.

Previously I had a few documents containing a lot of data. To add some flexibility, I changed the data structure so that much of the data from those large documents now lives in a separate collection containing many documents.

However, after this change my subscription can't even finish loading! The server's CPU goes up to 100%, then the DDP connection resets and the subscription starts over again.

To verify that this was indeed a problem, I created a test app with two collections, LargeDocs and ManyDocs. The total size of the two collections should be approximately the same. In ManyDocs I create 40 000 documents with some random data, and in LargeDocs I create only 40 documents, each 1000 times the size of a ManyDocs document.
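A minimal sketch of how such test data could be generated, in plain JavaScript so it runs outside Meteor. The helper names, the `data` field, and the 250-character payload are assumptions for illustration, not the original test app; inserting the documents into the two collections is left out.

```javascript
// Hypothetical helper: random alphanumeric string of the given length.
function randomString(length) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return out;
}

// Build `count` documents, each carrying `payloadChars` characters of random data.
function buildDocs(count, payloadChars) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    docs.push({ _id: String(i), data: randomString(payloadChars) });
  }
  return docs;
}

// Total payload size in characters, ignoring ids and field names.
function totalPayloadChars(docs) {
  return docs.reduce((sum, doc) => sum + doc.data.length, 0);
}

// Assumed payload size per small document; the large documents are 1000x bigger.
const SMALL_PAYLOAD = 250;
const manyDocs = buildDocs(40000, SMALL_PAYLOAD);
const largeDocs = buildDocs(40, SMALL_PAYLOAD * 1000);

console.log("ManyDocs:", manyDocs.length, "docs,", totalPayloadChars(manyDocs), "chars");
console.log("LargeDocs:", largeDocs.length, "docs,", totalPayloadChars(largeDocs), "chars");
```

Both data sets carry the same number of payload characters, so any difference in subscription time comes from the number of documents rather than the amount of data.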

I then compare the subscription time for the two versions. I have also tested with different versions of Meteor, and these are the results:

Total size large docs  =  10197857 chars
Total size many docs =  10240335 chars

**1.6.1 - LargeDocs [ms]**

**1.6.1 - ManyDocs  [ms]**

**1.8.1 - LargeDocs [ms]**
2737 ms

**1.8.1 - ManyDocs [ms]**
Fails to subscribe. The DDP connection resets and the subscription is restarted

So for Meteor 1.6.1 the total subscription time is 4 times longer after moving the data into another collection with many documents.

I know there is a performance issue with Meteor 1.8.1, so that is not something I want to address in this question. I just want to understand why there is a difference between subscribing to many documents and subscribing to large documents when the total amount of data is the same. Any ideas?


There are a couple of potential options here.

The most likely culprit is that the server has to track every document that every session has, and which fields of the document it has. This takes significant resources.

The second option is that 40000 small documents can be substantially larger in total than 40 documents that are each 1000x larger, depending on what is being stored (e.g., if each small document holds just an id, the per-document overhead dominates).

It’s a fairly unusual pattern to need to send 40000 documents to a single client. If you need the flexibility of tracking the individual documents on the server, you could try aggregating them on the server and sending fewer, larger documents to your client.
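As a rough illustration of that aggregation idea, here is a sketch in plain JavaScript that packs many small documents into a handful of larger ones. The function name, the `chunk-N` ids, and the `items` field are hypothetical, and how the packed documents are then published to the client is not shown.

```javascript
// Pack small documents into larger "chunk" documents of at most `chunkSize` items.
// Names and document shape are assumptions for illustration only.
function packIntoChunks(docs, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < docs.length; i += chunkSize) {
    chunks.push({
      _id: `chunk-${i / chunkSize}`,
      items: docs.slice(i, i + chunkSize),
    });
  }
  return chunks;
}

// Example: 40000 small documents become 40 documents of 1000 items each.
const smallDocs = Array.from({ length: 40000 }, (_, i) => ({ _id: String(i), value: i }));
const packed = packIntoChunks(smallDocs, 1000);
console.log(packed.length, packed[0].items.length);
```

The trade-off is that a change to one small document now means resending the whole chunk it belongs to, so this suits data that is read far more often than it is modified.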

@znewsham: No, in this comparison the total size of the two collections, ManyDocs and LargeDocs, is exactly the same. I have checked the total amount of data sent to the client to verify that both cases send exactly the same amount. So that is not the reason for this.

Why does it take significantly more resources to track 40000 documents? With oplog tailing, Meteor should only have to do work when a change is detected on a document, and in this test I don't make any changes to the collections. And even if it did, the update should be O(1), I presume.

It’s not about the changes; it’s about the work the server needs to do to know whether it needs to care about the changes. Inserting 40000 entries into a map instead of 40 takes significantly longer, not to mention the work done to check whether the client already has each document.
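To make the bookkeeping concrete, here is a deliberately simplified model of per-session tracking in plain JavaScript. It is not Meteor's actual implementation; the class and method names are hypothetical, and it only illustrates that each document costs a map entry plus a per-field comparison.

```javascript
// Simplified model of what a server might remember for one client session:
// for every document id, the field values the client currently has.
class SessionViewSketch {
  constructor() {
    this.documents = new Map(); // docId -> Map(fieldName -> value)
  }

  // Record that a publication wants the client to have `fields` for `id`.
  // Returns only the fields the client does not already have.
  added(id, fields) {
    let known = this.documents.get(id);
    if (!known) {
      known = new Map();
      this.documents.set(id, known);
    }
    const toSend = {};
    for (const [key, value] of Object.entries(fields)) {
      if (!known.has(key) || known.get(key) !== value) {
        known.set(key, value);
        toSend[key] = value;
      }
    }
    return toSend;
  }
}

// 40000 small documents mean 40000 map entries and 40000 lookups,
// even though no document is ever modified afterwards.
const view = new SessionViewSketch();
for (let i = 0; i < 40000; i++) {
  view.added(String(i), { data: "x" });
}
console.log(view.documents.size);
```

With 40 large documents the same model holds only 40 entries, which is the asymmetry described above.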

Also, regarding oplog tailing: the server has to care a little about every change, regardless of whether a client has the document being modified, because for every change it only knows whether it cares by checking whether one of its clients has the document. This takes a finite amount of time per change. There is also a cost to deserializing the change notifications to begin with. This is why redis-oplog is a thing. I don’t think it’s relevant in this case, because you aren’t modifying the documents, but it’s worth noting.

@znewsham I think I have identified the bottleneck here.

From what I can see in ddp-server/livedata_server.js, it seems that the server sends a DDP message for every single document! So it is in fact network traffic that is the problem. If it were possible to batch all messages into one big message, the subscription would go much faster.
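The difference can be sketched in plain JavaScript. The per-document `added` message shape follows the DDP specification; the `addedBatch` message is purely hypothetical and not part of DDP, and plain `JSON.stringify` stands in for the serialization Meteor actually uses.

```javascript
// One DDP "added" message per document, as described above.
function addedMessages(collection, docs) {
  return docs.map(({ _id, ...fields }) =>
    JSON.stringify({ msg: "added", collection, id: _id, fields })
  );
}

// Hypothetical batched alternative: NOT a real DDP message type.
function batchedMessage(collection, docs) {
  return JSON.stringify({ msg: "addedBatch", collection, docs });
}

const sampleDocs = Array.from({ length: 40000 }, (_, i) => ({ _id: String(i), data: "x" }));

const perDoc = addedMessages("ManyDocs", sampleDocs);
const batched = batchedMessage("ManyDocs", sampleDocs);

console.log("messages sent per document:", perDoc.length);
console.log("messages sent when batched:", 1);
console.log("total chars per document:", perDoc.reduce((n, m) => n + m.length, 0));
console.log("total chars when batched:", batched.length);
```

Besides the message count, every per-document message repeats the `msg` and `collection` envelope, so the unbatched stream is also somewhat larger in total.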

Does anyone know if that is possible?

There’s been some work on that here:

It has been sitting there a while, so hopefully the new team from Tiny will take a look and merge it (when they start).