Subscribing to many documents vs. large documents: performance issue?

dolle39 · October 13, 2019, 11:03am

I have run into some trouble with my Meteor application.

Previously I had a few documents containing a lot of data. To add some flexibility, I changed the data structure so that much of the data from those large documents now lives in a separate collection containing many documents.

However, after making this change my subscription can't even finish loading! The server's CPU goes up to 100%, then the DDP connection resets and the subscription starts over again.

To verify that this was indeed a problem, I created a test app with two collections, one called LargeDocs and one called ManyDocs. The total size of the two collections should be approximately the same. In ManyDocs I create 40 000 documents with some random data, and in LargeDocs I create only 40 documents, each 1000 times larger than a ManyDocs document.
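For anyone who wants to reproduce the setup, here is a minimal sketch of how the two data sets could be generated and measured in plain JavaScript. It is not the original test app: the helper names (`randomString`, `makeDocs`, `totalChars`) and the 250-character payload are assumptions chosen so that the totals land near 10 million characters.

```javascript
// Hypothetical helpers for illustration; not taken from the original test app.
function randomString(length) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return out;
}

// Build `count` documents, each carrying `payloadChars` characters of random data.
function makeDocs(count, payloadChars) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    docs.push({ _id: String(i), data: randomString(payloadChars) });
  }
  return docs;
}

// Total size, measured as the summed length of each document's JSON serialization.
function totalChars(docs) {
  return docs.reduce((sum, doc) => sum + JSON.stringify(doc).length, 0);
}

const SMALL_PAYLOAD = 250; // assumed payload size per small document
const manyDocs = makeDocs(40000, SMALL_PAYLOAD);
const largeDocs = makeDocs(40, SMALL_PAYLOAD * 1000);

console.log("many docs :", totalChars(manyDocs), "chars");
console.log("large docs:", totalChars(largeDocs), "chars");
```

In a Meteor app the generated arrays would then be inserted into the two collections. Note that with identical payload totals, the many-documents variant serializes slightly larger because every document repeats its `_id` and field names.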

I then compare the subscription time for the two versions. I have also tested with different versions of Meteor, and these are the results:

Total size large docs = 10197857 chars
Total size many docs = 10240335 chars

**1.6.1 - LargeDocs [ms]**
1905 
1589
1544
1754


**1.6.1 - ManyDocs  [ms]**
7565 
7472
7504
7104


**1.8.1 - LargeDocs [ms]**
2737
2802
2812
3030


**1.8.1 - ManyDocs [ms]**
Fails to subscribe. The DDP connection resets and the subscription is restarted

So for Meteor 1.6.1 the total subscription time is about 4 times longer when the data is moved into another collection with many documents.

I know there is a performance issue with Meteor 1.8.1, so that is not something I want to address in this question. I just want to understand why there is a difference between subscribing to many documents and subscribing to large documents when the total amount of data is the same. Any ideas?

znewsham · October 13, 2019, 1:51pm

There are a couple of potential options here.

The most likely culprit is that the server has to track every document that every session has, and which fields of the document it has. This takes significant resources.

The second option is that 40000 small documents can add up to substantially more data than 40 documents that are each 1000x larger, depending on what is being stored (e.g., if each small document holds little more than an id).

It’s a fairly unusual pattern to need to send 40000 documents to a single client. If you need the flexibility of tracking the individual documents on the server, you could try aggregating them on the server and sending fewer, larger documents to your client.
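A rough sketch of that aggregation idea, in plain JavaScript so it runs outside Meteor. The function name `aggregateIntoBuckets`, the `bucket-N` ids, and the `items` field are made up for illustration; how the buckets would actually be published is left open.

```javascript
// Hypothetical helper: group many small documents into a fixed number of
// larger "bucket" documents, so the client receives far fewer documents.
function aggregateIntoBuckets(docs, bucketCount) {
  const buckets = [];
  for (let b = 0; b < bucketCount; b++) {
    buckets.push({ _id: `bucket-${b}`, items: [] });
  }
  // Round-robin assignment keeps the buckets evenly sized.
  docs.forEach((doc, index) => {
    buckets[index % bucketCount].items.push(doc);
  });
  return buckets;
}

// Usage: 40000 small documents become 40 documents of 1000 items each.
const smallDocs = Array.from({ length: 40000 }, (_, i) => ({ _id: String(i), value: i }));
const buckets = aggregateIntoBuckets(smallDocs, 40);
console.log(buckets.length, "buckets of", buckets[0].items.length, "items");
```

One trade-off to keep in mind: as far as I know, DDP sends changes at the granularity of top-level fields, so modifying a single item would resend that bucket's whole `items` array.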

dolle39 · October 13, 2019, 7:06pm

@znewsham: No, in this comparison the total size of the two collections ManyDocs and LargeDocs is exactly the same. I have checked the total amount of data sent to the client to verify that both cases transfer exactly the same amount, so that is not the reason.

Why does it take significantly more resources to track 40000 documents? With oplog tailing, Meteor should only have to do work when a change is detected on a document, and for this test I don't make any changes to the collections. And even if it did, the update should be O(1), I presume.

znewsham · October 13, 2019, 8:10pm

It’s not about the changes; it’s about the work the server needs to do to know whether it needs to care about the changes. Inserting 40000 entries into a map instead of 40 takes significantly longer, not to mention the work done to check whether the client already has each document.
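To make the bookkeeping concrete, here is a deliberately simplified model of per-session tracking. It is not Meteor's actual implementation: the `SessionView` class and its methods are hypothetical, and the real server tracks more per field than this does.

```javascript
// Simplified, hypothetical model of what a server might remember per session:
// for every document id, which fields the client currently has.
class SessionView {
  constructor() {
    this.documents = new Map(); // document id -> Map(field name -> value)
  }

  // Record a document that is about to be sent; returns true if it is new
  // to this client, false if the client already had it.
  added(id, fields) {
    const isNew = !this.documents.has(id);
    if (isNew) this.documents.set(id, new Map());
    const known = this.documents.get(id);
    for (const [key, value] of Object.entries(fields)) known.set(key, value);
    return isNew;
  }

  // Count how many entries the server holds for this session.
  entryCount() {
    let fields = 0;
    for (const known of this.documents.values()) fields += known.size;
    return { documents: this.documents.size, fields };
  }
}

// Track `docCount` single-field documents for one session.
function track(docCount) {
  const view = new SessionView();
  for (let i = 0; i < docCount; i++) view.added(String(i), { data: "x" });
  return view.entryCount();
}

console.log(track(40));
console.log(track(40000));
```

The payload size does not appear anywhere in this bookkeeping; the number of lookups, inserts, and per-document maps grows with the document count alone.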

Also, regarding oplog tailing, the server has to care a little about every change, regardless of whether a client has the document being modified, because for every change it only knows whether it cares by checking whether one of its clients has the document. This takes a finite amount of time per change. There is also a cost to deserializing the change notifications to begin with. This is why redis-oplog is a thing. I don’t think it’s relevant in this case, because you aren’t modifying the documents, but it’s worth noting.

dolle39 · October 14, 2019, 11:36am

@znewsham I think I have identified the bottleneck here.

From what I can see in ddp-server/livedata_server.js, it seems that the server sends a separate DDP message for every single document! So it is in fact the network traffic that is the problem. If it were possible to batch all messages into one big message, the subscription would go much faster.
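A small sketch of the difference, assuming the standard DDP `added` message shape (`msg`, `collection`, `id`, `fields`). The batched variant, a single JSON array of messages, is hypothetical here and only illustrates the idea; the helper names are made up as well.

```javascript
// Build a DDP-style "added" message for one document.
function addedMessage(collection, doc) {
  const { _id, ...fields } = doc;
  return { msg: "added", collection, id: _id, fields };
}

// One serialized frame per document, as described for livedata_server.js.
function perDocumentFrames(collection, docs) {
  return docs.map((doc) => JSON.stringify(addedMessage(collection, doc)));
}

// Hypothetical batching: all messages serialized as one JSON array.
function batchedFrame(collection, docs) {
  return JSON.stringify(docs.map((doc) => addedMessage(collection, doc)));
}

const sampleDocs = Array.from({ length: 40000 }, (_, i) => ({
  _id: String(i),
  data: "x".repeat(250), // assumed payload size
}));

const frames = perDocumentFrames("ManyDocs", sampleDocs);
const batch = batchedFrame("ManyDocs", sampleDocs);
const perDocChars = frames.reduce((sum, frame) => sum + frame.length, 0);

console.log("frames without batching:", frames.length, "total chars:", perDocChars);
console.log("frames with batching   : 1 total chars:", batch.length);
```

The character totals come out nearly identical, so any gain from batching would come from fewer stringify and send calls rather than from fewer bytes on the wire.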

Does anyone know if that is possible?

coagmano · October 15, 2019, 12:07am

There’s been some work on that here:

github.com/meteor/meteor

Batching Oplog Entries & DDP messages

meteor:devel ← KoenLav:re-apply-ddp-rework, opened by KoenLav at 11:38PM on 04 Mar 19 UTC (+409 -153)

This PR implements a couple of things:

- DDP now supports sending an Array of messages, rather than a singular message;
- In the Session on the Server, the messages for every individual client are buffered (if additional messages are received within 10ms, the buffer fills up until at most 1000 messages have accrued or 500ms has passed);
- In the Crossbar on the server OplogEntries are buffered for every (potentially re-used) ObserveHandle (if additional entries are received for the same ObserveHandle within 5ms, the buffer fills up until at most 1000 entries or 500ms have passed).

This decreases the load on server setup for two reasons:

- when messages are buffered inside the Session on the Server there is less overhead while stringifying individual messages and sending individual messages over the WebSocket;
- when messages are buffered when they are received in the Crossbar we give the server a chance to recognize the fact it is falling behind too far (and possibly fall back to polling).

To see the effect of this PR I would recommend cloning this repository (https://github.com/koenlav/meteor-fiber-repro) and comparing the performance with the development branch.

This PR supersedes both https://github.com/meteor/meteor/pull/9862 and https://github.com/meteor/meteor/pull/9885, but does not (yet) achieve the desired end-result from https://github.com/meteor/meteor/pull/9876. Also it does not incorporate the changes from https://github.com/meteor/meteor/pull/9622, as this PR appears to be stale.

While this PR makes sure we buffer at the start and end of the process from MongoDB => socket, it is likely to be possible to refactor all classes involved in between to accept 'messages' (rather than added, changed, removed), which would allow us to make use of the full potential of these buffers; we would only need to unwind the array of messages in one or two places on the server.

It has been sitting there for a while, so hopefully the new team from Tiny will take a look and merge it (when they start).
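For illustration, the session-side buffering rule from the PR description (flush after 10ms without new messages, or at 1000 messages, or after 500ms) could look roughly like the sketch below. This is not the PR's code: `MessageBuffer` and its methods are hypothetical, and time is passed in explicitly instead of using timers so the behaviour is easy to follow.

```javascript
// Hypothetical buffer using the thresholds quoted in the PR description.
class MessageBuffer {
  constructor(onFlush, { idleMs = 10, maxMessages = 1000, maxAgeMs = 500 } = {}) {
    this.onFlush = onFlush; // called with an array of buffered messages
    this.idleMs = idleMs;
    this.maxMessages = maxMessages;
    this.maxAgeMs = maxAgeMs;
    this.pending = [];
    this.firstAt = null; // time the oldest pending message arrived
    this.lastAt = null; // time the newest pending message arrived
  }

  // Queue a message that arrived at time `now` (milliseconds).
  push(message, now) {
    this.tick(now); // flush an earlier batch first if it has expired
    if (this.pending.length === 0) this.firstAt = now;
    this.pending.push(message);
    this.lastAt = now;
    if (this.pending.length >= this.maxMessages) this.flushNow();
  }

  // Call periodically: flushes when the buffer went idle or grew too old.
  tick(now) {
    if (this.pending.length === 0) return;
    const idle = now - this.lastAt >= this.idleMs;
    const tooOld = now - this.firstAt >= this.maxAgeMs;
    if (idle || tooOld) this.flushNow();
  }

  flushNow() {
    const batch = this.pending;
    this.pending = [];
    this.firstAt = null;
    this.lastAt = null;
    this.onFlush(batch);
  }
}

// Usage: 2500 messages arriving at once, then 10ms of silence.
const sentBatchSizes = [];
const buffer = new MessageBuffer((batch) => sentBatchSizes.push(batch.length));
for (let i = 0; i < 2500; i++) buffer.push({ msg: "added", id: String(i) }, 0);
buffer.tick(10);
console.log(sentBatchSizes); // three batches: 1000, 1000 and 500 messages
```

With these thresholds, a 40000-document subscription would go out as roughly 40 sends instead of 40000, assuming the documents arrive in a quick burst.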