Subscription bombs out after >50k documents

ggerber · December 7, 2015, 6:24am

Hi,
I have an app that uses iron:router and has autopublish removed. On the server I generate 100k documents. All documents are published to the client. The client subscribes to all documents.

While the client is updating, I call documents.find().count() in the browser console every couple of seconds to see how the data is loading. I notice that at about 50k documents the subscription gets refreshed (or the connection breaks) and the client starts again with 0 documents. The client fails to get past 50k or so documents for two or so attempts (with a long waiting time) and then eventually succeeds in loading all 100k documents.

Obviously I would like to load all documents in one go (this is on a development machine at the moment).
Are there any gotchas I should be aware of when subscribing to >100k documents?

ggerber · December 7, 2015, 10:51am

Hi,
I have increased the number of documents to 500k. On the server I can call db.documents.find().count() which gives me 500k.

In the client web console I get 0 documents. I see in the Linux process monitor that the process ‘node’ uses 1.7GiB and that my processor has been running at 95% for the last hour. So the wheels have come off somewhere. The process ‘mongod’ uses 38MiB (0% cpu) and firefox uses 260 MiB (1% cpu).

Could this be minimongo not being able to handle so many documents?

lucfranken · December 7, 2015, 11:10am

The question is, of course, why would you want to show 100,000 documents to a user on a client?

serkandurusoy · December 7, 2015, 11:21am

Are you sure the browser can handle that much data? Basically, a client should only have as much data as the user will actually see or benefit from, regardless of the technology, be it Meteor or something else.

Or perhaps you are running this as a benchmark, but then what are you benchmarking exactly?

If you are benchmarking minimongo, you can do it on the server as well so that you don’t get biased by the browser.

Or if you are trying to benchmark pub/sub you can create a standalone ddp client.

vjau · December 7, 2015, 11:56am

You have to find a way to paginate your data. Your user is never looking at 100k documents simultaneously, so there is always a way to paginate.
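
A minimal sketch of what paginating could look like, assuming a page number and page size are passed in. The helper names `pageOptions` and `pageOfArray` and the sort on `_id` are made up for illustration and are not from the original app.

```javascript
// Hypothetical helper: turn a page number into Mongo-style cursor options.
// The names and the _id sort are assumptions for illustration only.
function pageOptions(page, pageSize) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return {
    sort: { _id: 1 },      // stable order so that pages do not overlap
    skip: page * pageSize, // number of documents before this page
    limit: pageSize,       // at most one page of documents
  };
}

// Applying the same skip/limit to a plain array shows which slice a page covers.
function pageOfArray(docs, page, pageSize) {
  const { skip, limit } = pageOptions(page, pageSize);
  return docs.slice(skip, skip + limit);
}
```

In a publication, options of this shape could be passed as the second argument to find(), so only one page of documents is sent to the client at a time.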

ggerber · December 7, 2015, 12:20pm

Hi,
I am busy with a mapping application, where each document represents a water network node. So for a large city/town 100k valves etc. really can exist. Mongo says the collection is about 10 MB. So it doesn’t seem to be a huge dataset.

serkandurusoy · December 7, 2015, 12:25pm

Well, in that case, your solution would be marker clustering. Since no human will be able to see 100K markers on a map the size of a computer screen, you cluster nearby markers together and show a single one.

Check this out for the concept and a brief technical introduction:

https://developers.google.com/maps/articles/toomanymarkers?hl=en
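
To make the idea concrete, here is a rough sketch of one simple variant, grid-based clustering: points that fall into the same grid cell are merged into one marker with a count. It is not necessarily the approach the linked article recommends, and the point shape `{lat, lng}`, the function name `clusterByGrid` and the `cellSizeDeg` parameter are assumptions for illustration.

```javascript
// Grid-based clustering sketch. The point shape {lat, lng} and the
// cellSizeDeg parameter (cell size in degrees) are assumptions.
function clusterByGrid(points, cellSizeDeg) {
  if (!(cellSizeDeg > 0)) {
    throw new RangeError("cellSizeDeg must be positive");
  }
  const cells = new Map();
  for (const p of points) {
    // Points with the same row and column share a cell.
    const row = Math.floor(p.lat / cellSizeDeg);
    const col = Math.floor(p.lng / cellSizeDeg);
    const key = row + ":" + col;
    let cell = cells.get(key);
    if (!cell) {
      cell = { count: 0, latSum: 0, lngSum: 0 };
      cells.set(key, cell);
    }
    cell.count += 1;
    cell.latSum += p.lat;
    cell.lngSum += p.lng;
  }
  // One marker per non-empty cell, placed at the mean position of its points.
  return Array.from(cells.values(), (c) => ({
    lat: c.latSum / c.count,
    lng: c.lngSum / c.count,
    count: c.count,
  }));
}
```

A larger cell size would suit low zoom levels and a smaller one high zoom levels, so the number of markers sent to the map stays small either way.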

kenken · December 7, 2015, 12:25pm

Actually, I hit a similar problem when sending a large amount of data to the client for analytics purposes. All of the data is indeed needed for the calculation. It takes more than 2 minutes to load the data…

Is there any option for making pub/sub faster? 2+ minutes is not acceptable from a UX point of view.

serkandurusoy · December 7, 2015, 12:27pm

@kenken the solution to doing in-browser-analytics would be to send down a concatenated complete data set as a single json document and then use your analytics toolset’s methods on that json data.

kenken · December 7, 2015, 12:27pm

I found out that was what I DID… still slow… Any other option?

none · December 7, 2015, 12:29pm

But a cluster of markers still needs to get all the documents…

serkandurusoy · December 7, 2015, 12:29pm

@kenken How so? Sending down even megabytes of JSON from your server to the client should take no more than a couple of seconds. But the thing is, pub/sub is not the right transport there. You should pre-populate aggregate queries on your server and then send them down as files.

kzvaigzne · December 7, 2015, 12:30pm

It’s never smart to ask the browser to do anything that computation-heavy… You tell the server what you want to see or how to get the final data, and then you show that small final data.

serkandurusoy · December 7, 2015, 12:30pm

@none You could do the clustering on the server. In fact, you should. That’s how Google Maps etc. work as well.

shock · December 7, 2015, 12:31pm

Most of the time people don’t use subs.ready() and re-render everything for every document added to the minimongo collection.
Then stories like this happen, when you are recomputing results after each of the 50k documents arrives.

kenken · December 7, 2015, 12:32pm

I am using a server method to create the JSON object after the calculation and then send it back to the client. But the performance is not good. Maybe I am missing something…

serkandurusoy · December 7, 2015, 12:34pm

@shock OP’s problem is not about subs.ready() because he’s doing a simple pub/sub so it is done automatically for him. Also, I don’t think the problem is rerendering either because only the changed data would have to be rerendered where there is no changed data in that experiment.

@kenken Did you time (separately) how long it takes to

1. create the JSON
2. send it down
3. render

That will tell you where you need optimization.
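
A rough sketch of timing the phases separately, assuming a small hypothetical helper called `timed` and stand-in data. Only the JSON creation and the client-side parse can be measured in a standalone snippet like this; the transfer and the rendering would have to be timed the same way in the real app, around the method call and around the render.

```javascript
// Hypothetical timing helper: runs fn once, logs the elapsed time in ms,
// and returns both the result and the measured duration.
function timed(label, fn) {
  const start = Date.now();
  const result = fn();
  const ms = Date.now() - start;
  console.log(label + ": " + ms + " ms");
  return { result, ms };
}

// Stand-in data (an assumption); in the real app this would be the
// calculated result set produced on the server.
const rows = Array.from({ length: 100000 }, (_, i) => ({ id: i, value: i % 97 }));

// Phase 1: create the JSON string on the server.
const created = timed("create json", () => JSON.stringify(rows));

// Phase 2 (sending it down) needs a real network round trip, so it is
// not simulated here.

// Phase 3 stand-in: parsing on the client, the first step before rendering.
const parsed = timed("parse json", () => JSON.parse(created.result));
```

Whichever phase dominates is the one worth optimizing first.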

kenken · December 7, 2015, 12:34pm

I am doing it right now. Thanks for the guidance.

none · December 7, 2015, 12:38pm

I didn’t understand the part about “sending a JSON file”…
If I send JSON, it will be non-reactive…

shock · December 7, 2015, 12:39pm

I have learned that you should not expect anything here on the forums unless code is provided.
If I wanted to test it, I would create an empty app, populate Mongo, and call subscribe by hand from the browser console.
Then check the collection’s .count()

But as usual, people use that data in template helpers to render in #each and then don’t even subscribe correctly.
So it is not counted as a re-subscribe; instead it does an unsubscribe and later a subscribe, invalidating the whole DDP merge box.

I would believe none of this is true if I saw a meteorpad.