What will happen if I count a Mongo collection with more than 10M documents?


#1

Hello all,

I am doing research on Meteor DDP performance. Until now I have only had 10,000 documents in a collection. I have a realtime dashboard that counts all the data in that collection, and my superior envisions about 200,000,000 documents next year. So I need to know: how does the DDP count work?

For example, my program has:

var Votes = new Mongo.Collection("votes");

Template.dashboard.helpers({
    'votes_count'() {
       Meteor.subscribe('all_votes'); // will push all votes to the realtime counter
       return Votes.find().count();  // count what's in Minimongo
    }
});

Each document in votes has a size of 121 B.

So if next year we have 200,000,000 documents in our MongoDB, the collection’s size will be about 24 GB. CMIIW

So please help explain to me how DDP works in this case (realtime counter). Will it load all the documents to the subscriber and count them there?

thanks,

I love meteor


#2

Yes, it will. I would not recommend it. In your case I would periodically count the documents and store the result in, for example, a counts collection. This prevents the need for a count operation per client.

Meteor.setInterval(() => {
  Counts.upsert('votes', { $set: { count: Votes.find().count() } });
}, 1000 * 60 * 60); // hourly update
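To make that concrete, here is a minimal sketch of the full round trip, assuming a hypothetical `counts` collection and publication name (neither is defined in this thread); each client then syncs one tiny document instead of millions of votes:

```javascript
// Shared: a tiny collection holding one counter document per counted thing.
const Counts = new Mongo.Collection('counts');

// Server: publish the counts collection in full -- it only has a few tiny docs.
Meteor.publish('counts', function () {
  return Counts.find();
});

// Client: subscribe once when the template is created, not inside a helper.
Template.dashboard.onCreated(function () {
  this.subscribe('counts');
});

Template.dashboard.helpers({
  votes_count() {
    const doc = Counts.findOne('votes'); // the document the interval job upserts
    return doc ? doc.count : 0;
  }
});
```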

#3

Whoa, that really scares me.

Btw, how much data can Minimongo accommodate?

Thanks for the solution, I will suggest it in my research document.


#4

Minimongo is limited by the amount of memory on the client and server, I think. Each document published to a client is also stored in the server’s memory.

Also, all the documents will be pushed to the client, eating up all of your traffic and probably stalling the browser.

You can extend my solution with a slightly more complex script that manages the interval and only performs the count while users are subscribed to the counts publication.


#5

With Meteor and livedata, you should really only publish the data to the client that you absolutely need. MongoDB can handle very large amounts of data, but your application (both server and client sides) should only access and keep the parts that are necessary for it to run.

I can’t speak to what would be best for your needs because I don’t know the architecture of your application, but there are techniques you can use to provide the “count” that you seek without having to push a potential 24 GB to the client.

As @cloudspider mentioned you could choose to run a task every so often to calculate counts and store them somewhere. If you have lots of processing that needs to happen though, this may also cause you some performance issues.

If you have a more normalized data structure, where each vote is related to something (a vote on a post, for instance), then you could try denormalizing the vote count by storing it on the other document. Then you can update it with less processing whenever a vote happens. I use this technique for like and comment counts so the complete data set doesn’t have to be sent to the client.
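A minimal sketch of that denormalization, assuming a hypothetical schema where votes reference posts (the `Posts` collection, `castVote` method, and field names are all illustrative, not from this thread):

```javascript
const Posts = new Mongo.Collection('posts');
const Votes = new Mongo.Collection('votes');

Meteor.methods({
  // Record the vote and atomically bump the denormalized counter on the post,
  // so reading the count is an O(1) field lookup instead of a count() scan.
  castVote(postId) {
    check(postId, String);
    Votes.insert({ postId, userId: this.userId, createdAt: new Date() });
    Posts.update(postId, { $inc: { voteCount: 1 } });
  }
});
```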

If it’s not something that can be denormalized in this manner, but it needs to be more realtime than updating the counts every so often, you could use something like https://atmospherejs.com/tmeasday/publish-counts.


#6

Here is an idea: server-side, create a custom pub using this.added and this.changed (in other words, don’t return the cursor as in automated pubs). Read more in the docs on pubs.

When a sub is called, your pub function does the work for you. You can even go further and re-use your counting function across pubs. Every time you count, you call this.changed to send the new value down to the client.
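The idea above can be sketched like this (the publication name, `counts` client collection, and 10-second poll interval are my assumptions, not anything from the thread); only a count ever crosses the wire, never the documents:

```javascript
Meteor.publish('votes-count', function () {
  let count = Votes.find().count();

  // Fake a document in a client-side "counts" collection via the low-level API.
  this.added('counts', 'votes', { count });
  this.ready();

  // Poll periodically and push an update only when the number actually changes.
  const handle = Meteor.setInterval(() => {
    const next = Votes.find().count();
    if (next !== count) {
      count = next;
      this.changed('counts', 'votes', { count });
    }
  }, 10 * 1000);

  // Stop polling when the last subscriber goes away.
  this.onStop(() => Meteor.clearInterval(handle));
});
```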


#7

Adding further, you can make your counting function smart: incrementally count only the new docs since the last count. Make sure you index your DB properly and it will be super fast.
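One way to do the incremental counting, assuming each vote has a `createdAt` field backed by an index (e.g. `db.votes.createIndex({ createdAt: 1 })`) and a `Counts` collection like the one suggested earlier in the thread — all hypothetical names:

```javascript
// Pay for one full count at startup, then only count what arrived since.
let total = Votes.find().count();
let since = new Date();

Meteor.setInterval(() => {
  const now = new Date();
  // Half-open interval [since, now) so no document is counted twice.
  total += Votes.find({ createdAt: { $gte: since, $lt: now } }).count();
  since = now;
  Counts.upsert('votes', { $set: { count: total } });
}, 60 * 1000);
```

Note this sketch ignores deletions; if votes can be removed, the running total drifts and needs an occasional full recount.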


#8

In my opinion you should use a MongoDB aggregation on the server side and not subscribe to your collection.


#9

But an aggregation is computationally expensive if it is meant to be reactive.


#10

You can run the aggregation in background jobs and save the results to another collection ($out).
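A sketch of that background job, using Meteor’s `rawCollection()` to reach the native driver (the `vote_counts` collection name and 5-minute interval are my own placeholders):

```javascript
const rawVotes = Votes.rawCollection();

async function refreshVoteCount() {
  await rawVotes.aggregate([
    { $count: 'total' },      // produces a single doc: { total: <n> }
    { $out: 'vote_counts' }   // overwrite the one-document results collection
  ]).toArray();               // toArray() forces the pipeline to actually run
}

Meteor.setInterval(() => {
  refreshVoteCount().catch(err => console.error('count job failed', err));
}, 5 * 60 * 1000);
```

Clients then subscribe to `vote_counts` instead of `votes`, so the expensive scan happens once per interval on the server regardless of how many dashboards are open.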


#11

Yeah, imho the best option is to periodically count the size of the collection server-side and write the results to another collection. How you do that is largely up to you.

If your database is largely monolithic and is only going to be accessed by that specific app, then you could get away with collection hooks that increment/decrement the counter depending on the operation. Generally, to future-proof your stack, I recommend having a server job that fires every few minutes, gets the size of the collection, and writes it to your “counter collection”. I would avoid trying to mimic real-time reactivity by running reactive aggregations and other fancy things. The performance cost/benefit is definitely not worth it.
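For the hooks variant, a sketch using the matb33:collection-hooks package (assuming a `Counts` collection as earlier in the thread, and that this app is the only writer to `votes` — otherwise the counter drifts):

```javascript
// after.insert / after.remove fire once the write has succeeded.
Votes.after.insert(function () {
  Counts.upsert('votes', { $inc: { count: 1 } });
});

Votes.after.remove(function () {
  Counts.upsert('votes', { $inc: { count: -1 } });
});
```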