Mongo: Many small collections vs. a few large collections

mz103 · January 7, 2016, 11:47pm

In Mongo, what is more efficient: many small collections, or a few large collections?

For example:

- Comments on news articles and comments on photos would each have their own collection (NewsComments, PhotoComments),
- or there would be just one big Comments collection, with a field that specifies whether each comment belongs to a news article or a photo.

robfallows · January 8, 2016, 2:46pm

Define “efficient” …

GabrielGM · January 8, 2016, 3:17pm

As always, when we talk about “efficiency”, it depends on your use case. There is no universal rule to follow.

Are there a lot of reads?
- How much data do you read at the same time (filters)?
Are there a lot of writes?
- Do you only insert, or do you also update documents?
…

You should always test both approaches (a single collection and multiple collections) with your own data and queries.
Don’t forget to use indexes!

On a personal note, coming from the relational database world, I prefer small “normalized” collections. But again, that’s not always the best choice.

jamiter · January 8, 2016, 3:17pm

I would go for one collection with every comment looking something like this:

```js
[{
  _id: "123123",
  text: "I really like it",
  photoId: "123123",
  userId: "123123"
}, {
  _id: "123124",
  text: "I hate it",
  articleId: "123123",
  userId: "123123"
}]
```

photoId and articleId are both optional.

Why only one collection? Well, if you want all comments from one user, you only have to do one query. And with indexes on photoId and on articleId you can still get the comments of a specific photo or article efficiently. (Yes, this is debatable, but it would be my personal approach.)
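A minimal sketch of the single-collection approach, written in plain JavaScript so it runs without a database. The sample ids, the helper names (`byUser`, `byPhoto`, `byArticle`, `find`) and the choice of sparse indexes are illustrative assumptions, not part of the post above; in the mongo shell such an index would be created with something like `db.comments.createIndex({ photoId: 1 }, { sparse: true })`, where `comments` is an assumed collection name.

```js
// Plain in-memory stand-in for one Comments collection, using the document
// shape from the post above (ids are made up for the example).
const comments = [
  { _id: "c1", text: "I really like it", photoId: "p1", userId: "u1" },
  { _id: "c2", text: "I hate it", articleId: "a1", userId: "u1" },
  { _id: "c3", text: "Nice shot", photoId: "p1", userId: "u2" },
];

// Selectors you would pass to Comments.find(...). Each is a single-field
// equality match, so one index per field is enough to serve it.
const byUser = (userId) => ({ userId });
const byPhoto = (photoId) => ({ photoId });
const byArticle = (articleId) => ({ articleId });

// Indexes matching those three query shapes. sparse: true is optional:
// it leaves out documents that lack the field, which keeps the
// photoId/articleId indexes small since each comment sets only one of them.
const indexSpecs = [
  { key: { userId: 1 } },
  { key: { photoId: 1 }, options: { sparse: true } },
  { key: { articleId: 1 }, options: { sparse: true } },
];

// Minimal equality matcher so the sketch runs without a database.
function find(collection, selector) {
  return collection.filter((doc) =>
    Object.entries(selector).every(([field, value]) => doc[field] === value)
  );
}

// All comments by one user, on photos and articles alike: a single query.
console.log(find(comments, byUser("u1")).map((c) => c._id));
// Comments on one photo.
console.log(find(comments, byPhoto("p1")).map((c) => c._id));
```

The point of the design is that the per-user query needs no union across collections, while the per-photo and per-article queries stay cheap thanks to their own indexes.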

I find separate collections for practically the same kind of data useful only in rare circumstances, or when the documents are less alike than the ones you describe.

Hope this helps!

GabrielGM · January 8, 2016, 3:25pm

Agreed in this case (photo comments and news comments are both just “comments”).

But the question is broader (I think), and for the broader question it is, again, case by case…

jamiter · January 8, 2016, 3:37pm

@GabrielGM, totally agree. As for the main question, the answer is “it depends”, as it almost always is.

Looking at the examples (which are probably very close to what the OP is actually trying to achieve), I would go for the approach I mentioned. It’s still up to @mz103 to think about reads / writes / efficient queries / efficient indexes / etc.

benja · January 8, 2016, 4:46pm

I think in your case you should have one big collection for comments.

You could also embed your comments inside the news/image document if you do not need to update them very often.
See: https://docs.mongodb.org/ecosystem/use-cases/storing-comments/#embedding-all-comments

A couple of weeks ago we were trying to decide how to handle comments and their replies. We were debating whether to give the comments their own collection or embed them in the image document. In the end we decided to give comments their own collection and to embed the replies inside the relevant comment document.
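A sketch of that layout, assuming a comment document in its own collection with its replies embedded as an array. Field names such as `imageId` and `userName`, and all helper functions, are hypothetical; `applyPush` only imitates what the server does with `$push` so the example runs standalone. With Meteor, the modifier would be passed to something like `Comments.update({ _id: comment._id }, addReplyModifier(reply))`, where `Comments` is an assumed collection.

```js
// Hypothetical comment document: comments live in their own collection,
// and replies are embedded in the comment they answer.
const comment = {
  _id: "c1",
  imageId: "img1",
  userId: "u1",
  userName: "alice",
  text: "Great photo",
  replies: [
    { userId: "u2", userName: "bob", text: "Agreed" },
    { userId: "u1", userName: "alice", text: "Thanks!" },
  ],
};

// Reading a thread is a single document fetch: the comment plus all replies.
function threadSize(doc) {
  return 1 + (doc.replies || []).length;
}

// Adding a reply is a single $push on that one document.
function addReplyModifier(reply) {
  return { $push: { replies: reply } };
}

// In-memory imitation of how the server applies a one-field $push,
// so the sketch runs without MongoDB. Returns a new object.
function applyPush(doc, modifier) {
  const [[field, value]] = Object.entries(modifier.$push);
  return { ...doc, [field]: [...(doc[field] || []), value] };
}

const updated = applyPush(
  comment,
  addReplyModifier({ userId: "u3", userName: "carol", text: "Nice" })
);
console.log(threadSize(comment), threadSize(updated));
```

Both reads and the common write (a new reply) touch exactly one document, which is the advantage described here; the cost shows up when many embedded replies must change at once.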

Since I am bringing up embedding, you should know about some issues it can cause (as we have learned over the past few weeks).
The big advantage is that 99.99% of the time Mongo only has to load one document to get a comment and all of its replies. The problem comes on those rare occasions when we need to update multiple replies that match the same search parameter (for instance, when a user changes his name).
Unfortunately, at present Mongo can’t update multiple matching subdocuments in one update: the positional `$` operator updates only the first match (see https://jira.mongodb.org/browse/SERVER-1243).
To work around this, you need to write some logic in your app layer that goes through the embedded documents and updates them accordingly (we use Mongo bulk updates for this). Depending on how many documents you have to update, this can of course be very heavy.
So again, it all comes back to how often you will need to update versus read.
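One way such an app-layer workaround could look, assuming each embedded reply stores a `userId` and a denormalized `userName` (hypothetical field names). The function builds one `updateOne` per comment document, in the operation format accepted by `bulkWrite`, and sets every matching reply through its explicit array index. This is a sketch under those assumptions, not necessarily what benja’s team does.

```js
// Sketch of an app-layer workaround for SERVER-1243: build one updateOne per
// comment document that renames *every* reply written by `userId`, using
// explicit array indexes instead of the positional `$` operator.
// Field names (replies, userId, userName) are hypothetical.
function buildRenameOps(commentDocs, userId, newName) {
  const ops = [];
  for (const doc of commentDocs) {
    const filter = { _id: doc._id };
    const $set = {};
    (doc.replies || []).forEach((reply, i) => {
      if (reply.userId === userId) {
        // "replies.<i>.userName" is plain dot notation, so several
        // elements of the same array can be set in one update.
        $set[`replies.${i}.userName`] = newName;
        // Guard: only match if the reply is still at index i when the
        // update runs (the array may have changed since it was read).
        filter[`replies.${i}.userId`] = userId;
      }
    });
    if (Object.keys($set).length > 0) {
      ops.push({ updateOne: { filter, update: { $set } } });
    }
  }
  return ops;
}

// Example input: comment documents loaded for the rename
// (c2 has no replies by u1, so it produces no operation).
const docs = [
  {
    _id: "c1",
    replies: [
      { userId: "u1", userName: "old" },
      { userId: "u2", userName: "bob" },
      { userId: "u1", userName: "old" },
    ],
  },
  { _id: "c2", replies: [{ userId: "u2", userName: "bob" }] },
];

const ops = buildRenameOps(docs, "u1", "new");
// ops could then be sent in one round trip, e.g. collection.bulkWrite(ops)
console.log(JSON.stringify(ops, null, 2));
```

Pinning `replies.<i>.userId` in the filter means that if the array changed between the read and the write, that update simply matches nothing and can be retried, instead of renaming the wrong reply. For reference, MongoDB 3.6 later resolved SERVER-1243 with the filtered positional operator, e.g. `{ $set: { "replies.$[r].userName": newName } }` together with `arrayFilters: [{ "r.userId": userId }]`, which makes this kind of workaround unnecessary on newer servers.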

Hope that helps.

benja · January 10, 2016, 7:31am

Just want to add to my last comment: there are some issues in how Meteor handles updates to embedded documents, which also need to be taken into consideration.