Efficient MongoDB comments schema

rdagger · January 2, 2016, 6:29pm

I have a react blog project that could potentially have a larger number of posts. There is a posts collection. Each post can have a large number of comments. What is the best approach to storing the comments?

Add a comments field to each post that is an array of comment objects.
Have a separate comments collection. Each document would contain a post id and an array of comment objects.
Same as #2 but each individual comment would a be its own document.

harisvsulaiman · January 2, 2016, 6:38pm

Comments collection and the post collection with a field specifiying comments collection id.

just a thought

generalledger · January 2, 2016, 6:41pm

That is the way I would do it. I think that we’re going to see more and more mongo databases organized like relational databases, esp. now that Mongo 3.2 supports relational type queries with $lookup .

rdagger · January 2, 2016, 6:55pm

There is also a 4th approach that looks good. A hybrid of 2 & 3. A separate comments collection where multiple comment objects are stored in buckets. For example, each bucket document would have a post ID and could hold an array of 20 comments. A post could have multiple comment buckets associated with it.

pablohirafuji · January 2, 2016, 8:37pm

Really nice ideia. How can be done? Here is what I thoutgh:

Inserting:

commentBucket = Comments.findOne({postId: postId, commentsCount: {$lte: 20}});
If find something, push the new comment in comments array and increase commentsCount by 1, probably update lastAddedAt field for sort buckets of comments by date;
Else insert a new document with postId, comments array with new comment and commentsCount: 1;

Not sure if it’s the best way, and the only efficient sort is by Date.

rdagger · January 2, 2016, 8:52pm

The problem with 1 & 2 is that there is a limit on total comments because of the document size limit. There is also the issue of storage fragmentation. 3 would be the most versatile but at the cost of performance. The hybrid approach avoids the document limit and still provides good performance.

Also, I think it would be easy to implement with respect to pagination or infinite scroll. Instead of a comment limit you could have a minimum bucket. For example, display all buckets greater than X. Then lower then number as the user scrolls until you get to the first bucket.

Threading would be complicated, if required. I prefer the flat approach.

shock · January 2, 2016, 9:54pm

I would also suggest flat approach instead of bucketing, as it does not help at all from my point of view.
Still it is relational data and that is best saved in DB which supports relations.

But there is that standard question - do we really need oplog tailing and realtime reactivity for this usecase?
Would not be better to use some SQL/graph for this model?
Especially when you would want some fast generated feed of likes/comments in various user’s personal feeds?
I would suggest graph DB as Neo4j if you want to go the social network way.

MongoDB is nice, performant but “dumb” document store. And it is doing it’s job very well.
But if you are planning on heavy relational load from start, I would suggest you consider something build for that purpose.

kako · January 2, 2016, 10:33pm

The Designing your data schema part from the Meteor Guide illustrates exactly your situation. The example has a Lists collection (equivalent to your Posts) and wonders whether todo items (equivalent to your Comments) should be a field in Lists or a separate collection. This probably gives you a good direction:

The most important consideration is related to the way DDP, Meteor’s data loading protocol, communicates documents over the wire. The key thing to realize is that DDP sends changes to documents at the level of top-level document fields. What this means is that if you have large and complex subfields on document that change often, DDP can send unnecessary changes over the wire.

The issue with this schema [todos (comments) as a field inside lists(posts)] is that due to the DDP behavior just mentioned, each change to any todo item in a list will require sending the entire set of todos for that list over the network. This is because DDP has no concept of “change the text field of the 3rd item in the field called todos“, simply “change the field called todos to a totally new array”.

The implication of the above is that we need to create more collections to contain sub-documents. In the case of the Todos application, we need both a Lists collection and a Todos collection to contain each list’s todo items. Consequently we need to do some things that you’d typically associate with a SQL database, like using foreign keys (todo.listId) to associate one document with another.

And if there will be a great number of comments and you’re worried about performance, an index for the Comments collection should solve that.