Database normalisation in Mongo with Meteor

I’ve just started using Meteor, and I have a philosophical question that I want to ask before I disappear down a rabbit hole that I then struggle to extricate myself from.

For my app, I have a projects collection, and each project document in the collection needs to have some (let’s say) ‘profit’ and ‘costs’ documents that can be inserted and associated with it.

I assumed that the best ‘no sql’ way of doing this would be to have the ‘profits’ and ‘costs’ each as a subdocument in an array on the project document itself. I have done this once before with Mongo.

Now that I have been reading around about it on Stackoverflow etc, I’m not so sure. I will need to perform calculations using the fields from those sub-documents, and I’d like to figure out now whether I’m doing the right thing before I proceed.

For example, it seems surprisingly difficult to return and manipulate an array of subdocuments in meteor, and I’ve already had to manually affix ids for the subdocuments because it doesn’t come as standard.

I want the app to be as reactive as possible, so am I really better off having a separate collection for ‘profits’ and ‘costs’ and just return them by having them tagged with the project id, rather than having them as sub-documents of the main project document? This is counter-intuitive to me, if you have thousands of documents, but mongo and meteor seem to be trying to quietly but insistently communicate to me that I’ll regret it if I don’t do it this way.

Any help anybody can offer me on this would be appreciated, thanks.

1 Like

This is a larger question about de-normalization in general, and I’m sure you’ll find a lot of divergent opinions about it. For me, I like relational data, because it’s easier to work with, and therefor more useful. Unique foreign keys make life easier - certain things become impossible if you don’t have a unique keys associated with your sub documents. I’ve even seen, and I agree that de-normalization is like writing directly to a cache, instead of a database.

For me, I tend to work in a single flat document as much possible, but then treat de-normalization exactly like a cache, with the primary source of truth for those sub-documents (and usually sub-setted documents) in another collection. I store whatever fields I’ll need from the foreign collection along with the unique id in the sub-document (usually as a top level indexed field, so I can select on it efficiently), and update it wherever I need to. This way I get the benefits of simple queries that de-normalization (cached sub-documents) provides and keep the benefits of properly normalized data.

That said, I don’t work with very complex data most of the time, so there’s that.

3 Likes

TL;DR Meteor’s data layer and the nature of the web platform imposes certain limitations which favor flatter and/or normalized data.

Documents with arrays of other documents can become large and difficult if not nearly impossible to paginate in an efficient way, leading to the need to send large amounts of unnecessary data to be cached on the client. Imagine for instance a post with an array of comment documents 1000 in length. Also Meteor’s mergebox only merges on the top level, which can cause unexpected results when publishing multiple result sets with nested data.

There are of course exceptions to normalization. I’ve often used a hybrid approach and stored small pieces of relevant data on the parent document so as not to have to fetch the related documents. One example of this would be storing the number of comments in the post document for easy display and only fetching the comment documents when necessary.

4 Likes

Thanks very much for the replies. I’m now a little worried that I might be making the wrong decision for my app, and once I start down the path, forever will it dominate my destiny, consume me it will.

I see that it could be advantageous with something like comments on a post, which may run into the thousands, and can be loaded or not dependent on user request - and as I understand it Meteor mitigates this further by temporarily storing a newly inserted document into minimongo on the client, which the client and server can then sort out between them in the background without any compromise of fidelity for the user.

For my app though, to utilise the ‘posts’ and ‘comments’ example, I could potentially have thousands or tens of thousands of posts, but each post will only probably have a maximum of about 10 comments on it, and fields on those comment documents will have numbers that need to be included in calculations at the ‘post’ level. My intuition strongly urges me that it would be less load-bearing to have the app fetch the one big document with the array of sub-documents on it, rather than rifle through the entire collection of comments to pick out the ten relevant ones each time.

So would it still be sensible for me to use separate collections of posts and comments, or to have the comments as an array on the post document? Does anyone know? My gut feeling and Meteor seem determined to slug this one out, and I’m very concerned that I won’t discover whether I made the right choice until it’s too late.

In SQL-land, this would probably be handled in one-to-many fashion, with each post having many comments. There would probably be two SQL tables – one for posts and one for comments. One approach might be to have each comment record contain a field pointing to the post ID. Then you can quickly retrieve all the comments with the relevant post ID.

You can easily do the same thing in Mongo.

It may be more Mongo-like to have each post record contain an array of comment ids – or even for each post record to contain an array containing all data for each comment belonging to it – but this kind of “de-normalized” data is kind of the main criticism of Mongo. There may be no reason not to use a normalized data structure, with Mongo.

Others here may provide additional thoughts/comments/info.

1 Like

There’s no reason not to do this in a normalized way in Mongo and Meteor. A Meteor publication could easily provide both sets of data, or you could use two publications, one for the post, and another for comments for that post. I do exactly this in a social media app (except I use methods for data instead of pub/sub). I do store the comment count on the post document, but I treat that like a cache, just to save an additional count query for each post.