Nested vs separate collection

Hi!

I have a general DB-related question, specifically about how collections are handled in MongoDB.

Let’s say I have a Parent collection, and a Child collection that is a part of Parent. Each has its own schema, and currently they exist as separate collections in the DB.

I’m handling the linking by adding the parentId property to every Child document.

For example:

```js
Some_Child = {
  "parentId": "some_id",
  // ...rest_of_schema
}
```

This seems to work just fine. However, I’m noticing that I now have to work with two collections every time I want Child data, which means more code: multiple cursors and DB calls every time I just want to do something with a Child.

What are some thoughts on structuring the data this way vs. just having an array of children on each Parent document?

For example:

```js
Some_Parent = {
  "Childs": [
    { /* child1 */ },
    { /* child2 */ },
    { /* childN */ }
  ],
  // ...Rest_Of_Schema
}
```

My concern with this is whether it’s future-proof. If Child later needs a lot more data and functionality, Parent documents could end up really large and messy. It might also just be cleaner to keep these two collections separate in general.

Normally (in an RDBMS), I wouldn’t even have thought about using option #2. So I’m wondering whether this is an accepted pattern with document stores (MongoDB and non-relational DBMSs in general) and with Meteor.

Any insights?

You shouldn’t need multiple subscriptions; you can return two cursors from a single publication:

```js
Meteor.publish('foo', id => {
  const foos = Foo.find({ _id: id });
  const bars = Bar.find({ parentId: id });
  return [foos, bars];
});
```

As a general rule, if I have a collection of data that I need multiple references to, I’ll store it in another collection. If it’s a one-to-one relation, I’ll just put it in the parent itself.

If I have to update the child collection, I only want to update it in one place.

Ack…I actually meant returning multiple cursors, not subscriptions.

Mm…I see your points. Makes it easier to just update data in one place for 1:1 relation. Thanks!

If you want the simplest possible solution, publish-composite is probably your best bet. It won’t be the most performant, but it works well for getting your project off the ground. If you want a solution with better performance characteristics, you’ll probably want to set up grapher.

The nested sub-doc is the ‘correct’ way to go about it; such sub-docs are called ‘embedded documents’, and they’re what Mongo is optimised for. It’s not a relational DB and isn’t designed to join across collections the way an RDB does, although you can do it with aggregation queries (the `$lookup` stage) if you really need to.
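For reference, a join in the aggregation framework is expressed as a `$lookup` stage. Below is a minimal sketch that only builds the pipeline, assuming a collection named `children` whose documents carry a `parentId` field (names borrowed from the question; `parentsWithChildrenPipeline` is a hypothetical helper). In Meteor you would hand the pipeline to the underlying Node driver collection, e.g. `Parents.rawCollection().aggregate(pipeline).toArray()`, and the result is not reactive.

```js
// Hypothetical helper: builds an aggregation pipeline that attaches each
// parent's children, matching children.parentId against parents._id.
function parentsWithChildrenPipeline(parentId) {
  return [
    // Restrict the pipeline to a single parent document
    { $match: { _id: parentId } },
    // Left outer join against the "children" collection (assumed name);
    // the matches end up in a new "children" array on each parent
    {
      $lookup: {
        from: "children",
        localField: "_id",
        foreignField: "parentId",
        as: "children",
      },
    },
  ];
}

const pipeline = parentsWithChildrenPipeline("some_id");
```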

Yeah, the embedded pattern might feel a bit weird if you’re used to relational databases, but it’s what Mongo is built for, UNLESS your embedded documents are going to get very large: a single MongoDB document is limited to 16 MB.
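If you embed, it can help to keep an eye on how close a parent gets to that limit. This is a rough sketch only: it assumes the UTF-8 length of the JSON encoding is a usable proxy for BSON size, which it is not exactly (BSON adds type tags and encodes numbers and dates differently). `approxDocumentBytes`, `fractionOfLimit`, and the sample data are hypothetical.

```js
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough estimate only: UTF-8 byte length of the JSON encoding.
// Treat it as an early warning, not an exact BSON measurement.
function approxDocumentBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Fraction of the 16 MB limit a document would use (by the estimate above).
function fractionOfLimit(doc) {
  return approxDocumentBytes(doc) / MAX_BSON_DOCUMENT_BYTES;
}

// Hypothetical parent embedding 10,000 children with ~100 characters of notes each.
const parent = {
  _id: "some_id",
  children: Array.from({ length: 10000 }, (_, i) => ({
    name: `child-${i}`,
    notes: "x".repeat(100),
  })),
};
```

With these numbers the parent uses roughly 8% of the limit; the point is that the total grows linearly with the number of embedded children, so an unbounded array eventually hits the ceiling.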

Partially true, but that doesn’t mean you shouldn’t flatten your collections to a certain extent. A document store like Mongo simply expects you to handle the joining on the consumer side, meaning your app is responsible for joining up the data.
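A minimal sketch of joining on the consumer side, assuming both result sets are already in memory (for example, fetched from the two cursors of a publication). The data and the `joinChildren` helper are hypothetical.

```js
// Hypothetical in-memory data shaped like the question: children reference
// their parent through a parentId field.
const parents = [
  { _id: "p1", name: "Parent 1" },
  { _id: "p2", name: "Parent 2" },
];
const children = [
  { _id: "c1", parentId: "p1", name: "Child A" },
  { _id: "c2", parentId: "p1", name: "Child B" },
  { _id: "c3", parentId: "p2", name: "Child C" },
];

// Application-side join: index children by parentId in one pass, then attach
// each parent's children without rescanning the whole list per parent.
function joinChildren(parentDocs, childDocs) {
  const byParent = new Map();
  for (const child of childDocs) {
    if (!byParent.has(child.parentId)) byParent.set(child.parentId, []);
    byParent.get(child.parentId).push(child);
  }
  return parentDocs.map((parent) => ({
    ...parent,
    children: byParent.get(parent._id) ?? [],
  }));
}

const joined = joinChildren(parents, children);
```

Building the `Map` first keeps the join linear in the number of documents, rather than filtering the full child list once per parent.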

I always follow a general rule of thumb: if the nested object represents an entity in my domain model, it earns the right to exist in its own collection. An entity is a thing that can live on its own, for example books and authors. One author might have a sub-array of books. Is there a need to query books without their author? Is there a need for a book detail page showing other related content? (In other words, does a book have relations of its own?)

These questions matter for your database model, because if the answer to any of them is yes, then you might want to store it as a separate collection.
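The rule of thumb can be written down as a tiny checklist function. This is only a sketch of the heuristic described above, not a rule from MongoDB or Meteor; `suggestModel` and its option names are hypothetical.

```js
// Hypothetical helper encoding the rule of thumb: if any answer is "yes",
// the nested object behaves like an entity and is a candidate for its own
// collection; otherwise embedding is reasonable.
function suggestModel({
  queriedWithoutParent = false, // e.g. list all books regardless of author
  hasOwnDetailPage = false,     // e.g. a book detail page
  hasOwnRelations = false,      // e.g. a book has reviews of its own
} = {}) {
  const isEntity = queriedWithoutParent || hasOwnDetailPage || hasOwnRelations;
  return isEntity ? "separate collection" : "embedded";
}
```

For example, `suggestModel({ hasOwnDetailPage: true })` points to a separate collection, while `suggestModel()` with every answer "no" suggests embedding.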

What I often find myself doing is storing data in a normalized fashion. A framework like Apollo helps me fetch all related books whenever I ask for them. The same goes for Minimongo and pub/sub: you could simply request an author and subscribe to all related books using an $in selector.
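Subscribing to related books with `$in` amounts to building a selector from the authors’ ids. The sketch below assumes books store an `authorId` field (a hypothetical schema); in a publication the selector would be passed to something like `Books.find(selector)`. The small `matchesIn` function only emulates `$in` for scalar values so the example runs on plain arrays; it is not Minimongo.

```js
// Build a MongoDB-style selector for all books belonging to the given authors.
function booksForAuthorsSelector(authors) {
  return { authorId: { $in: authors.map((a) => a._id) } };
}

// Minimal emulation of { field: { $in: [...] } } for scalar values only.
function matchesIn(doc, selector) {
  return Object.entries(selector).every(([field, cond]) =>
    cond.$in.includes(doc[field])
  );
}

// Hypothetical data
const authors = [{ _id: "a1" }, { _id: "a2" }];
const books = [
  { _id: "b1", authorId: "a1" },
  { _id: "b2", authorId: "a3" },
  { _id: "b3", authorId: "a2" },
];

const selector = booksForAuthorsSelector(authors);
const relatedBooks = books.filter((b) => matchesIn(b, selector));
```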


^ yes, good points. The impression I got from OP was that the embedded documents are only ever fetched in the context of the parent. But yes, generally that’s a sensible approach to determining how to model your data.

Thanks, exactly what I was looking for

The problem with this is that Meteor’s data system does not work well with embedded documents for several reasons. Because of this, if you are using live-data you’ll probably want to think about normalizing where possible and only denormalizing parts that make sense for performance reasons.

Even if you aren’t using live-data, you can use cultofcoders:grapher with non-reactive queries to fetch your normalized data very efficiently, in some cases even more efficiently than equivalent SQL queries.

Heavy use of embedded documents very often leads to unforeseen headaches down the road.


Do you have any specifics about what doesn’t work well in Meteor with embedded documents? Does it not do a “deep” update on nested data coming from an API or something?

Let me give you an example.

You have 2 publications:

```js
Meteor.publish('FooA', () => {
  return Foo.find({}, { fields: { "details.name": 1 } });
});

Meteor.publish('FooB', () => {
  return Foo.find({}, { fields: { "details.address": 1 } });
});
```

If you subscribe to FooA, and then subscribe to FooB, details.address will not get merged into the existing minimongo collection because Meteor can only merge top level fields.
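To see why the two projections don’t combine, here is a deliberately simplified model of the server-side merge that decides what the client sees: each top-level field keeps a list of values from the subscriptions publishing it, and the client receives only the first one. This is an illustrative sketch under that assumption, not Meteor’s actual merge-box code; `createMergeView` and the sample values are hypothetical.

```js
// Simplified model: per top-level field, remember every subscription's value
// and expose only the first one registered. Nested objects are never merged.
function createMergeView() {
  const fields = new Map(); // field name -> [{ sub, value }, ...]
  return {
    publish(sub, doc) {
      for (const [key, value] of Object.entries(doc)) {
        if (!fields.has(key)) fields.set(key, []);
        fields.get(key).push({ sub, value });
      }
    },
    clientDoc() {
      const result = {};
      for (const [key, list] of fields) result[key] = list[0].value;
      return result;
    },
  };
}

const view = createMergeView();
// FooA publishes only details.name; FooB publishes only details.address.
// Both arrive as values of the same top-level field "details".
view.publish("FooA", { _id: "f1", details: { name: "Ada" } });
view.publish("FooB", { _id: "f1", details: { address: "1 Main St" } });

// details is { name: "Ada" }; the address never reaches the client.
const seenByClient = view.clientDoc();
```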


Ah, important to know. Will keep in mind. Thanks so much for the detailed example

Yes. Also, there’s no way to publish a subset of embedded documents. For example, if you had a post with 1000 comments stored in an array on the post document, you would have to send either all of them or none of them.
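With comments in their own collection (each carrying a hypothetical `postId` and `createdAt`), a publication can send just one page, roughly `Comments.find({ postId }, { sort: { createdAt: -1 }, skip, limit: 20 })`. The pure-JS function below mirrors that query on an in-memory array so the idea is runnable; it is an illustration, not Meteor code, and `commentsPage` is hypothetical.

```js
// In-memory stand-in for
// Comments.find({ postId }, { sort: { createdAt: -1 }, skip, limit })
function commentsPage(comments, postId, { skip = 0, limit = 20 } = {}) {
  return comments
    .filter((c) => c.postId === postId)        // new array, input untouched
    .sort((a, b) => b.createdAt - a.createdAt) // newest first
    .slice(skip, skip + limit);
}

// 1000 hypothetical comments on one post, one second apart.
const comments = Array.from({ length: 1000 }, (_, i) => ({
  _id: `c${i}`,
  postId: "post1",
  createdAt: new Date(Date.UTC(2020, 0, 1) + i * 1000),
}));

const firstPage = commentsPage(comments, "post1", { limit: 20 });
```

Only 20 documents need to reach the client here, whereas the embedded array can only be sent whole.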

Lastly, if you are using Blaze, there are optimizations in place when iterating over a MiniMongo cursor that don’t exist when iterating over a normal array. This one is of course moot if you use other view layers.

All great points. I’m learning from these varied responses that there’s really no de facto go-to answer here. It really depends on your data, and even then you can never truly predict how the decision will affect the future of the application.

Also this:

Designing your data schema
The key thing to realize is that DDP sends changes to documents at the level of top-level document fields. What this means is that if you have large and complex subfields on document that change often, DDP can send unnecessary changes over the wire.
… DDP has no concept of “change the text field of the 3rd item in the field called todos”. It can only “change the field called todos to a totally new array”.
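A small sketch of what “changes at the level of top-level fields” means in practice: diff two versions of a document field by field, comparing each top-level value as a whole. Comparing via `JSON.stringify` is a simplification (it is sensitive to key order), and `changedTopLevelFields` is a hypothetical helper, not DDP’s implementation.

```js
// Return the top-level fields whose values differ between two versions of a
// document. Each field is compared as a whole, so any nested edit marks the
// entire field as changed. (Removed fields are ignored for brevity.)
function changedTopLevelFields(oldDoc, newDoc) {
  const changed = {};
  for (const [key, value] of Object.entries(newDoc)) {
    if (JSON.stringify(oldDoc[key]) !== JSON.stringify(value)) {
      changed[key] = value;
    }
  }
  return changed;
}

const before = {
  _id: "list1",
  title: "Groceries",
  todos: Array.from({ length: 100 }, (_, i) => ({ text: `item ${i}`, done: false })),
};
// Edit only the text of the 3rd todo.
const after = {
  ...before,
  todos: before.todos.map((t, i) => (i === 2 ? { ...t, text: "oat milk" } : t)),
};

// changes.todos holds all 100 items, even though only one text changed.
const changes = changedTopLevelFields(before, after);
```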

We patched ddp-server to handle this: https://bitbucket.org/znewsham/meteor-ddp-server/src/master/

I’m not sure this is correct - so long as the documents you’re iterating over have an _id field I think the same optimisations are present

Is this a change that has been merged into Meteor?

I wasn’t aware of this. If I’m not aware of it, I’m sure most others aren’t either, since it isn’t documented anywhere; their embedded documents would therefore lack _id fields and miss out on these optimizations.

Again this is moot unless Blaze is the view layer in use.


From Blaze docs:

Reactivity Model for Each

When the argument to #each changes, the DOM is always updated to reflect the new sequence, but it’s sometimes significant exactly how that is achieved. When the argument is a Meteor live cursor, the #each has access to fine-grained updates to the sequence – add, remove, move, and change callbacks – and the items are all documents identified by unique ids. As long as the cursor itself remains constant (i.e. the query doesn’t change), it is very easy to reason about how the DOM will be updated as the contents of the cursor change. The rendered content for each document persists as long as the document is in the cursor, and when documents are re-ordered, the DOM is re-ordered.

Things are more complicated if the argument to the #each reactively changes between different cursor objects, or between arrays of plain JavaScript objects that may not be identified clearly. The implementation of #each tries to be intelligent without doing too much expensive work. Specifically, it tries to identify items between the old and new array or cursor with the following strategy:

  1. For objects with an _id field, use that field as the identification key
  2. For objects with no _id field, use the array index as the identification key. In this case, appends are fast but prepends are slower.
  3. For numbers or strings, use their value as the identification key.

In case of duplicate identification keys, all duplicates after the first are replaced with random ones. Using objects with unique _id fields is the way to get full control over the identity of rendered elements.
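The identification strategy quoted above can be sketched as a function that assigns a key to each item. It follows the three documented rules plus the duplicate rule; it illustrates the documented behavior and is not Blaze’s source. `identificationKeys` and the random-key format are hypothetical.

```js
// Assign an identification key per item, following the documented rules:
// 1. objects with an _id use it; 2. other objects use their array index;
// 3. numbers/strings use their own value. Duplicate keys after the first
// are replaced with random ones.
function identificationKeys(items) {
  const seen = new Set();
  return items.map((item, index) => {
    let key;
    if (item !== null && typeof item === "object") {
      key = "_id" in item ? item._id : index;
    } else {
      key = item; // numbers and strings identify themselves
    }
    if (seen.has(key)) {
      key = `random-${Math.random().toString(36).slice(2)}`;
    }
    seen.add(key);
    return key;
  });
}
```

Items without `_id` fall back to their index, which is why a prepend shifts every key (and re-renders more) while an append leaves existing keys untouched.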


It has not

I’m not sure where I heard about it if not from the docs, but it’s certainly the case. We use it to aid with animations: we render a basketball shot chart, and when the list of shots changes, we animate the new shots. This works simply by setting a numeric _id on each shot, so when the list changes, the existing dots move to their new locations.