My 2¢ about nested docs and fields

serkandurusoy · August 23, 2015, 9:07pm

My OCD genes almost always push me hard to make things tidy. One outcome is that I design my schemas so that related information is grouped together into objects and/or arrays, and sometimes I use a “subdocument” structure for data that could otherwise have been designed as a collection of its own and joined. Of course, avoiding complex joins is almost always an instinct when working with MongoDB.

I’m also very picky about the packages I use. When I pick one, I try to use it to its fullest. Otherwise, I just look at the source and copy over or recreate that functionality in my app on my own terms so that I can tweak it the way I want.

For the latest app I’m working on, a considerably large one, I’ve decided to bite the bullet and go all out with packages. There are a number of them in the app, but the ones I’m relying on most are:

cfs gridfs
aldeed simple schema
aldeed collection2
dburles collection helpers
reywood publish composite
matb33 collection hooks
zimme collection behaviours
zimme softremovable
zimme timestampable
mickaelfm vermongo
ongoworks security

Now you see what I’m getting at. These are all related to collections and publications. They form the backbone of the app, and they need to be solid.

First things first: they are solid! Very! And I’m happy I chose them. I invested considerable time in picking them apart, making sure they would not break on me, and so far, so good. I feel deep gratitude to the smart people who have created and contributed to these packages.

There is one slight problem though.

These magical packages lose their magic when it comes to nested fields of any form, be it objects or arrays. The decorators in particular, such as timestamps and versions, rely on top-level fields.
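
To make the top-level point concrete, here is a minimal sketch of what a timestamp decorator effectively does to an update modifier. It is not the zimme timestampable source; the `stampUpdate` function and the `updatedAt` field name are assumptions for illustration. Because the stamp is written once per document, editing one nested comment bumps the timestamp of the whole post, and there is no place for a per-comment timestamp.

```javascript
// Hypothetical sketch of a timestamp decorator; NOT the actual
// zimme timestampable implementation.
// It adds a single top-level `updatedAt` to a MongoDB-style update modifier.
function stampUpdate(modifier, now) {
  const stamped = { ...modifier };
  // The stamp always lands on the top-level document,
  // no matter how deeply nested the edited field is.
  stamped.$set = { ...(modifier.$set || {}), updatedAt: now };
  return stamped;
}

// Editing the body of the second comment inside a post...
const modifier = { $set: { 'comments.1.body': 'Nice post, edited' } };
const result = stampUpdate(modifier, new Date('2015-08-23T21:07:00Z'));

// ...produces one timestamp for the whole post, not one for that comment.
console.log(JSON.stringify(result));
// {"$set":{"comments.1.body":"Nice post, edited","updatedAt":"2015-08-23T21:07:00.000Z"}}
```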

I know well by now, from experience, that Meteor’s fine-grained reactivity and the DDP server’s diffing algorithm work on top-level fields, so from that point of view I’d already developed an eye for where schemas deserve their own collections.

But I was caught blindsided by the fact that these packages also rely on their input being top-level fields.

Well, I kind of digested this rather quickly and made compromises where necessary. Again, so far so good.

But today I’ve spent the better part of my Sunday evening (I know, I know) trying to fit all of this into the accounts system. I did not fail miserably, but I did fail. Some moving parts don’t work quite the way I want them to, because email addresses are nested in the emails array and the accounts system creates its own createdAt field, although I’ve figured out how to work around both.
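
Meteor’s accounts packages store addresses in an `emails` array of `{ address, verified }` objects, so a package that only looks at top-level fields cannot see them. One possible workaround, not necessarily the one used here, is to mirror the first address into a top-level field. The `withPrimaryEmail` helper and the `primaryEmail`/`primaryEmailVerified` field names are hypothetical; in a real app something like this would run from an insert/update hook on the users collection.

```javascript
// Hypothetical helper; `primaryEmail` and `primaryEmailVerified` are
// illustrative field names, not something the Meteor accounts system creates.
// Meteor stores addresses as: emails: [{ address, verified }, ...]
function withPrimaryEmail(user) {
  const first =
    Array.isArray(user.emails) && user.emails.length > 0 ? user.emails[0] : null;
  return {
    ...user,
    // Mirror the nested value to the top level so packages that only
    // inspect top-level fields can see it.
    primaryEmail: first ? first.address : null,
    primaryEmailVerified: first ? Boolean(first.verified) : false,
  };
}

const user = {
  _id: 'abc123',
  createdAt: new Date('2015-08-23T00:00:00Z'),
  emails: [{ address: 'me@example.com', verified: false }],
};

console.log(withPrimaryEmail(user).primaryEmail); // me@example.com
```

The helper returns a new object rather than mutating the user, so it can be applied to a document or a modifier payload without side effects.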

Well now, apart from the fact that I’ve always wished for more flexibility regarding the users schema (the ObjectID problem being my top annoyance), this post is not about my beef with the accounts system.

It is about how we design schemas for our apps and how it may or may not come back to bite us. I want this experience to be out there for any one to read and take away anything they might consider valuable.

Of course, our trade is all about trade-offs, and engineering is a discipline where we practice practical optimization rather than theoretical perfection.

So here is my 2¢: think twice when you are designing schemas that contain nested fields/documents and arrays.

mordrax · August 23, 2015, 11:18pm

*anyone is one word.

On a slightly more relevant note, I’d like to know whether this ‘reactive on the first level’ behavior is by design or a technical constraint, and if the latter, whether it is going to be addressed in the future. I imagine this issue only applies to document databases.

sashko · August 24, 2015, 3:35am

It’s kind of funny that the top-level diffing thing is not at all a problem when publishing data from SQL, where everything is in flat rows!

arunoda · August 24, 2015, 4:32am

Ha ha!
So you never need to worry about it in SQL.

serkandurusoy · August 24, 2015, 10:44am

@mordrax I think it is more of a performance-oriented design decision since, by the looks of it, the diffing function pulls in the complete document but only compares the top level.

It lives in the DDP server package, so I guess you could clone that into your app and change it.
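
A simplified sketch of top-level diffing shows why nesting matters. This is not Meteor’s actual code; `changedTopLevelFields` is a hypothetical name, and `deepEqual` is a stand-in for a proper deep-equality check such as EJSON.equals. Each top-level key is compared as a whole, and if anything inside it differs, the entire value of that key is sent.

```javascript
// Simplified, hypothetical sketch of top-level diffing (not Meteor's source).
// Deep equality for plain JSON-like values; a stand-in for EJSON.equals.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (k) => Object.prototype.hasOwnProperty.call(b, k) && deepEqual(a[k], b[k])
  );
}

// Returns the fields to send: whole top-level values for anything that changed,
// and `undefined` for fields that were removed.
function changedTopLevelFields(oldDoc, newDoc) {
  const changed = {};
  for (const key of Object.keys(newDoc)) {
    if (!deepEqual(oldDoc[key], newDoc[key])) {
      changed[key] = newDoc[key]; // the entire value, not just the nested delta
    }
  }
  for (const key of Object.keys(oldDoc)) {
    if (!(key in newDoc)) changed[key] = undefined;
  }
  return changed;
}

const beforeDoc = {
  title: 'My First Post',
  comments: [{ author: 'Me', body: 'Post your comments' }],
};
const afterDoc = {
  title: 'My First Post',
  comments: [
    { author: 'Me', body: 'Post your comments' },
    { author: 'You', body: 'Nice post' },
  ],
};

// Adding one comment resends the whole `comments` array.
console.log(changedTopLevelFields(beforeDoc, afterDoc));
```

With this shape of diff, one new comment costs a resend of every comment on the post, while an edit to `title` sends only `title`.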

serkandurusoy · August 24, 2015, 10:53am

That’s right, but you know what? After almost 20 years with SQL, I finally got the chance to actually use NoSQL with Meteor and MongoDB, and I’ve grown to like it.

And @arunoda, SQL will bring in a new set of worries of its own. I can’t forget all the (practically failed) efforts of the SQL world to achieve hierarchical schemas with XML databases, property sets, object DBs, ORM magic, envy of the past days of the AS/400, etc.

It would be great if Meteor on MongoDB had joins and some syntactic transform sugar like Collection.find({}, {fields: {'address.city': {as: 'city'}}})
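
The `as` option above is wished-for syntax, not an existing field-specifier feature. A similar effect can be had by reshaping documents after they are fetched, for example in a collection `transform` on the client. The `projectAs` and `getPath` helpers and their mapping format are hypothetical.

```javascript
// Hypothetical helpers emulating the wished-for `{ as: 'city' }` sugar.
// `mapping` maps dotted source paths to new top-level field names.
function getPath(doc, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

function projectAs(doc, mapping) {
  const out = { _id: doc._id };
  for (const [path, alias] of Object.entries(mapping)) {
    out[alias] = getPath(doc, path);
  }
  return out;
}

const doc = {
  _id: 'u1',
  address: { city: 'Lisbon', zip: '1100' },
};

console.log(projectAs(doc, { 'address.city': 'city' })); // { _id: 'u1', city: 'Lisbon' }
```

Flattening on read like this keeps the stored schema nested while giving templates (and top-level-oriented code) a flat shape to work with.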

I guess the point is that whatever we choose, even if we choose wisely, we still need to make compromises, and it’s good to know what we’ll need to compromise on before we hit it as a self-inflicted wall.

shock · August 24, 2015, 11:01am

I still don’t understand what exactly you mean by top-level-only document diffing.
Is it something noticeable when working with standard collections, or is there something specific about composite that makes it not react to changes in nested objects?
I haven’t worked with composite yet.

serkandurusoy · August 24, 2015, 11:13am

@shock Say you have a post document where you also store an array of comments:

post = {
  title: 'My First Post',
  author: 'Me',
  body: 'Some interesting post',
  comments: [
    {
      author: 'Me',
      body: 'Post your comments'
    },
    {
      author: 'You',
      body: 'Nice post'
    }
  ]
}

Now imagine you are on a “comments” page where you are displaying your comments, and a new comment shows up.

Normally, if comments had their own collection, a new comment showing up would be a single comment document getting pushed to the client and the page inserting a div.

But since you have a nested comments array:

whenever there is a new comment, the whole comments array is pushed to the client
whenever anything else changes on the post (perhaps the post itself gets an upvote or an edit), the whole comments page gets a signal that the underlying data changed, even though the comments themselves have not changed
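
For comparison, the flat design where comments have their own collection stores each comment as its own document pointing back to its post. Below is a minimal sketch of splitting the nested `post` into that shape; the `splitPost` helper, the `postId` field, and the generated ids are hypothetical. With this layout a new comment is a single added document, and an upvote touches only the post document.

```javascript
// Hypothetical migration helper: turns a post with an embedded `comments`
// array into one post document plus one document per comment.
function splitPost(post, postId) {
  const { comments = [], ...postFields } = post;
  const postDoc = { _id: postId, ...postFields };
  const commentDocs = comments.map((comment, index) => ({
    ...comment,
    _id: `${postId}-c${index}`, // illustrative ids; a real app would generate them
    postId, // reference back to the parent post (the "join" key)
  }));
  return { postDoc, commentDocs };
}

const post = {
  title: 'My First Post',
  author: 'Me',
  body: 'Some interesting post',
  comments: [
    { author: 'Me', body: 'Post your comments' },
    { author: 'You', body: 'Nice post' },
  ],
};

const { postDoc, commentDocs } = splitPost(post, 'p1');
console.log(postDoc); // the post without its `comments` field
console.log(commentDocs); // two standalone comment documents with postId 'p1'
```

The trade-off is the one discussed in this thread: finer-grained updates in exchange for a join when both are needed together.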

arunoda · August 24, 2015, 11:30am

Yeah! I know, I was just replying to sashko’s comment.
I know there are issues.
That’s why people moved to other DBs in the first place.

But I must admit that Postgres especially has evolved in recent years.
Scaling MongoDB is also not as smooth as it seems.

vjau · August 24, 2015, 10:00pm

Yeah, NoSQL is very cool, it has no schema! You can do what you want! But after a little while, with the app growing, it really seems that a schema written in stone could be useful, so let’s add that back! Could we implement a way to enforce a schema in MongoDB?
Yeah, NoSQL is very cool, it has no relations! You can do what you want! But after a little while, with the app growing, it really seems that some fixed relations between tables could be useful, so let’s add that back! Could we implement a way to get relations in MongoDB?
Yeah, JS is very cool, every variable’s type is dynamic! You can do what you want! But after a little while, with the app growing, it really seems that some type safety could be useful, so let’s add that back! Could we implement a static type system in JS?

In less than two years, I think SQL will have made a really big comeback in JS, and TypeScript’s type system will be a serious proposal for ES8 or whatever it will be called by then.