Best practices for MongoDB Structure


I’m new to NoSQL databases (MongoDB), and I’m wondering if I could get some specifics on best practices for when to build collections, vs. when to add more arrays to a document. I think the best way to do this is I’ll explain what I’m doing, and you guys can let me know what I’m doing wrong. Thanks in advance!

Here are my collections (as of now)

  • Users
    – trainers
    – clients

My assumptions are that trainers and clients should both be in the same collection (users), with a ‘field’ differentiating them (like ‘role: client’), which also controls their rights.
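
As a rough sketch of that single-collection idea, the snippet below uses a plain Python list as a stand-in for the users collection. Only the role field comes from the description above; the names, the _id values, and the helper function are hypothetical.

```python
# In-memory stand-in for a single "users" collection.
# Everything except the "role" field is a made-up example value.
users = [
    {"_id": 1, "name": "Ann", "role": "trainer"},
    {"_id": 2, "name": "Bob", "role": "client"},
    {"_id": 3, "name": "Cy", "role": "client"},
]


def users_with_role(docs, role):
    """Return the documents whose 'role' field equals the given role."""
    return [doc for doc in docs if doc.get("role") == role]


# With a real driver this would be a query filter such as {"role": "client"}.
clients = users_with_role(users, "client")
```

Keeping both kinds of user in one collection means shared fields (login, profile) live in one place, and the role value is the only thing that has to be checked for permissions.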

  • Tasks
    – workouts
    – reminders
  • task_data
    – lots of fields

Here, tasks are settings that a personal trainer sets for each individual client. I assume that because there’s no limit to how many tasks a trainer can assign, these should be their own collection, related to clients and trainers via ‘client’ and ‘trainer’ attributes (that store their _id). Tasks have types, and to further complicate things, if a task is a workout, it will have child documents. My assumption is that the best way to do this is with a self-join, where the child document is set to ‘type: workout-child’, and then parent_id: [parent’s _id].
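
To make the self-join idea concrete, here is a minimal sketch using a plain Python list in place of the tasks collection. The client, trainer, type, and parent_id fields follow the description above; the _id values and helper names are hypothetical.

```python
# In-memory stand-in for a "tasks" collection with a self-reference.
# "client" and "trainer" hold user _id values; "parent_id" points at another task.
tasks = [
    {"_id": 10, "type": "workout", "trainer": 1, "client": 2, "parent_id": None},
    {"_id": 11, "type": "workout-child", "trainer": 1, "client": 2, "parent_id": 10},
    {"_id": 12, "type": "workout-child", "trainer": 1, "client": 2, "parent_id": 10},
    {"_id": 13, "type": "reminder", "trainer": 1, "client": 3, "parent_id": None},
]


def children_of(docs, parent_id):
    """Return the child tasks that reference the given parent task."""
    return [doc for doc in docs if doc.get("parent_id") == parent_id]


def tasks_for_client(docs, client_id):
    """Return every task (parents and children) assigned to one client."""
    return [doc for doc in docs if doc["client"] == client_id]
```

Here children_of(tasks, 10) returns the two workout-child documents, which is the lookup the self-join would perform as a second query against the same collection.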

Now here’s where it gets hairy… these tasks trigger the daily creation of task_data, notifications that are sent to the client, which they can update to say whether or not they completed the task (default = not completed), so their progress can be monitored. Now, technically, these could also be tasks, just with a different type (type: 'workout-child-task-data'). The ability to create nested attributes opens up another can of worms, as I could do something like this: data: {5/1/15: 'completed', 5/2/15: 'not completed', etc...}
Everything in the above paragraph seems wrong to me, which is why I have a separate task_data collection to handle this stuff, but thinking this through raised questions like:

  • When should I create a new collection vs. create attributes and sub-attributes?
  • Are there performance issues between different ways of storing collections?
  • Is there a point when you would say “that’s too many attributes, create more collections”?

Finally, task_data could also be something like a graph, which could have hundreds or thousands of data points, so… do I create task_data_data, or do I create task_data documents that are children of other task_data documents, or do I create hundreds of attributes for some task_data documents?

Thanks for your help!


If you want a solid understanding of how to think about this, I recommend reading these two links:

And if you’re interested in learning MongoDB through an official course, check out MongoDB University.
This course starts next week, on 26 May 2015: M101JS: MONGODB FOR NODE.JS DEVELOPERS


That’s perfect, thanks @eahmedshendy


Based on my recent experience, I would suggest trying not to embed documents too deeply in your collections (e.g., large documents within arrays). For your example case, make task_data a separate collection referencing a task_id.
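
A minimal sketch of that layout, assuming one task_data document per task per day. The task_id reference and the "not completed" default come from the thread; the date and completed field names, the ISO date format, and the helper functions are assumptions for illustration.

```python
from datetime import date


def new_task_data(task_id, day):
    """Build one daily task_data document; completion defaults to False."""
    return {"task_id": task_id, "date": day.isoformat(), "completed": False}


# In-memory stand-in for a separate "task_data" collection.
task_data = [
    new_task_data(11, date(2015, 5, 1)),
    new_task_data(11, date(2015, 5, 2)),
]


def mark_completed(docs, task_id, day):
    """Flag one day's entry as completed; return how many documents changed."""
    changed = 0
    for doc in docs:
        if doc["task_id"] == task_id and doc["date"] == day.isoformat():
            doc["completed"] = True
            changed += 1
    return changed
```

Each daily record is a flat document, so updating or filtering it only needs a match on task_id and date rather than a path into a nested structure.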

If you are doing a lot of updating and/or filtering of embedded documents, the queries can become quite complicated. I’m finding out that MongoDB’s query tools do have limitations. The flip side is the extra bookkeeping that comes with normalizing data across collections (for example, repeating reference IDs), but experience tells me you will spend less time fixing bugs to keep your data normalized than you will spend testing complicated queries. As far as performance goes, I can’t speak to that, but my general attitude on performance issues is to resolve them when you come to that crossroads. Hope that helps!
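
For contrast, here is a rough sketch of the deeply embedded shape, where a workout task carries its children and each child carries its daily data. The children, name, data, date, and completed field names and the exercise names are all hypothetical; the point is only how much walking a single update takes.

```python
# Embedded shape: one workout document holding its children,
# each of which holds its own list of daily entries.
workout = {
    "_id": 10,
    "type": "workout",
    "children": [
        {"name": "squats", "data": [{"date": "2015-05-01", "completed": False}]},
        {"name": "lunges", "data": [{"date": "2015-05-01", "completed": False}]},
    ],
}


def mark_embedded_completed(task, child_name, day):
    """Walk two levels of nesting to flip one flag; return True if it was found."""
    for child in task["children"]:
        if child["name"] != child_name:
            continue
        for entry in child["data"]:
            if entry["date"] == day:
                entry["completed"] = True
                return True
    return False
```

The same two-level addressing has to be expressed in the database query when the data is embedded, which is where the update and filter logic tends to get complicated.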


You’re welcome.