One big collection, or lots of small collections?

Hi all, I’m building out a database and am wondering which approach is better:

  • Create a single Posts collection and attach PostMeta to it (WordPress-style database), or
  • Use 10-15 separate collections, one per kind of data (posts, likes, views, friends, etc.)

I’m currently building one collection called posts, and I can attach ‘metadata’ or extra values to each _id.

This is the way WordPress does things, and it works very well. Am I missing something here, or is this OK?
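A minimal sketch of the single-collection model described above, assuming each document carries a hypothetical `type` field that separates status posts from notifications, plus a `meta` object for the extra values. Plain arrays stand in for the Mongo collection so it runs outside Meteor; `findByType` and `getMeta` are hypothetical stand-ins for `Posts.find({ type })` and a metadata lookup.

```javascript
// Hypothetical documents in ONE "posts" collection: `type` says what kind of
// document it is, `meta` holds the extra, WordPress-style values.
const posts = [
  { _id: "p1", type: "status", ownerId: "u1", text: "Hello!", meta: { mood: "happy" } },
  { _id: "p2", type: "notify", ownerId: "u1", text: "u2 liked your post", meta: { postId: "p1" } },
  { _id: "p3", type: "status", ownerId: "u2", text: "Hi there", meta: {} },
];

// Hypothetical stand-in for Posts.find({ type }): only documents of one kind.
function findByType(docs, type) {
  return docs.filter((doc) => doc.type === type);
}

// Hypothetical helper: read one meta value attached to a document's _id.
function getMeta(docs, id, key) {
  const doc = docs.find((d) => d._id === id);
  return doc && doc.meta ? doc.meta[key] : undefined;
}

console.log(findByType(posts, "notify").length); // 1
console.log(getMeta(posts, "p2", "postId")); // p1
```

The catch, raised later in the thread, is that every query and index on a collection shaped like this has to account for `type`.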

I’m positive someone else can provide a better response than me, but here goes.

I typically keep everything in a single collection if the amount of data isn’t too large. For example, keeping likes and views on the Posts collection is more than OK, but nesting an object of friends that runs into the thousands doesn’t make sense to me. In that case I would split the data into separate collections and normalize it.

Would love to hear what design patterns others use when designing the data tier of their apps.
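One way to read the advice in this reply, sketched in plain JavaScript: small, bounded values such as like and view counts sit on the post document itself, while friendships, which can run into the thousands, live in their own collection with one document per relationship. All field names are hypothetical, and plain arrays stand in for the collections so the example runs without MongoDB.

```javascript
// Post document with small, bounded data embedded directly on it (hypothetical fields).
const post = { _id: "p1", ownerId: "u1", text: "Hello!", likes: 0, views: 0 };

// Mirrors what a Mongo { $inc: { [field]: by } } update does to the document.
function increment(doc, field, by = 1) {
  doc[field] = (doc[field] || 0) + by;
  return doc;
}

// Friendships in their own (simulated) collection, one document per directed
// relationship, instead of an ever-growing array nested in each user.
const friendships = [
  { _id: "f1", userId: "u1", friendId: "u2" },
  { _id: "f2", userId: "u1", friendId: "u3" },
  { _id: "f3", userId: "u2", friendId: "u3" },
];

// Similar in spirit to Friendships.find({ userId }) followed by a map.
function friendIdsOf(edges, userId) {
  return edges.filter((e) => e.userId === userId).map((e) => e.friendId);
}

increment(post, "likes");
increment(post, "views", 3);
console.log(post.likes, post.views); // 1 3
console.log(friendIdsOf(friendships, "u1")); // [ 'u2', 'u3' ]
```

Besides keeping queries simple, this avoids unbounded arrays, which matters because MongoDB caps a single document at 16 MB. With one document per directed relationship, a mutual friendship is either two documents or a query on both fields; which fits better depends on how you read the data.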

Thanks for the reply, and affirming I’m not crazy!

So far the app is working really well. I have users’ status posts and their notifications in ONE collection and ONE subscription, and I divide them up using a helper:

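```javascript
// "feed" is a hypothetical template name; the helper filters Posts by `type`
Template.feed.helpers({
  notify() {
    return Posts.find({ type: "notify" });
  },
});
```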

See how that works?

Today I should really find out if this model works, and I think it will.

The next question I have to ask is:

Is it better to subscribe to ALL the data the user might see at first connection? Or does it make more sense to subscribe only to what the current route needs? I would assume it’s better to load the data only when they need it. That makes sense.

But I’ve been wrong about the Meteor way so often that I have to ask.

I guess it depends on the size of your app, but I don’t see one collection scaling well.

Your indexes will differ by collection, and so will your schemas. When I think about adding another layer of abstraction on top of the queries I already write to target nested properties, I suspect you’ll have a painful experience sooner rather than later if you try to keep everything in one collection.

It’s not that your model won’t work, it’s just that there are too many advantages to sorting your docs into collections.

To answer your next question: unless you need a given subscription everywhere in the app, it makes sense to subscribe only to the data the user is currently looking at, especially if you are using the standard oplog-based reactivity. You don’t want the overhead of watching for updates on a ton of docs for every user all the time.
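A minimal sketch of "subscribe only to what the current route needs", written as plain JavaScript so it runs outside Meteor. The `subscribe` function passed in is a hypothetical stand-in for `Meteor.subscribe`, which returns a handle with a `stop()` method; the route names and publication names are also hypothetical. In a real Meteor app, calling `this.subscribe` in a template’s `onCreated` gives similar behavior, since those subscriptions stop automatically when the template is destroyed.

```javascript
// Which (hypothetical) publications each (hypothetical) route needs.
const routeSubscriptions = {
  feed: [["posts.recent", { limit: 20 }]],
  notifications: [["posts.notifications", { limit: 20 }]],
  profile: [["users.profile", {}]],
};

// Keeps only the current route's subscriptions alive.
function createRouteSubscriber(subscribe) {
  let handles = [];
  return function enterRoute(routeName) {
    // Stop everything the previous route subscribed to...
    handles.forEach((h) => h.stop());
    // ...then subscribe to just what this route needs.
    const specs = routeSubscriptions[routeName] || [];
    handles = specs.map(([name, args]) => subscribe(name, args));
    return handles.length;
  };
}

// Fake subscribe that records which publications are currently active.
const active = new Set();
function fakeSubscribe(name, _args) {
  active.add(name);
  return { stop: () => active.delete(name) };
}

const enterRoute = createRouteSubscriber(fakeSubscribe);
enterRoute("feed");
console.log([...active]); // [ 'posts.recent' ]
enterRoute("notifications");
console.log([...active]); // [ 'posts.notifications' ]
```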


If this is for a serious project that you’re planning on putting into production, you need to think about the future. When you’re first starting out, you can afford to just lump things together irresponsibly into one big collection, but when you’re dealing with potentially hundreds of thousands of documents, you need to design your DB to be fast and scalable. Do that, and you’re saving yourself a big headache later down the road.

Even if you’re just learning, it’s best to educate yourself on MongoDB (and general database) best practices: http://learnmongodbthehardway.com/schema/schemabasics/

Off the top of my head, if I were designing a Facebook-style app, I would have separate collections for at least the largest entities, like Users, Posts, Comments, etc. You can get away with lumping metadata together, but it largely depends on the relationships you’re going to have between your entities, how you plan to query the data, and how you design your publications for your user experience. Really think about it, do some research, and try to have a general plan before you start serious coding, because you can design yourself into a hole if you don’t take your time with it.
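A sketch of separate Users, Posts and Comments collections linked by id references instead of nesting, with plain arrays standing in for the collections. The field names (`authorId`, `postId`) and the `commentsWithAuthors` helper are hypothetical; one common Meteor approach is to publish the related cursors and join on the client, which is what the helper imitates.

```javascript
// Separate (simulated) collections for the largest entities; field names are hypothetical.
const users = [
  { _id: "u1", name: "Ann" },
  { _id: "u2", name: "Bob" },
];
const posts = [{ _id: "p1", authorId: "u1", text: "First post" }];
const comments = [
  { _id: "c1", postId: "p1", authorId: "u2", text: "Nice!" },
  { _id: "c2", postId: "p1", authorId: "u1", text: "Thanks" },
];

// Hypothetical client-side join: comments for one post, each with its author's name.
function commentsWithAuthors(postId) {
  const usersById = new Map(users.map((u) => [u._id, u]));
  return comments
    .filter((c) => c.postId === postId)
    .map((c) => ({ text: c.text, author: usersById.get(c.authorId)?.name }));
}

console.log(posts[0].text, commentsWithAuthors("p1"));
```

Because comments reference a post rather than living inside it, a post with thousands of comments never turns into one huge document, and comments can be paginated on their own.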

Subscribe to the BARE MINIMUM of data. Take only what you absolutely need to achieve the view and functionality you want, which means limiting the number of results and also the fields you return. For any subscription to a large amount of data, like a list of Posts or Friends, you need to implement pagination; if you’re building a search feature, use operators like $regex and re-run your subscription when the input changes.
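A sketch of building the selector and options for a "bare minimum" posts subscription: only the fields the view shows, one page at a time, and an optional case-insensitive `$regex` search. The function name, page size, `type` value and field list are hypothetical; the `sort`, `skip`, `limit` and `fields` keys follow the options Meteor’s `Collection.find` accepts. User input is escaped before it goes into the regex so characters like `.` or `+` match literally.

```javascript
// Escape regex metacharacters so the user's search text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const PAGE_SIZE = 20; // hypothetical page size

// Hypothetical helper: selector and options for one page of status posts.
function buildPostsQuery({ page = 0, search = "" } = {}) {
  const selector = { type: "status" };
  const term = search.trim();
  if (term) {
    // Case-insensitive substring match on the post text.
    selector.text = { $regex: escapeRegExp(term), $options: "i" };
  }
  const options = {
    sort: { createdAt: -1 },
    skip: Math.max(0, page) * PAGE_SIZE,
    limit: PAGE_SIZE,
    // Only send the fields the list view actually renders.
    fields: { text: 1, ownerId: 1, createdAt: 1 },
  };
  return { selector, options };
}

const { selector, options } = buildPostsQuery({ page: 2, search: "c++ tips" });
console.log(selector.text.$regex); // c\+\+ tips
console.log(options.skip, options.limit); // 40 20
```

On the server, the publication should apply its own limit rather than trusting whatever page size the client sends.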


Nice tip.

I was lazy about limiting returned fields; I’ll clean that up. Great thread.

Maybe tag this under performance? I think it has useful info.

You’ll definitely find mixmax:smart-disconnect useful. It disconnects idle users so they’re not consuming resources.

EDIT: Got this from my bookmarks. It’s just what you need!
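The idea of disconnecting idle users can be sketched as a small idle timer. This is not the mixmax:smart-disconnect package’s code or API, just a hypothetical illustration in plain JavaScript: the `disconnect` and `reconnect` callbacks stand in for Meteor’s `Meteor.disconnect()` and `Meteor.reconnect()`, the 60-second timeout is an arbitrary assumption, and time is passed in explicitly so the example runs without a browser.

```javascript
// Hypothetical idle tracker: disconnect after `timeoutMs` without activity,
// reconnect on the next activity. Times are plain numbers (e.g. Date.now()).
function createIdleTracker({ timeoutMs, disconnect, reconnect, startedAt = 0 }) {
  let lastActivity = startedAt;
  let connected = true;

  return {
    // Call on user activity (mouse move, key press, tab becoming visible).
    activity(now) {
      lastActivity = now;
      if (!connected) {
        connected = true;
        reconnect();
      }
    },
    // Call periodically, e.g. from setInterval.
    check(now) {
      if (connected && now - lastActivity >= timeoutMs) {
        connected = false;
        disconnect();
      }
    },
    isConnected: () => connected,
  };
}

const log = [];
const tracker = createIdleTracker({
  timeoutMs: 60000,
  disconnect: () => log.push("disconnect"),
  reconnect: () => log.push("reconnect"),
});

tracker.activity(0);
tracker.check(30000); // idle for 30 s: still connected
tracker.check(61000); // idle for 61 s: disconnect
tracker.activity(62000); // user is back: reconnect
console.log(log); // [ 'disconnect', 'reconnect' ]
```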

[quote="SkyRooms, post:1, topic:34450"]
This is the way WordPress does things, and works very well.
[/quote]

WordPress is built on a SQL database; MongoDB is a NoSQL database. They are very different, and trying to impose a SQL data model on a NoSQL database isn’t the right way to go. I think what you really need to do is spend some time learning about NoSQL data modeling. There’s some good literature out there: research and read all you can until it starts making sense. Then you’ll be able to think your own way around these problems.
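To make the contrast concrete: WordPress stores post metadata as separate rows in a `wp_postmeta` table (`post_id`, `meta_key`, `meta_value`) that are joined to posts by id, while a document model would usually embed those values in the post itself. A hypothetical sketch of that reshaping in plain JavaScript, with arrays standing in for the tables:

```javascript
// WordPress-style rows: one meta row per (post, key, value), linked by post_id.
const postRows = [{ ID: 1, post_title: "Hello world" }];
const postMetaRows = [
  { meta_id: 10, post_id: 1, meta_key: "views", meta_value: "42" },
  { meta_id: 11, post_id: 1, meta_key: "mood", meta_value: "happy" },
];

// Hypothetical conversion: fold each post's meta rows into an embedded `meta`
// object, so a single read returns the post together with its metadata.
// Simplification: if a key appears more than once, the last row wins.
function toDocuments(posts, metaRows) {
  return posts.map((p) => {
    const meta = {};
    for (const row of metaRows) {
      if (row.post_id === p.ID) meta[row.meta_key] = row.meta_value;
    }
    return { _id: String(p.ID), title: p.post_title, meta };
  });
}

const docs = toDocuments(postRows, postMetaRows);
console.log(docs[0].meta.views); // 42
```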

The MongoDB docs are a good start:
https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/

Their blog is another good resource:
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

You can also look at what other people have done, but your needs may be different, so you should understand the trade-offs behind different data models:
https://docs.mongodb.com/ecosystem/use-cases/metadata-and-asset-management/

Also


Thanks for the links, Max.

THANK GOD. I was relieved to find, after reading it, that I have modeled my data correctly lol.