MongoDB schema strategy for users joining a group

sergiotapia · January 9, 2016, 1:13am

Imagine Facebook’s group feature, using MongoDB. What would be the best way to structure the data?

The main operations on this data will be finding the users who belong to a group and determining whether user X is a member of that group.

One option would be to have an array of user ID strings in the Group document.

{
  name: "Pets United",
  members: [
    "iuahsdfuhasdfasdf",
    "qwefqwefqwefqweff",
    "ioeroigkergnknmkm"
  ]
}

Another would be to have a membership array of group IDs in the User document. This option makes sense, but I’m not sure how easy it would be to query for all users who have group ID foobar in their array.

{ 
  firstName: "Sergio",
  age: 27,
  groups: [
    "iuahsdfuhasdfasdf",
    "qwefqwefqwefqweff",
    "ioeroigkergnknmkm"
  ]
}
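To make the two options concrete, here is a minimal in-memory sketch of the two main operations (listing a group’s members and checking whether user X belongs to it) under each layout. Plain JavaScript arrays stand in for collections, so this only illustrates the access pattern, not MongoDB itself; the IDs and function names are hypothetical.

```javascript
// Hypothetical data; plain arrays stand in for MongoDB collections.

// Option 1: member IDs are stored on the group document.
const groupsOpt1 = [{ _id: "g1", name: "Pets United", members: ["u1", "u2"] }];

// Option 2: group IDs are stored on each user document.
const usersOpt2 = [
  { _id: "u1", firstName: "Sergio", groups: ["g1"] },
  { _id: "u2", firstName: "Sashko", groups: ["g1", "g2"] },
  { _id: "u3", firstName: "Willem", groups: ["g2"] },
];

// Option 1, list members: read a single group document.
function membersOpt1(groupId) {
  const group = groupsOpt1.find((g) => g._id === groupId);
  return group ? group.members : [];
}

// Option 1, membership check: look inside that group's array.
function isMemberOpt1(groupId, userId) {
  return membersOpt1(groupId).includes(userId);
}

// Option 2, list members: every user whose groups array contains groupId.
// With an index on `groups` (a multikey index) MongoDB can avoid a full
// scan for this; here we simply filter.
function membersOpt2(groupId) {
  return usersOpt2.filter((u) => u.groups.includes(groupId)).map((u) => u._id);
}

// Option 2, membership check: look inside that user's own array.
function isMemberOpt2(groupId, userId) {
  const user = usersOpt2.find((u) => u._id === userId);
  return Boolean(user && user.groups.includes(groupId));
}
```

The trade-off shows up in where the array grows: under option 1 a single group document grows with every member, while under option 2 each user’s array grows only with the number of groups that user joins.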

Which one do you guys think is better if I expect a group to have thousands of users?

sergiotapia · January 9, 2016, 1:31am

Looks like the second option would be the best approach here unless I’m misreading the documentation.

https://docs.mongodb.org/manual/reference/operator/query/in/#use-the-in-operator-to-match-values-in-an-array

So a query to find users who joined group foo would be:

db.users.find( { groups: { $in: ["foo"] } } )

sashko · January 9, 2016, 2:25am

I think it’s basically equivalent. Also, the query above can be shortened to this:

db.users.find( { groups: "foo" } )

You only need $in if you have an array of groups and you want users who are in one or more of them.
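The point about `$in` can be illustrated with a simplified matcher: for a scalar value, an equality condition on an array field already checks every element, so `$in` with a one-element list adds nothing. This is a hedged model of the semantics for plain string values only, not MongoDB’s actual matcher; `matchesEquality` and `matchesIn` are hypothetical names.

```javascript
// Simplified model of MongoDB matching for scalar query values only.
// (Real matching also covers whole-array equality, nested documents, etc.)

// { groups: "foo" }: matches if the field equals the value or, when the
// field is an array, if ANY element equals it.
function matchesEquality(doc, field, value) {
  const v = doc[field];
  return Array.isArray(v) ? v.includes(value) : v === value;
}

// { groups: { $in: [a, b] } }: matches if ANY listed value matches.
function matchesIn(doc, field, values) {
  return values.some((value) => matchesEquality(doc, field, value));
}

const users = [
  { _id: "u1", groups: ["foo", "bar"] },
  { _id: "u2", groups: ["bar"] },
  { _id: "u3", groups: [] },
];

// Mirrors db.users.find({ groups: "foo" })
const inFoo = users.filter((u) => matchesEquality(u, "groups", "foo")).map((u) => u._id);

// Mirrors db.users.find({ groups: { $in: ["foo", "bar"] } })
const inFooOrBar = users.filter((u) => matchesIn(u, "groups", ["foo", "bar"])).map((u) => u._id);
```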

sergiotapia · January 9, 2016, 2:47am

Thanks @sashko, always appreciate your help.

serkandurusoy · January 9, 2016, 11:42am

If you are going to need to query both for a user’s groups and for a group’s users, and will do so with roughly the same frequency, I’d say go for both of them and keep them in sync.
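Keeping both sides in sync means every join or leave touches two documents. A minimal in-memory sketch, with hypothetical helpers that mimic `$addToSet` and `$pull` on plain arrays. In MongoDB these would be two separate update operations (one on the user, one on the group), and nothing in this sketch makes them atomic together.

```javascript
// Hypothetical in-memory collections holding both sides of the relation.
const users = new Map([["u1", { _id: "u1", groups: [] }]]);
const groups = new Map([["g1", { _id: "g1", members: [] }]]);

// Mimics $addToSet: append only if the value is not already present.
function addToSet(arr, value) {
  if (!arr.includes(value)) arr.push(value);
}

// Mimics $pull with a scalar value: remove every occurrence.
function pull(arr, value) {
  for (let i = arr.length - 1; i >= 0; i--) {
    if (arr[i] === value) arr.splice(i, 1);
  }
}

// Each call writes to both documents; in MongoDB that is two updates.
function joinGroup(userId, groupId) {
  addToSet(users.get(userId).groups, groupId);
  addToSet(groups.get(groupId).members, userId);
}

function leaveGroup(userId, groupId) {
  pull(users.get(userId).groups, groupId);
  pull(groups.get(groupId).members, userId);
}
```

Using set semantics on both sides means a repeated join is harmless: calling `joinGroup("u1", "g1")` twice still leaves exactly one entry in each array.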

But I’d imagine that, as your app data grows and you have hundreds or thousands of users in each group, you’ll need to query for a user’s groups more frequently than for the users of a group.

If you find yourself needing both queries frequently, you may seek solutions like peerdb (which keeps relations in sync for you) or perhaps neo4j.

willemx · January 9, 2016, 1:54pm

What about a third option: a separate membership collection? You could also store membership properties there, like the member’s role(s) or permissions in the group, the start/end of the membership, etc.
I think this scales much better. You can have indices on all properties.
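A membership collection of this kind might look like the following in-memory sketch: one small document per (user, group) pair, carrying the per-membership properties. The field names (`userId`, `groupId`, `role`, `joinedAt`) and helper functions are hypothetical.

```javascript
// Hypothetical "memberships" collection: one document per (user, group) pair,
// with room for per-membership properties such as role and join date.
const memberships = [
  { userId: "u1", groupId: "g1", role: "admin", joinedAt: new Date("2016-01-09") },
  { userId: "u2", groupId: "g1", role: "member", joinedAt: new Date("2016-01-10") },
  { userId: "u2", groupId: "g2", role: "member", joinedAt: new Date("2016-01-11") },
];

// Members of a group (cf. db.memberships.find({ groupId: "g1" })).
function usersInGroup(groupId) {
  return memberships.filter((m) => m.groupId === groupId).map((m) => m.userId);
}

// Groups of a user (cf. db.memberships.find({ userId: "u2" })).
function groupsOfUser(userId) {
  return memberships.filter((m) => m.userId === userId).map((m) => m.groupId);
}

// A single membership, including its role; null if the user is not a member.
function membershipOf(userId, groupId) {
  return memberships.find((m) => m.userId === userId && m.groupId === groupId) || null;
}
```

In MongoDB this would probably be backed by a unique compound index such as `db.memberships.createIndex({ groupId: 1, userId: 1 }, { unique: true })`, plus an index on `userId`. Note that these lookups return IDs only, so fetching the user or group documents themselves takes a second query.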

serkandurusoy · January 9, 2016, 5:22pm

Those many-to-many relations force additional queries at all times.

Personally, as soon as I switched over from SQL to NoSQL, many-to-many relationships were among the first things I ditched.

sashko · January 9, 2016, 5:56pm

Yeah I think the array approach is better in Mongo just to avoid doing lots of queries all the time.

jitterbop · January 13, 2016, 10:04pm

Here’s a solution to consider.

{
  firstName: "Sergio",
  age: 27,
  followers: [ { "user2_id": { name: "Sashko", groups: [ "PetsUnited", "PetsDivided" ] } } ],
  groups: {
    "PetsUnited":  { "user1_id": { name: "Sergio" }, "user2_id": { name: "Sashko" } },
    "PetsDivided": { "user1_id": { name: "Sergio" }, "user2_id": { name: "Sashko" } }
  }
}

In the above solution, the user ID (user1_id) is base64-encoded so that it can be used as a field name in queries and updates (e.g., groups.PetsUnited.user1_id). The approach is laid out nicely here.
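The linked write-up isn’t reproduced here, but one plausible reason for encoding the ID is that it becomes a field name inside `groups.<name>`, and MongoDB field names have traditionally been restricted from containing `.` or starting with `$`. A minimal Node.js sketch under that assumption, using the built-in `Buffer` for base64; the helper names are hypothetical.

```javascript
// Encode a user ID for use as a field name. Standard base64 output uses only
// A-Z, a-z, 0-9, "+", "/" and "=", so it never contains "." or "$".
function encodeKey(id) {
  return Buffer.from(id, "utf8").toString("base64");
}

// Recover the original ID from an encoded field name.
function decodeKey(key) {
  return Buffer.from(key, "base64").toString("utf8");
}

// Dot-notation path for one member entry, e.g. for an update such as
// { $set: { [memberPath("PetsUnited", id)]: { name: "Sergio" } } }.
function memberPath(groupName, userId) {
  return `groups.${groupName}.${encodeKey(userId)}`;
}
```

The group name (`PetsUnited`) also becomes part of the path, so it would need the same care if it can contain such characters.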

sergiotapia · January 13, 2016, 10:06pm

How would you deal with a famous person joining your website? Imagine Kim Kardashian creating her profile and having 10 million followers. MongoDB has a hard 16MB document size limit. It would be better to extract followers into a separate collection.
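To put numbers on the 16MB limit: BSON stores an array as an embedded document whose keys are the indices "0", "1", "2" and so on, so each string element costs a type byte, the index key, a 4-byte length, the string bytes and a terminating null. A rough JavaScript estimate, assuming 17-character ASCII IDs like the ones in the examples above and counting only the array itself (the rest of the document adds more); `bsonStringArraySize` is a hypothetical helper.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_BSON_SIZE = 16 * 1024 * 1024;

// Approximate BSON size of an array of n ASCII strings of idLength bytes each.
// BSON encodes an array as a document keyed "0", "1", "2", ...
function bsonStringArraySize(n, idLength) {
  let size = 4 + 1; // int32 total length + trailing 0x00
  let digits = 1;   // decimal width of the keys in [start, end)
  let start = 0;
  let end = 10;
  while (start < n) {
    const count = Math.min(n, end) - start;
    // type byte + key cstring (digits + 0x00) + int32 length + bytes + 0x00
    size += count * (1 + (digits + 1) + 4 + idLength + 1);
    start = end;
    end *= 10;
    digits += 1;
  }
  return size;
}

const tenMillion = bsonStringArraySize(10000000, 17); // 308,888,895 bytes
console.log(tenMillion > MAX_BSON_SIZE); // true
```

That is roughly 295 MiB for 10 million follower IDs. By the same formula, about 563,000 such IDs (562,944 exactly) fill the 16 MiB on their own.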

sashko · January 13, 2016, 10:32pm

I think it’s reasonable that your schema could change once you have a user with millions of followers. Optimizing for that ahead of time would be a mistake, IMO.

sergiotapia · January 13, 2016, 10:34pm

Oh totally, but I don’t think using a separate collection for followers is such a drastic leap. Just a little tweak to make things smoother for you.

jitterbop · January 13, 2016, 10:58pm

I think a separate collection works. I’ve seen recommendations for two separate collections, one for followers and another for followings, where the _id fields are reversed. This would probably require a little extra work on the operations side, but it’ll lead to more flexibility and better scaling. I think the most interesting part of this solution is base64-encoding the userId field. Apparently drilling down to a Mongo _id in a deeply nested document is challenging; nevertheless, I haven’t seen this approach recommended in the Meteor community. I’m a newbie, but I pay a lot of attention to the model layer.
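The two-collection idea can be sketched as follows, under one possible reading of "the _id fields are reversed": each follow is written as an edge document in both collections, with the pair of user IDs in opposite order inside `_id`, so each collection can be looked up by its own user. Maps stand in for the collections, and all names here are hypothetical.

```javascript
// Hypothetical edge collections; Map keys play the role of a unique _id.
const followers = new Map();  // _id: { user: followee, follower }
const followings = new Map(); // _id: { user: follower, following: followee }

// One follow = one write to each collection, with the IDs swapped.
function follow(followerId, followeeId) {
  followers.set(`${followeeId}:${followerId}`, {
    _id: { user: followeeId, follower: followerId },
  });
  followings.set(`${followerId}:${followeeId}`, {
    _id: { user: followerId, following: followeeId },
  });
}

// cf. db.followers.find({ "_id.user": userId })
function followersOf(userId) {
  return [...followers.values()]
    .filter((d) => d._id.user === userId)
    .map((d) => d._id.follower);
}

// cf. db.followings.find({ "_id.user": userId })
function followingOf(userId) {
  return [...followings.values()]
    .filter((d) => d._id.user === userId)
    .map((d) => d._id.following);
}
```

Because no single document holds the whole list, a user with millions of followers means millions of small documents rather than one document approaching the 16MB limit; the price is the extra write per follow.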

vigorwebsolutions · February 17, 2016, 10:59pm

@sergiotapia I know this thread isn’t too old, but just wondering if you had a use case for bringing this idea up and which route you ended up taking. Trying to decide between these two structures right now as well…

sergiotapia · February 18, 2016, 6:30am

Hi @vigorwebsolutions, I ultimately went with the second option.

I’m mostly interested in checking whether a user has permission to post in a group, and a string array of the groups he’s a member of makes that simple to query.

And for listing the members of a group, I can also query like this:

// Users who belong to group 'foo'.
db.users.find( { groups: "foo" } )

Associate using the group ID rather than the group name, though.

vigorwebsolutions · February 18, 2016, 6:47am

That is what I am leaning towards as well. I have a count field on the group document that I $inc +1/-1 when members join/leave, so that I don’t have to query all of the users to get a member count on each group. Just curious if you have any other fields on the group document that make life easier?
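One detail worth handling with a counter cache is that the counter should only move when membership actually changes; otherwise a repeated join (a double click, a retried request) makes it drift. A minimal in-memory sketch with hypothetical `join`/`leave` helpers and a hypothetical `memberCount` field; in MongoDB the equivalent might be to apply `$inc` to the group only when the user update (e.g. an `$addToSet` or `$pull` on `groups`) reports a modified document.

```javascript
// Hypothetical in-memory documents: the group carries a cached member count.
const group = { _id: "g1", name: "Pets United", memberCount: 0 };
const user = { _id: "u1", groups: [] };

// Join: add the group ID and bump the counter only if it was actually added.
function join(user, group) {
  if (user.groups.includes(group._id)) return false; // already a member: no $inc
  user.groups.push(group._id);
  group.memberCount += 1;
  return true;
}

// Leave: remove the group ID and decrement only if it was actually present.
function leave(user, group) {
  const i = user.groups.indexOf(group._id);
  if (i === -1) return false; // not a member: no $inc
  user.groups.splice(i, 1);
  group.memberCount -= 1;
  return true;
}
```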

sergiotapia · February 18, 2016, 6:50am

Yeah, I also have that counter cache, updated whenever someone joins or leaves a group; it’s definitely a good idea. Other than that, I don’t have any other helper fields.

vigorwebsolutions · February 18, 2016, 6:54am

Cool, thanks for the feedback!