Seeking advice on an alternative to Mongo distinct on the client side, since it is not currently available there

My data is a bunch of individuals with a “department”. Every department has n sub departments. The sub departments have random names. But the subdepartments/subsections inside each department are always the same.

I have a collection full of stuff like this:

userId: 1
displayName: joe
department: sales
americanClient: disney
europpeanClient: siemens
australianClient: kanguru

userId: 3
displayName: mary
department: sales
americanClient: ford
europpeanClient: bmw
australianClient: koala

(in this case american, european and australian client are the subsections)

userId: 2
displayName: sammy
department: banking
mainfunction: "client handling"

userId: 3
displayName: mary_banking
department: banking
mainfunction: "client handling"

(in this case mainfunction is the only subsection)

Then I have another collection that lists the subsections of each department:

department:banking
sections:[mainfunction]

department: sales
sections: [americanClient, europpeanClient, australianClient ]

I have a million of these users, and I want to produce a list of the available distinct departments and sub departments. Mongo distinct is not available directly on the client, and rawCollection distinct only works server side.

What I want to happen:
On my webpage, I want a list of departments. When I click on “sales”, it produces 3 menus: one showing all existing European clients, one showing all existing American clients and one showing all existing Australian clients.

As I click these specific clients, I will produce a list of the users connected to them.

How do I do this? Since the list of European clients exists only inside the user list, the obvious way for me is to use distinct.
Since that is not available, I was doing a find for all users, then turning the result into a set. But that takes too much time because I have too many users.

Is there a way I can alter my “schema” to go around this?

Each subsection/subdepartment is just a string, nothing else, so I don’t have a collection for them.

Help would be most welcome :)

With that many users that’s going to be a huge query. You should probably use something like _.uniq (from the underscore library, or the lodash equivalent) and denormalize that data into a new collection.

@gazhayes if he uses the _.uniq() function, won’t the query still be slow? The function is executed on the server in application code, not inside the Mongo query.

I would definitely go with multiple collections; I think it’s the best way.

Yes but he only needs to run it when new data is added (and populate/update another collection) rather than every time the data is retrieved. It could even be done outside of meteor if it becomes resource heavy.
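
A minimal sketch of that idea in plain JavaScript, with in-memory Maps standing in for the Mongo collections. The names `sectionsByDepartment`, `distinctValues`, `recordUser` and `menuFor` are hypothetical and not from this thread. In a real Meteor app the same bookkeeping would probably be an upsert with `$addToSet` into a lookup collection, run wherever users are inserted; that part is an assumption and is not shown here.

```javascript
// Stand-in for the collection that lists the sections of each department.
const sectionsByDepartment = {
  banking: ['mainfunction'],
  sales: ['americanClient', 'europpeanClient', 'australianClient'],
};

// Stand-in for the denormalized lookup collection:
// department -> section -> Set of distinct values seen so far.
const distinctValues = new Map();

// Hypothetical helper: call once per inserted user to keep the lookup current,
// so the full users collection never has to be scanned at read time.
function recordUser(user) {
  const sections = sectionsByDepartment[user.department] || [];
  if (!distinctValues.has(user.department)) {
    distinctValues.set(user.department, new Map());
  }
  const bySection = distinctValues.get(user.department);
  for (const section of sections) {
    if (user[section] === undefined) continue; // user has no value for this section
    if (!bySection.has(section)) bySection.set(section, new Set());
    bySection.get(section).add(user[section]);
  }
}

// Hypothetical helper: the values for one menu, sorted for display.
function menuFor(department, section) {
  const bySection = distinctValues.get(department);
  const values = bySection && bySection.get(section);
  return values ? [...values].sort() : [];
}

// Sample users taken from the original post.
const users = [
  { userId: 1, displayName: 'joe', department: 'sales',
    americanClient: 'disney', europpeanClient: 'siemens', australianClient: 'kanguru' },
  { userId: 3, displayName: 'mary', department: 'sales',
    americanClient: 'ford', europpeanClient: 'bmw', australianClient: 'koala' },
  { userId: 2, displayName: 'sammy', department: 'banking', mainfunction: 'client handling' },
];
users.forEach(recordUser);

console.log(menuFor('sales', 'europpeanClient')); // [ 'bmw', 'siemens' ]
```

The lookup stays small (one entry per distinct value rather than one per user), so publishing it to the client is cheap. Removing a value when its last user is deleted would need extra bookkeeping, such as a counter per value, which this sketch leaves out.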

This was the way I figured out so that users can create new departments without my having to create a new collection for each department. Just one collection that describes departments, and one collection for the people inside departments.
_.uniq() would need to fetch everything from the users collection, right? I tried it, and it’s too slow.
A distinct query on a properly indexed collection is not too slow; it seems good to me. But I can only run it on the server side, and I couldn’t make it reactive.

What do you guys mean by multiple collections? And how should I denormalize the data into a new collection?

If this needs to be reactive, you could make use of tunguska:reactive-aggregate (an updated fork of jcbernack:reactive-aggregate). The distinct method can be implemented with the aggregation pipeline (using $group), and you will then get reactive updates.

Note that this package is also only on the server, but using pub/sub is able to deliver reactive aggregation changes to the client.
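
As a rough sketch of what a distinct-style pipeline might look like, here is a plain JavaScript helper that builds one for a given department and field. The helper name `distinctPipeline`, the `Users` collection and the publication name in the comments are hypothetical. The commented wiring reflects my understanding of the package's `ReactiveAggregate(sub, collection, pipeline, options)` call, so please check it against the package README before relying on it.

```javascript
// Hypothetical helper: builds an aggregation pipeline that yields one document
// per distinct value of `field` among users of `department`, similar in effect
// to distinct(field, { department }).
function distinctPipeline(department, field) {
  return [
    // Keep only users in this department that actually have the field.
    { $match: { department, [field]: { $exists: true } } },
    // Grouping on the field collapses duplicates: one output doc per value.
    { $group: { _id: '$' + field } },
    // Sort so the menu order is stable.
    { $sort: { _id: 1 } },
  ];
}

console.log(JSON.stringify(distinctPipeline('sales', 'americanClient')));

// Assumed server-side wiring in Meteor (not runnable outside a Meteor app):
//
//   import { ReactiveAggregate } from 'meteor/tunguska:reactive-aggregate';
//
//   Meteor.publish('distinctValues', function (department, field) {
//     // Validate `department` and `field` before use; they come from the client.
//     ReactiveAggregate(this, Users, distinctPipeline(department, field),
//       { clientCollection: 'distinctValues' });
//   });
```

On the client, the published documents would then be read from a client-only collection with the same name as the one passed in the options, and each document's `_id` holds one distinct value.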

Can you point me to a working example of something like this? In the manual or anywhere else?

@robfallows thank you, I managed to make it work with the example in the repo.
