What is the most effective (your preferred) way to do a MongoDB "left join" in Meteor?

I’ve never seen any problem updating data across thousands of documents.
You can also do a double check: use a method call to fetch that data and do another update on the client side.

My strategy: for a page that can be accessed by numerous users, I don’t subscribe to lists of documents. I load them with a method and make them feel like reactive data by subscribing to a single document that holds the list’s update status.

For example, say you have a list of thousands of documents: you create one status document (in another collection, of course). Every time a document in the list is updated, you also update that status document, pushing the updated document’s ID along with the update date. On the client side, you subscribe to that single document, so you know when you need to call the method to fetch the newly updated document data. Your list will feel reactive; the delay will be less than 100ms if you have a good network connection.

Meteor’s pub/sub performance is very good if publishing just 1 single document.
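The status-document pattern described above can be sketched in plain JavaScript with in-memory stand-ins for the two Mongo collections (names like `listStatus` and `updateItem` are my own, for illustration only):

```javascript
// In-memory stand-ins: `items` plays the large collection, `listStatus`
// plays the single document the client subscribes to.
const items = new Map();
const listStatus = { _id: 'itemsStatus', updates: [] };

// Server side: every write to `items` also records the change
// (document ID + timestamp) in the status document.
function updateItem(id, fields) {
  items.set(id, { ...(items.get(id) || { _id: id }), ...fields });
  listStatus.updates.push({ id, updatedAt: Date.now() });
}

// Client side: when the subscribed status doc changes, fetch only the
// documents whose IDs were recorded since the last check (in a real
// app this would be a Meteor method call, not a direct map lookup).
function fetchUpdatedSince(lastSeen) {
  const ids = listStatus.updates
    .filter(u => u.updatedAt > lastSeen)
    .map(u => u.id);
  return [...new Set(ids)].map(id => items.get(id));
}
```

The client never subscribes to the big list itself, only to the one status document, which is the cheap part of Meteor pub/sub.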

1 Like

I’m not sure you’ve correctly understood my question, so let me clarify. In your app you might have, for example, 50 collections, and each collection has user, createdBy and modifiedBy fields containing an object like this:

{
  _id: "some id of user",
  displayName: "John Smith"
}

Now, in the users collection, I change “profile.displayName” from “John Smith” to “Tina Smith”, so I must update all occurrences of that displayName value everywhere (in every field) in all collections. That means updating it in 150 places × the number of documents.
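The fan-out update being described could be sketched roughly like this, against a generic `db.collection(...).updateMany(...)` API (what `rawCollection()` exposes in Meteor); the collection list and field names here are illustrative, not a real schema:

```javascript
// Hedged sketch: rename one user's denormalized displayName in every
// collection/field where the user object was embedded.
async function renameUserEverywhere(db, userId, newName) {
  const collections = ['posts', 'orders', 'comments']; // illustrative list
  const fields = ['createdBy', 'modifiedBy', 'user'];  // denormalized user spots
  for (const coll of collections) {
    for (const field of fields) {
      await db.collection(coll).updateMany(
        { [`${field}._id`]: userId },
        { $set: { [`${field}.displayName`]: newName } }
      );
    }
  }
}
```

Each `updateMany` is atomic per document, but the loop as a whole is not transactional, which is exactly the consistency concern raised below.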

Is that OK for MongoDB and for DB consistency? Is it the correct approach? And how do I check that the update process completed successfully across all collections?

Strategy:

You can also do a double check.

A double check is not a solution, in my opinion, because if I find an issue and start the update process again, I cannot be sure that everything is OK and that nothing crashed again. So I would have to check and restart the update again and again. You also have to consider what happens if the Docker container restarts and Meteor/Mongo is restarted or crashes during this update.

It is not as simple as you describe, I think. So my question again: how do you effectively make a “left join”, and if I choose denormalization variants c) or d), how do I ensure data consistency?

And we also have to consider that during a long-running update process, the data is not consistent.

It’s a bad approach, I totally agree. So how are you effectively doing this “magical” “left join” :smiley:, and which way do you prefer?

IMO denormalisation works well for hierarchical data, which is fine with Meteor and just one of those MongoDB things where we can have multiple layers of embedded documents… but the other kind of denormalisation, where you have duplicated data and these sorts of pre-computed, reified views… I think that works better with an asynchronous, event-based architecture: e.g. an event hits a message bus saying “updateUser” and is forwarded on to several subscribed microservices, which update their own independent data stores. Akin to what you could do with a serverless stack.

That could probably be done with Meteor if you’ve got a single database instance (which might make some of the large-scale async architects shudder a bit).

In my case things are a bit weird (I’m using TanStack Query instead of Meteor pub-sub, but I’m thinking about using jam:pub-sub with the once method-based publication approach). But I generally do denormalisation where I don’t have to worry about duplication (or where I’m happy to cache old values), and then stitch data together on the client when it’s available (otherwise using client-side reactivity to widen the scope of my subscriptions/query parameters).

Which I guess is sort of N+1-ish but in an unblocking way?

1 Like

In my case, I currently have a monolith. From the answers so far, I feel that we are all “dancing” around MongoDB in a similar way, just with a different dance or different partners (TanStack Query) :smiley:.

1 Like

As a side note, DDP does not itself support subfields in the normal publish protocol: Support merging subfields in DDP (or separate multiple subscriptions instead of merging) · Issue #3764 · meteor/meteor · GitHub. Many issues on this were closed because of some re-organization, but as far as I am aware it is still unresolved.

That’s why we generally opt not to use sub-objects in projects. Nested objects start giving you headaches later.

2 Likes

It is very important information, thanks a lot!

@storyteller …thanks a lot for the fix.

There is also grapher, or now nova. But I would not recommend it if you already have a well-established stack for your project, as you would need to change a lot of patterns.

But maybe you can get some ideas from it.

1 Like

If you prefer not to denormalize and don’t need a reactive join, I would expect the aggregation with a Meteor.method to be more performant assuming the appropriate indices are in place. Did you try using .explain to see what the stats look like on it?
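For reference, the aggregation-in-a-method approach could look roughly like this (the `Orders`/`users` names and the `status` filter are assumptions for illustration, not from the thread):

```javascript
// A minimal $lookup pipeline: filter orders first so the join stays
// small, join the matching user, flatten the one-to-one result, cap it.
const pipeline = [
  { $match: { status: 'open' } },
  { $lookup: {
      from: 'users',
      localField: 'userId',
      foreignField: '_id',
      as: 'user'
  } },
  { $unwind: '$user' },  // one-to-one: flatten the joined array
  { $limit: 50 }
];

// Inside an async Meteor method you would run something like:
//   const rows = await Orders.rawCollection().aggregate(pipeline).toArray();
// and to see the execution stats mentioned above:
//   await Orders.rawCollection().aggregate(pipeline).explain('executionStats');
```

The `explain('executionStats')` output shows whether the initial `$match` used an index and how many documents each stage examined.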

Generally, I think this is kind of an unsolved problem in Meteor as we’ve seen by the various approaches people are using in this thread. I think that this is largely due to the mismatch with Mongo and Meteor with regard to modeling data — i.e. Mongo wants you to denormalize and embed, Meteor works better with normalized data because of the issue highlighted above.

I explored adding one-time and reactive joins for jam:pub-sub that felt native to Meteor but I dropped it for the time being. I have ideas on ways to approach it that I’d like to explore, hopefully one day.

3 Likes

Nice! If you decide to go this route, would be great to hear how it works out for you.

…grapher, …nova? can you share more details?

Hello @jam, thank you for your answer. Yes, I don’t need pub-sub; methods are enough for my use case. I see that you’ve correctly understood my “headache”. I’m balancing between denormalization and aggregation. Indexes help, but the problem comes when, for example, you want to search a text column with “contains” (and no other filter). Aggregation works veeery slowly in that case. But I must do a deeper query-execution investigation.

Sure, there is a package called grapher that does joins on collections and makes them very easy; you can have a look at their repo. That package was abandoned by the original developer, who made a new one, separate from Meteor, called nova. They state that it achieves speeds surpassing SQL in various scenarios.

I made a package (grapher-nova) that we use in our new projects; it integrates nova into Meteor, but it doesn’t support reactivity yet.

Maybe you can find some ideas for your use case here.

1 Like

I followed this thread, and it was not clear to me whether you actually need pub-sub until you mentioned it.

Your aggregations may be slow for two reasons: missing indexes, and/or no limits in your joined DB query for the one-to-many or many-to-many relationships.

For Orders and Users, the link would probably be userId (Orders) to _id (Users). _id is indexed by default, so this particular aggregation would be lightning fast. This is a one-to-one relationship. Going the other way around, things change dramatically.

“Give me all the orders of a user.” This is a one-to-many relationship, and this is where things might get slow if you don’t have an index on userId (in the Orders collection) and if you don’t have a limit (and/or pagination). Scanning through millions of orders will, yes, be costly.
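The index-plus-pagination advice for that one-to-many direction can be sketched like this (field names assumed; in Meteor the index would be created once, e.g. via `Orders.rawCollection().createIndex({ userId: 1, createdAt: -1 })` at startup):

```javascript
// Build a paged "orders of one user" query: the filter hits the
// userId index, and skip/limit keep each fetch bounded.
function ordersPageQuery(userId, page, pageSize = 20) {
  return {
    filter: { userId },
    options: {
      sort: { createdAt: -1 },   // newest first, covered by the compound index
      skip: page * pageSize,
      limit: pageSize
    }
  };
}

// Usage sketch: Orders.find(q.filter, q.options).fetchAsync()
const q = ordersPageQuery('someUserId', 0);
```

With the compound index above, both the filter and the sort are index-backed, so the scan never touches more than one page of documents.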

The way I do it, inspired by platforms such as Facebook or Telegram: I store a lot of data on the client.
You can ask Perplexity or some other connected AI: “What does Facebook store in IndexedDB in the browser, under ArrayBuffers?”. You can also open FB and check IndexedDB in the browser to see the massive amount of data they store in the browser itself. Even if you used reactivity for the Orders, your users’ display names and avatars would best be stored in the browser (and persisted) instead of in your Minimongo.

The next step would be data management in the browser. For instance, if you don’t have the user(s) for an order, fetch the user(s) with a method and add them to the local DB in the browser. You can expire data and re-fetch it once in a while, for your avatar/display-name updates.
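That fetch-and-expire pattern could be sketched like this (the `fetchUsers` function stands in for a Meteor method call, and the `Map` stands in for IndexedDB; the TTL is an arbitrary example):

```javascript
// Client-side user cache with a time-to-live: only IDs that are missing
// or stale trigger a fetch; everything else is served locally.
function makeUserCache(fetchUsers, ttlMs = 60 * 60 * 1000) {
  const store = new Map(); // userId -> { user, cachedAt }
  return async function getUsers(ids) {
    const now = Date.now();
    const missing = ids.filter(id => {
      const hit = store.get(id);
      return !hit || now - hit.cachedAt > ttlMs;
    });
    if (missing.length > 0) {
      const fetched = await fetchUsers(missing); // Meteor method in a real app
      for (const user of fetched) store.set(user._id, { user, cachedAt: now });
    }
    return ids.map(id => store.get(id)?.user);
  };
}
```

Repeated lookups of the same users within the TTL never hit the server, which is exactly the avatar/display-name case above.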

MongoDB works great with aggregations when they are done right and when the aggregation is optimised for the size of the MongoDB server used. If you are in Atlas, keep an eye on the dashboards as Atlas will suggest query optimisations where things don’t look great.

There is a point where you might need to leave MongoDB and get a SQL type of DB. That point might come, for instance, when you need a relationship such as one-to-many-to-many. For example, “give me all the recent posts of my friends”. Not all my friends have new posts, and I need to sort the posts, not my friends - so I would have to query for something like: give me the 10 most recent posts where userId is in […5000 ids …]. The more friends I had, the more degraded the performance.
To make it even worse, I would also want to bring, for each post, the 3 most recent comments :slight_smile: (a one-to-many relationship nested deep inside the one-to-many-to-many relationship in my aggregation).
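Expressed as a pipeline, that worst case looks something like the sketch below (collection and field names assumed); the `$in` array is the part that grows with the friend count, and the correlated `$lookup` sub-pipeline is the “3 most recent comments” twist:

```javascript
// "Recent posts of my friends, each with its 3 latest comments."
function recentFriendPostsPipeline(friendIds) {
  return [
    { $match: { userId: { $in: friendIds } } }, // grows with friend count
    { $sort: { createdAt: -1 } },
    { $limit: 10 },
    { $lookup: {
        from: 'comments',
        let: { postId: '$_id' },
        pipeline: [
          { $match: { $expr: { $eq: ['$postId', '$$postId'] } } },
          { $sort: { createdAt: -1 } },
          { $limit: 3 }                          // only the 3 newest per post
        ],
        as: 'recentComments'
    } }
  ];
}
```

Because the `$limit: 10` runs before the comment `$lookup`, the nested join only fires 10 times per query; the expensive part remains the initial `$in` match over a large ID list.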

Just to close my story here: in a one-to-one relationship (Order to User) with MongoDB, it would be really, really hard to run into any performance issue.

If you have used Grapher, aggregations, or Mongoose before, it becomes straightforward to think in relations/data structures, and sometimes a pencil and a piece of paper helps a lot. This is where Meteor Grapher defines the relationships: grapher/lib/links/linkTypes at master · cult-of-coders/grapher · GitHub
It is clear and well-documented, but for some reason - the syntax I had to use in the code, or something else - I abandoned Grapher (after I upgraded it for Meteor 3) in favor of direct aggregations. MongoDB Compass has a great visualizer in its aggregation builder, and you can also use natural language to start building your queries.

This is for sure the case for common data. For example, in our project-management tool we publish the list of projects by default and never publish it in other publications.

So a publication like tasks can just take the projects data for granted (it trusts that it is there). It errors when it is not there; it should always be on the client.

This also has an advantage in rendering. We don’t start rendering before the basics are loaded; our list is, for example:

  • Projects
  • Current user
  • Current organization
  • Etc.

Only after that do we start working.

…hmmmm, veery interesting. I’m investigating it.

Hello @paulishca, how are you? Thanks a lot for your answer. Yes, I don’t need reactivity; in this thread I’m mostly focused on a MongoDB data model that is both suitable for “left joining” and performant. Indexes are very important on this point, I totally agree with you. That said, there are cases where indexes do not help, for example:

get orders by user, where displayName contains "John" and sort by user.profile.displayName

…or

get orders by user, where one of user.profile.bankAccounts (array) has field active = true;

These are absolutely standard queries that you will need for a datagrid. Storing data on the client side is a good idea if you don’t have tons of data to push to the client, or “private” data. If you are modelling a fixed structure, that is also a way. But in my case I’m searching for a “universal” solution that is not (or only minimally) sensitive to the concrete use case.
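The first example query above, written as a naive `$lookup` pipeline, also shows *why* it is slow (names are taken from the examples and otherwise assumed): the “contains” `$match` runs on the joined field, i.e. after the `$lookup`, so no index on the orders collection can help with it.

```javascript
// "Orders by user where displayName contains X, sorted by displayName."
function ordersByUserNamePipeline(substring) {
  return [
    { $lookup: { from: 'users', localField: 'userId',
                 foreignField: '_id', as: 'user' } },
    { $unwind: '$user' },
    // Post-join filter: an unanchored, case-insensitive regex on the
    // joined field cannot use an index -- every joined row is scanned.
    { $match: { 'user.profile.displayName': { $regex: substring, $options: 'i' } } },
    { $sort: { 'user.profile.displayName': 1 } },
    { $limit: 50 }
  ];
}
```

Denormalizing displayName onto the order is what would move that `$match` and `$sort` before the join, where an index could serve them.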

From your “one-to-many-to-many” example I see that you totally understand my question and headache. I can’t tell now how many times I will fall into this trap, but if I do, I will need to redesign the whole data structure with existing data (a biiig refactor), so I must make the right decision at the start.

I have to catch up on this thread, but regarding denormalization and similar solutions, I think it is a good idea to take the MongoDB Data Modeling Skills for Developers path/course at MongoDB University to get some more ideas and insight into what is used with MongoDB in this regard:

1 Like

Hello @storyteller, how are you? Thanks a lot for your answer. I already did those trainings some time ago, but I must say there was not much interesting info; I was quite disappointed. Most of those tutorials boil down to “if you want to read data together, store it together”, but as we know, real-life situations are muuuch more complicated :smiley:

If you have more tips on resources focused on Mongo schema modeling, and if it is not a “secret”, I would appreciate it if you shared them.