What is the most effective (your preferred) way to make mongo "leftJoin" in Meteor?

What is the most effective way to make mongo “leftJoin” in Meteor? Which way do you prefer?

a) aggregate

In my experience it’s slow:

Meteor.publish('ordersWithUsers', async function () {
  // rawCollection() bypasses Meteor's mongo layer, so aggregate() does not
  // return a reactive cursor; the results must be published manually.
  const results = await Orders.rawCollection().aggregate([
    {
      $lookup: {
        from: 'users',
        localField: 'userId',
        foreignField: '_id',
        as: 'userInfo'
      }
    },
    {
      $unwind: {
        path: '$userInfo',
        preserveNullAndEmptyArrays: true
      }
    }
  ]).toArray();

  // 'orders' must match the client-side collection name
  results.forEach(doc => this.added('orders', doc._id, doc));
  this.ready();
});

b) manually selected

Memory-inefficient and risky if there are a lot of users.

const orders = Orders.find().fetch();
const userIds = [...new Set(orders.map(o => o.userId))];
const users = Users.find({ _id: { $in: userIds } }).fetch();

// Index users by _id so each order lookup is O(1) instead of a linear scan.
const usersById = new Map(users.map(u => [u._id, u]));

const enrichedOrders = orders.map(order => ({
  ...order,
  user: usersById.get(order.userId) || null,
}));

c) denormalization

Risky because of stale data; fastest, but with redundant data:

Orders.insert({
  product: 'Coffee',
  userId: 'abc123', // ID to identify user
  user: { // User object data
    userName: 'John Smith'
  }
});

d) denormalization 2

Risky because of stale data; fastest, but with redundant data.
On the other hand, I like not having both a userId field and a user object. BUT:
the disadvantage is that if I need to filter by userId, I must filter by user._id.

Orders.insert({
  product: 'Coffee',
  user: { // same as previous, but without a userId field (_id is used instead)
    _id: 'abc123',
    userName: 'John Smith'
  }
});
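
So the filter from that disadvantage ends up targeting the embedded field:

// Filtering orders by user now goes through the embedded _id:
Orders.find({ 'user._id': 'abc123' }).fetch();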

e) something else?

A very important part of this question: what if I want to make a selection via a leftJoin of a leftJoined collection, for example “book” -> “book_categories” -> “book_category_types” (get all books from the “books” collection where the “book_category_types” creationDate is in the current year)? What to do in this case? Denormalization? Aggregate through multiple collections (slow)? It is also important to mention that I use all selected data in a datagrid, so during selection I need pagination, filtering and sorting.
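
For what it’s worth, the chained case can be expressed as nested $lookup stages, with $facet handling the datagrid pagination and count. A rough sketch, assuming foreign keys books.categoryId and book_categories.typeId (all collection and field names here are assumptions), e.g. inside an async Meteor method:

const page = 0;
const pageSize = 20;
const startOfYear = new Date(new Date().getFullYear(), 0, 1);

const pipeline = [
  // books -> book_categories
  {
    $lookup: {
      from: 'book_categories',
      localField: 'categoryId',
      foreignField: '_id',
      as: 'category'
    }
  },
  { $unwind: '$category' },
  // book_categories -> book_category_types
  {
    $lookup: {
      from: 'book_category_types',
      localField: 'category.typeId',
      foreignField: '_id',
      as: 'categoryType'
    }
  },
  { $unwind: '$categoryType' },
  // filter on the second-level join
  { $match: { 'categoryType.creationDate': { $gte: startOfYear } } },
  { $sort: { title: 1 } },
  // $facet returns one page of rows plus the total count in a single round trip
  {
    $facet: {
      rows: [{ $skip: page * pageSize }, { $limit: pageSize }],
      total: [{ $count: 'count' }]
    }
  }
];

const [result] = await Books.rawCollection().aggregate(pipeline).toArray();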

Thanks a lot for each recommendation or experience.

2 Likes

I almost always use "b) manually selected" and add a fields specifier to only query what is necessary.
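
For example (the projected field name is just a placeholder):

// Fetch only what the UI needs instead of whole user documents.
const users = Users.find(
  { _id: { $in: userIds } },
  { fields: { 'profile.displayName': 1 } }
).fetch();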

1 Like

Thanks a lot for your experience! And do you prefer this structure:

{
  userId: "someid",
  user: { ...userData } // added if user found
}

…or another way: just a user object (without userId; the indexed user._id is used instead, and other fields are dynamically added)?

{
  user: {
    _id: "someid", // always there
    "firstname", "lastname", "displayName" // dynamically added from the selected user object
  }
}

Personally I have no clear preference for structuring the user object; both ways seem fine to me.

1 Like

If I can avoid it, I will not publish an aggregation.

If I need to publish, I usually publish the main collection requiring “real-time” data (e.g. Orders in your example) and then enrich the data using methods (e.g. fetch users for the subscribed orders).
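
A minimal sketch of that pattern, assuming a method named users.forOrders (the names and fields are placeholders):

import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';

// Server: publish only the collection that needs real-time updates.
Meteor.publish('orders.open', function () {
  return Orders.find({ status: 'open' });
});

// Server: one-time enrichment via a method call.
Meteor.methods({
  async 'users.forOrders'(userIds) {
    check(userIds, [String]);
    return Users.find(
      { _id: { $in: userIds } },
      { fields: { 'profile.displayName': 1 } }
    ).fetchAsync();
  }
});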

1 Like

I usually pick option c or d. Most of my apps are high-read.

1 Like

Just to make my question clearer: I’m focusing on reading data for a datagrid, to be able to use pagination, filtering, sorting, etc. (on the server side). Publishing “left joined” collection data to make it available on the client side is a very comfortable way, but isn’t it risky if those collections have thousands of documents?

Hello @minhna, thank you for your response. I’m a “little bit scared” about consistency: once I have thousands of documents in the “main” collection with denormalized data and then I change something, I will need to update all the denormalized data in all collections. For example, when I change a user’s displayName, I will need to update every collection’s “user” fields, “createdAt” field, “modifiedAt” field, etc. And if something breaks during that “long” process, the db will become inconsistent. This is the main reason I’m hesitant to use this option, even though I know it’s commonly used.

I personally will not use a publication just because it is comfortable/convenient. I will use it only when there is a use case for real-time data for my users.

If there is no use case for real-time data, a publication is the wrong tool outside of prototyping.

1 Like

I’ve never seen any problem updating data across thousands of documents.
You can also do a double check: use a method call to fetch those data and do another update on the client side.

My strategy: if a page can be accessed by numerous users, I won’t subscribe to lists of documents. I will load them using a method and make it feel like reactive data by subscribing to a single document which holds the list’s update status.

For example, you have a list of thousands of documents, and you create a status document (in another collection, of course). Every time a document gets updated, you update that status document, putting the updated document’s ID in a list along with the update date. Then on the client side you subscribe to that single document, so you know when you need to call the method to fetch the newly updated document data. Your list will feel reactive, and the delay will be less than 100ms if you have a good network connection.

Meteor’s pub/sub performance is very good if publishing just 1 single document.
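
If I understand this pattern correctly, a rough sketch might look like the following (the collection, publication, and method names are my own placeholders):

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { Tracker } from 'meteor/tracker';

const ListStatus = new Mongo.Collection('listStatus');

// Server: whenever an order changes, touch the single status document.
async function recordOrderUpdate(orderId) {
  await ListStatus.upsertAsync(
    { _id: 'orders' },
    { $set: { updatedAt: new Date() }, $addToSet: { changedIds: orderId } }
  );
}

// Server: publishing one document keeps the pub/sub cost negligible.
Meteor.publish('orders.status', function () {
  return ListStatus.find({ _id: 'orders' });
});

// Client: refetch the list through a method whenever the status doc changes.
Tracker.autorun(() => {
  Meteor.subscribe('orders.status');
  const status = ListStatus.findOne('orders');
  if (status) {
    Meteor.call('orders.list', { page: 0 }, (err, rows) => {
      // feed rows into the datagrid
    });
  }
});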

I don’t know if you correctly understood my question, so just to make it clear: in your app you will have, for example, 50 collections, and each collection will have user, createdBy and modifiedBy fields containing, for example, the following object:

{
  _id: "some id of user",
  displayName: "John Smith"
}

Now in the users collection I rename “profile.displayName” from “John Smith” to “Tina Smith”; therefore I must update all occurrences of the displayName value in all places (fields) in all collections. It means updating it in 150 places × the number of documents.

Is that ok for mongo and DB consistency? Is it the correct approach? And how do I check whether the “update” process completed successfully on all collections?

Strategy:

You can also do a double check.

Double checking is not a solution, I think, because if I find an issue and start the update process again, I cannot be sure that everything is ok and nothing crashed again. So I would have to check it and start the update again and again. You must also think about what happens if the Docker container restarts and Meteor/Mongo is restarted or crashes during this update.

It is not as simple as you wrote, I think. So my question again: how to effectively make a “left join”, and if I choose the c) or d) denormalization variants, how do I ensure data consistency?

And we must also consider that during the long-running update process, the data is not consistent.

It’s a bad approach, I totally agree. Then how are you effectively doing this “magical” “left join” :smiley: and which way do you prefer?

IMO denormalisation works well for hierarchical data, which is fine with Meteor and just one of those MongoDB things where we can have multiple layers of embedded documents… but the other kind of denormalisation, where you have duplicated data and these sorts of pre-computed reified views… I think that works better with an asynchronous event-based architecture, e.g. an event hits a message bus saying “updateUser” and is forwarded on to several subscribed microservices, which update their own independent data stores. Akin to what you could do with a serverless stack.

That could probably be done with Meteor if you’ve got a single database instance (which might make some of the large-scale async architects shudder a bit).
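
A toy illustration of that idea, using an in-process EventEmitter as a stand-in for a real message bus (all names here are assumptions):

import { EventEmitter } from 'events';

const bus = new EventEmitter();

// Each collection embedding user data subscribes independently and
// updates its own copies when the event arrives.
bus.on('updateUser', async ({ userId, displayName }) => {
  await Orders.updateAsync(
    { 'user._id': userId },
    { $set: { 'user.displayName': displayName } },
    { multi: true }
  );
});

// Emitting once fans the change out to every subscriber.
bus.emit('updateUser', { userId: 'abc123', displayName: 'Tina Smith' });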

In my case things are a bit weird (I’m using Tanstack Query instead of Meteor pub-sub, but I’m thinking about using jam:pub-sub with the once method-based publication approach). But I generally do denormalisation where I don’t have to worry about duplication (or where I’m happy to cache old values), and then stitch data together on the client when it’s available (otherwise using client-side reactivity to widen the scope of my subscriptions/query parameters).

Which I guess is sort of N+1-ish but in an unblocking way?

1 Like

In my case I currently have a monolith. From the answers so far, I feel that we are all “dancing” around MongoDB in a similar way, just a different dance or with different partners (Tanstack Query) :smiley:.

1 Like

A side note: DDP does not support subfields by itself in the normal publish protocol: Support merging subfields in DDP (or separate multiple subscriptions instead of merging) · Issue #3764 · meteor/meteor · GitHub. Many issues on it were closed because of some re-organization, but as far as I am aware this is still unresolved.

That’s why we generally opt not to use sub-objects in projects. The nested objects start giving headaches later.

2 Likes

It is very important information, thanks a lot!

@storyteller …thanks a lot for the fix.

There is also grapher, or now nova. But I would not recommend it if you already have a well-established stack for your project, as you would need to change a lot of patterns.

But maybe you can get some ideas from it.

1 Like

If you prefer not to denormalize and don’t need a reactive join, I would expect the aggregation with a Meteor.method to be more performant assuming the appropriate indices are in place. Did you try using .explain to see what the stats look like on it?
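
For reference, the Node driver lets you ask for execution stats straight off the aggregation cursor (pipeline here being the stages from the original post):

// 'executionStats' includes per-stage timings and index usage.
const stats = await Orders.rawCollection()
  .aggregate(pipeline)
  .explain('executionStats');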

Generally, I think this is kind of an unsolved problem in Meteor as we’ve seen by the various approaches people are using in this thread. I think that this is largely due to the mismatch with Mongo and Meteor with regard to modeling data — i.e. Mongo wants you to denormalize and embed, Meteor works better with normalized data because of the issue highlighted above.

I explored adding one-time and reactive joins for jam:pub-sub that felt native to Meteor but I dropped it for the time being. I have ideas on ways to approach it that I’d like to explore, hopefully one day.

3 Likes

Nice! If you decide to go this route, would be great to hear how it works out for you.