What is the most effective (your preferred) way to do a Mongo "leftJoin" in Meteor?

…grapher, …nova? Can you share more details?

Hello @jam, thank you for your answer. Yes, I don’t need pub/sub; methods are enough for my use case. I see that you correctly understand my “headache”. I’m balancing between denormalization and aggregation. Indexes help, but the problem is when you would like to search on, for example, a text column with a “contains” match (without another filter). Aggregation works veeery slowly in this case. But I must make a deeper query-execution investigation.

Sure, there is a package called grapher that does joins on collections and makes them very easy; you can have a look at their repo. This package was abandoned by the original developer, and they made a new one, separate from Meteor, called nova. They state that it has speeds surpassing SQL in various scenarios.

I made a package (grapher-nova) that we use in our new projects, which integrates nova into Meteor, but it doesn’t support reactivity yet.

Maybe you can find some ideas for your use case here.

1 Like

I followed this thread, and it was not clear to me whether you actually needed pub/sub until you mentioned it.

Your aggregations may be slow for two reasons: missing indexes and/or no limits in your joined DB query in one-to-many or many-to-many relationships.

For Orders and Users, the link would probably be userId (Orders) to _id (Users). _id is indexed by default, so this particular aggregation would be lightning fast. This is a one-to-one relationship. Going the other way around, things change dramatically.

“Give me all the orders of a user”. This is a one-to-many relationship, and this is where things might go slow if you don’t have an index for userId (in the Orders DB) and if you don’t have a limit (and/or pagination). Scanning through millions of orders, yes… that will be costly.
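A minimal sketch of the fix under those assumptions (the Orders collection and userId field come from the example above; the index shape, the createdAt field, and the page size are my assumptions):

// Hypothetical sketch: index the foreign key, then always query with a
// sort + limit so Mongo can walk the index instead of scanning the collection.
await Orders.rawCollection().createIndex({ userId: 1, createdAt: -1 });

const page = await Orders.find(
  { userId },
  { sort: { createdAt: -1 }, limit: 20, skip: pageNumber * 20 }
).fetchAsync();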

The way I do it, being inspired by platforms such as Facebook or Telegram - I store a lot of data on the client.
You may ask this in Perplexity or some other connected AI: “What does Facebook store in IndexedDB in the browser, under ArrayBuffers?”. You can also open FB and check IndexedDB in the browser to see the massive amounts of data they store in the browser itself. Even if you used reactivity for the Orders, your users’ display names and avatars would best be stored in the browser (and persisted) instead of in your minimongo.

The next step would be data management in the browser. For instance, if you don’t have a user(s) for an order, fetch the user(s) with a method and add them to the local DB in the browser. You can expire data and re-fetch it - for your avatar/display name updates, once in a while.
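A hedged sketch of that fetch-and-cache step (the method name, the projection, and the localUserCache Map are assumptions for illustration):

// Server: a plain method that returns only the requested users
Meteor.methods({
  async 'users.byIds'(ids) {
    return Meteor.users.find(
      { _id: { $in: ids } },
      { fields: { 'profile.displayName': 1, 'profile.avatar': 1 } }
    ).fetchAsync();
  }
});

// Client: hypothetical cache; fetch only the ids we don't already hold
const localUserCache = new Map();

async function ensureUsers(ids) {
  const missing = ids.filter((id) => !localUserCache.has(id));
  if (missing.length) {
    const users = await Meteor.callAsync('users.byIds', missing);
    users.forEach((u) => localUserCache.set(u._id, { ...u, fetchedAt: Date.now() }));
  }
}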

MongoDB works great with aggregations when they are done right and when the aggregation is optimised for the size of the MongoDB server used. If you are in Atlas, keep an eye on the dashboards as Atlas will suggest query optimisations where things don’t look great.

There is a point where you might need to leave MongoDB and get a SQL type of DB. That point might come, for instance, when you need a relationship such as one-to-many-to-many. For example, “give me all the recent posts of my friends”. Not all my friends have new posts, and I need to sort the posts, not my friends - so I would have to query for something like: give me the 10 most recent posts where userId is in […5000 ids …]. The more friends I had, the more degraded the performance.
To make it even worse, I would also want to bring, for each post, the 3 most recent comments :slight_smile: (a one-to-many relationship nested inside the one-to-many-to-many relationship in my aggregation).
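A hypothetical sketch of that degenerate query (collection and field names assumed): the $in list grows with the friend count, and the nested “3 most recent comments” lookup multiplies the cost per post.

const posts = await Posts.rawCollection().aggregate([
  { $match: { userId: { $in: friendIds } } }, // …5000 ids…
  { $sort: { createdAt: -1 } },
  { $limit: 10 },
  {
    $lookup: {
      from: 'comments',
      let: { postId: '$_id' },
      pipeline: [
        { $match: { $expr: { $eq: ['$postId', '$$postId'] } } },
        { $sort: { createdAt: -1 } },
        { $limit: 3 }
      ],
      as: 'recentComments'
    }
  }
]).toArray();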

Just to close my story here: in a one-to-one relationship (Order to User) with MongoDB, it would be really, really hard to run into any performance issue.

If you have used Grapher, aggregations, or Mongoose before, it becomes straightforward to think in relations/data structures, and sometimes a pencil and a piece of paper help a lot. This is where Meteor Grapher defines the relationships: grapher/lib/links/linkTypes at master · cult-of-coders/grapher · GitHub
It is clear and well-documented, but whether due to the syntax I had to use in the code or for other reasons, I abandoned Grapher (after I upgraded it for Meteor 3) in favor of direct aggregations. MongoDB Compass has a great visualizer in its aggregation builder, and you can also use natural language to start building your queries.

This is for sure the way for common data. For example, in our project-management tool we publish the list of projects by default and never publish it in other publications.

So a publication like tasks can just take the projects data (it trusts that it is there). It throws errors when the data is not there; it should always be on the client.

This also has an advantage in rendering. We don’t start rendering before the basics are loaded; our list is, for example:

  • Projects
  • Current user
  • Current organization
  • Etc.

Only after that do we start working.
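A minimal sketch of that gating, assuming React with useTracker (the publication names follow the list above; LoadingScreen and the rest are assumptions):

import { Meteor } from 'meteor/meteor';
import { useTracker } from 'meteor/react-meteor-data';

// Hypothetical shell: block rendering until the "basic" subscriptions are ready
function AppShell({ children }) {
  const basicsReady = useTracker(() => {
    const subs = [
      Meteor.subscribe('projects'),
      Meteor.subscribe('currentOrganization'),
    ];
    return subs.every((sub) => sub.ready()) && !!Meteor.user();
  });
  return basicsReady ? children : <LoadingScreen />;
}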

…hmmmm, veery interesting. I’m investigating it.

Hello @paulishca, how are you? Thanks a lot for your answer. Yes, I don’t need reactivity; in this thread I’m mostly focusing on a Mongo data model that is suitable for “left joining” and performant. Indexes are very important on this point, I totally agree with you. That said, there are cases where indexes don’t help, for example:

get orders by user, where displayName contains "John" and sort by user.profile.displayName

…or

get orders by user, where one of user.profile.bankAccounts (array) has field active = true;

These are absolutely standard queries that you will need to make from a datagrid. Storing data on the client side is a good idea if you don’t have tons of data that you must put there, or “private” data. If you are modelling a fixed structure, that’s also a way. But in my case, I’m searching for a “universal” solution that is not (or only minimally) “sensitive” to the concrete use case.
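A hypothetical sketch of the first example above as an aggregation (collection and field names assumed): the unanchored, case-insensitive regex can’t use an index on displayName, so the joined set gets scanned.

const orders = await Orders.rawCollection().aggregate([
  { $lookup: { from: 'users', localField: 'userId', foreignField: '_id', as: 'user' } },
  { $unwind: '$user' },
  { $match: { 'user.profile.displayName': { $regex: 'John', $options: 'i' } } },
  { $sort: { 'user.profile.displayName': 1 } },
  { $limit: 20 }
]).toArray();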

From your “one-to-many-to-many” example I see that you totally understand my question and headache. Right now I’m not able to tell how many times I will fall into this trap, but if I do, I will need to redesign the whole data structure with existing data (it will be a biiig refactor), so I must make the right decision at the start.

Have to catch up on this thread, but regarding denormalization and similar solutions, I think it is a good idea to take the MongoDB Data Modeling Skills for Developers path/course on MongoDB University to get some more ideas and insight into what is used with MongoDB in this regard:

1 Like

Hello @storyteller, how are you? Thanks a lot for your answer. I already did those trainings some time ago, but I must say there was not so much interesting info; I was quite disappointed. Most of those tutorials are about “if you want to read data together, store it together”, but as we know, real-life situations are muuuch more complicated :smiley:

If you have more tips about resources focused on Mongo schema modelling, and if it is not a “secret”, I would appreciate it if you shared them.

That is why I was surprised by this new one, as I learned some interesting patterns that I didn’t know about. Not all of them are applicable to the problem or to Meteor, and some I don’t like for aesthetic reasons. Also, in the latter half they go more into technical optimizations and limits that I had not seen before. Just putting it out here as another resource for people.

1 Like

There are APIs which deal very well with IndexedDB. I use Redux for React with the persistence plugin for IndexedDB. This means I can split between in-memory data, which is erased when the session ends, and permanent data.

The opposite is true. You should use IndexedDB if you have tons of data including media files.
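A minimal sketch of that setup, assuming redux-persist with localforage as the IndexedDB-backed storage engine (the key, the whitelist, and rootReducer are assumptions):

import { createStore } from 'redux';
import { persistReducer, persistStore } from 'redux-persist';
import localforage from 'localforage'; // persists to IndexedDB under the hood

const persistConfig = {
  key: 'root',
  storage: localforage,
  whitelist: ['users'], // only this slice survives the session
};

const store = createStore(persistReducer(persistConfig, rootReducer));
const persistor = persistStore(store);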

From Perplexity

Securing Data in IndexedDB

IndexedDB is a powerful browser-based storage solution, but it is not inherently private or encrypted. By default, IndexedDB data is protected only by the browser’s same-origin policy, which prevents access from other domains but does not protect against local attacks, malware, or anyone with access to the user’s device[1][2][3]. Sensitive data stored in IndexedDB can be exposed if the user’s machine is compromised or if malicious scripts run within the same origin[3][6].

Risks and Limitations

  • Same-origin Policy: Only scripts from the same domain can access the data, but this does not protect against threats from within the same origin or local device access[1][2].
  • Exposure to Malware: Any malware or malicious browser extension running on the user’s system can potentially access IndexedDB data[3].
  • No Native Encryption: IndexedDB does not provide built-in encryption or privacy controls for stored data[3][6].

Making IndexedDB Data Private

To truly secure sensitive data in IndexedDB, you must encrypt it before storing. This ensures that even if the data is accessed outside your application, it remains unreadable without the decryption key.

Using ArrayBuffer for Encryption

  • ArrayBuffer is a binary data type in JavaScript that can be used to handle encrypted data efficiently.
  • You can encrypt your data (e.g., using the Web Crypto API with AES-GCM) and store the resulting ciphertext as an ArrayBuffer in IndexedDB.
  • When you need the data, retrieve the ArrayBuffer, decrypt it, and convert it back to the original format.

Example Workflow

  1. Convert Data to ArrayBuffer: Use TextEncoder to convert strings to ArrayBuffer.
  2. Encrypt Data: Use the Web Crypto API to encrypt the ArrayBuffer, producing another ArrayBuffer (the ciphertext).
  3. Store Encrypted ArrayBuffer: Save the ciphertext ArrayBuffer in IndexedDB.
  4. Retrieve and Decrypt: When needed, fetch the ciphertext ArrayBuffer from IndexedDB and decrypt it using the Web Crypto API.
  5. Convert Back to Original: Use TextDecoder to convert the decrypted ArrayBuffer back to a string or object.

Sample Code Outline

// Convert string to ArrayBuffer
const encoder = new TextEncoder();
const dataBuffer = encoder.encode("Sensitive Data").buffer;

// Encrypt using Web Crypto API
const key = await crypto.subtle.generateKey(
  { name: "AES-GCM", length: 256 },
  true,
  ["encrypt", "decrypt"]
);
const iv = crypto.getRandomValues(new Uint8Array(12));
const encrypted = await crypto.subtle.encrypt(
  { name: "AES-GCM", iv },
  key,
  dataBuffer
);

// Store { encrypted, iv } in IndexedDB
// ... (IndexedDB put operation)

// To retrieve and decrypt:
const decrypted = await crypto.subtle.decrypt(
  { name: "AES-GCM", iv },
  key,
  encrypted
);
const decoder = new TextDecoder();
const originalText = decoder.decode(decrypted);
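A hypothetical sketch of the elided IndexedDB step (the database and store names are assumptions), using the raw IndexedDB API:

// Open (or create) a database with one object store
const db = await new Promise((resolve, reject) => {
  const req = indexedDB.open('secure-store', 1);
  req.onupgradeneeded = () => req.result.createObjectStore('secrets');
  req.onsuccess = () => resolve(req.result);
  req.onerror = () => reject(req.error);
});

// Store the ciphertext and IV under a key
const tx = db.transaction('secrets', 'readwrite');
tx.objectStore('secrets').put({ encrypted, iv }, 'myRecord');
await new Promise((resolve, reject) => {
  tx.oncomplete = resolve;
  tx.onerror = () => reject(tx.error);
});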

This approach ensures that only your application (with the decryption key) can read the sensitive data, even if the raw IndexedDB storage is accessed[10].

Summary Table

Method                | Security Level                  | Notes
Plaintext storage     | Low                             | Vulnerable to local attacks, malware
Encrypted ArrayBuffer | High (with good key management) | Data unreadable without decryption key

Key Points

  • IndexedDB alone does not provide data privacy beyond same-origin restrictions[1][2][3].
  • Use encryption (e.g., AES-GCM via Web Crypto API) to secure sensitive data before storing it in IndexedDB[10].
  • ArrayBuffer is suitable for storing encrypted binary data in IndexedDB[7][8].
  • Protect and manage your encryption keys securely—never store them in the same place as your encrypted data.

This approach significantly enhances the privacy and security of data stored in IndexedDB.

Sources
[1] Using IndexedDB - Web APIs | MDN
[2] IndexedDB API - Web APIs | MDN
[3] Local IndexedDB in Browser - is it really a security risk? - Information Security Stack Exchange
[4] Work with IndexedDB | Articles | web.dev
[5] How to Store Unlimited* Data in the Browser with IndexedDB - SitePoint
[6] Store sensitive data in indexedDB? - Stack Overflow
[7] Saving ArrayBuffer in IndexedDB - Stack Overflow
[8] Save ArrayBuffer from IndexedDB entry - Stack Overflow
[9] The pain and anguish of using IndexedDB: problems, bugs and oddities - GitHub
[10] Simple Way to Store Encrypted Data in IndexedDB - GitHub Gist

2 Likes

Yes, I agree with paulishca, re: Aggregation pipeline. You basically want the $lookup operator.

When possible, it’s preferable to do the left-outer-join at data write or update time. Doing it at data read, as a just-in-time processing step, is a good way to cause yourself all sorts of bottlenecks.
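A hedged sketch of joining at write time instead of read time (collection and field names assumed): denormalize a small user snapshot onto the order when it is created, so reads need no join at all.

// Hypothetical insert path; the projection is an assumption
async function insertOrder(order) {
  const user = await Meteor.users.findOneAsync(order.userId, {
    fields: { 'profile.displayName': 1, 'profile.avatar': 1 },
  });
  return Orders.insertAsync({ ...order, userInfo: user });
}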

There should be a collection on the server that mirrors what the client will receive; like the spokes on a bicycle chain. The aggregation function, in turn, feeds into that master spoke-and-chain conveyor belt, whose sole purpose is synchronizing data between the Mongo cursor on the server and the minimongo cursor on the client.

So, you don’t want to mess with that conveyor belt. You do want to load stuff onto it. And aggregation pipeline is built for that.

If you can’t recalculate the leftJoin at record update/write, then try to run a cron job and run the leftJoin operation across the entire collection 1x per hour, or 1x per day. The idea is that, to the best extent possible, prepare the data ahead of time. Figure out your problem domain, and the questions that users will be asking.

If you want just-in-time exploratory analytics, ditch Mongo and install Postgres. In most other situations, pre-calc your over-the-wire pub/sub data; so it streams as fast as the server can queue up the bits and the pipe can handle.


3 Likes

Yeah, this is borked. Don’t do this. A publish function must return a cursor (or an array of cursors); this returns a Promise of plain objects, so Meteor can’t track it, and the heavy $lookup would re-run for every subscriber anyway.

Meteor.publish('ordersWithUsers', function () {
  return Orders.rawCollection().aggregate([
    {
      $lookup: {
        from: 'users',
        localField: 'userId',
        foreignField: '_id',
        as: 'userInfo'
      }
    },
    {
      $unwind: {
        path: '$userInfo',
        preserveNullAndEmptyArrays: true
      }
    }
  ]).toArray();
});

You’ll have more success with a calcLeftJoin() function that you sprinkle on important events in your workflow… record update, browser open, user account updated, 1x per day, or whatever your business logic is. That allows the slow process the time it needs to do its aggregation and calculations; and then you pick the results up after it’s completed.

But instead of writing out to a variable and tying up the entire thread as you’re waiting, you use the database (which is multithreaded) to hand off to another process, which will take care of nicely writing to the collection of your choice as an output. And that gets autosynchronized with the client via a fairly generic pub/sub.

// A fairly generic pub/sub over the pre-calculated collection
// (assumes: const OrdersWithUsers = new Mongo.Collection('OrdersWithUsers'))
Meteor.publish('OrdersWithUsers', function () {
  return OrdersWithUsers.find();
});

async function calcLeftJoin() {
  // .toArray() drains the cursor so the $merge stage actually executes
  return Orders.rawCollection().aggregate([
    {
      $lookup: {
        from: 'users',
        localField: 'userId',
        foreignField: '_id',
        as: 'userInfo'
      }
    },
    {
      $unwind: {
        path: '$userInfo',
        preserveNullAndEmptyArrays: true
      }
    },
    {
      $merge: {
        into: "OrdersWithUsers",
        on: "_id",
        whenMatched: "merge",
        whenNotMatched: "insert"
      }
    }
  ]).toArray();
}
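A hypothetical sketch of wiring calcLeftJoin into the schedule described above (the startup hook and hourly interval are assumptions, not prescriptions):

Meteor.startup(async () => {
  await calcLeftJoin(); // prime the collection once at boot
  Meteor.setInterval(() => {
    calcLeftJoin().catch((e) => console.error('calcLeftJoin failed', e));
  }, 1000 * 60 * 60); // re-run 1x per hour
});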

tl;dr - Use the database’s multithreaded-ness when running the aggregation pipeline. Don’t over-complicate the pub/sub.

3 Likes

Ooh, love the last one: directly merging into a new collection, never seen that one!

1 Like

Yes, I totally agree with you, and thanks a lot for the information. If you get some new resources about Mongo patterns and anti-patterns (and if you can), please share them with us.

Thanks a lot for sharing your experience with IndexedDB. I haven’t used it until now, but it can also be a solution for an application “offline mode”: store data in IndexedDB and later sync it with Mongo. What do you think? Are you also using it for this use case?

What I haven’t written yet is that I would like to change document structures dynamically, and this “dance floor” is better suited to Mongo, I think (for some use cases I have no idea how to do it in Postgres)… and for some, in Mongo :smiley: But you gave me a new idea; I must think about it and investigate, and I will let you know my decision.

1 Like

…interesting, I will play with this new “hammer”! It’s exactly the approach I like. I would like to move database operations onto the database server and minimize read/write operations in the app.

1 Like