Caching meteor methods' DB access

Some of our meteor methods need a lot of DB access to check permissions on each of the objects that are returned. We came across this article that explains how to create a memoizer for Meteor:

What I’m thinking could be really nice (and might even be a public meteor package), is to create a new local memoizer for each meteor method call, that stores each DB result, so that you can access it as many times as you want, over the course of execution of a method.

And when a method ends, the cache is emptied.

It’s a kind of in-memory MiniMongo for the duration of a method:

Meteor.methods({
  fetchTodos() {
    this.cacher = new Cache();
    const todos = Todos.find({ userId: this.userId }).fetch();

    const allowedTodos = todos.filter(({ listId, ...todo }) => {
      const permissions = TodoLists.findOne({ _id: listId }); // Each identical todo list will be fetched once
      const user = Users.findOne({ _id: this.userId }); // Will only be fetched once

      return isAllowedToSeeTodos(permissions, user, todo);
    });

    this.cacher.destroy();
    return allowedTodos;
  },
});

This is just an example, and it’s sloppy, but you could write this, and it would only really fetch your user once, as well as any todolist once if it is fetched multiple times. Because sometimes, fetching all of your data at the beginning once and passing it down can result in a horrible code.

There are a few things to think about:

  • Invalidating cache if you update something in the middle of a method and then try to refetch it. (the old version of the document will come back from the cache)
  • Can the cache be invisible to the rest of your code? i.e. can you just write Users.find().fetch() and it will know to look for any existing cache before going to the DB? To make this work elegantly it’d be a must have

Has anyone implemented something like this?

4 Likes

This almost looks like its api is eventually going to look like pub/sub :slight_smile: I mean, in plain English terms, this is very similar to what you’re describing.

The subtle difference would be the lack of server state that comes with the pub/sub paradigm. That state is indeed the cache invalidation mechanism you’re asking about, albeit too aggressive.

Nonetheless, the ability to destroy is also quite analogous to breaking down the subscription.

So I guess what you want is a pub/sub where there’s no server state and the client only polls rather than listen.

Having made the arguement, should it not be more preferable to invert the whole spec you’re describing and enhance pub/sub in a way to allow for stateless adhoc request of [semi-live]data.

Edit: perhaps some of the subscription caching mechanism explored earlier in the history of Meteor can provide some insight as to how this could work and perform.

1 Like

Ah, I hadn’t thought of it like that!

So if we were to reuse some of meteor’s pub/sub system, as an example, it could allow both of these queries to hit the same cache:

Users.find({ _id: 'someId' })
Users.find({ _id: { $in: ['someId'] }})

However this might eat into the performance gains, as the overhead to find a matching cache is higher in this case. Maybe the caching mechanism could be chosen, or changed with plugins.

I was thinking of storing the caches based only on collection and exact query, to keep it simple.

The first challenge is to change all the collections to try and hit a cache when ran in the same “meteor method context”.

I should look into how Kadira does this, they know that each find/fetch is done in the context of a single meteor method to track analytics and execution time.

1 Like

I thought this was more or less the main problem that redis oplog solves bu using redis as the signaling mechanism to help keep track of what’s current/fresh.

As for Kadira, you might want to go down this rabbit hole https://github.com/meteorhacks/kadira/blob/master/lib/hijack/wrap_session.js which is just a guess, really.

1 Like

We tested the approach described by Theodor in the grapher docs and it worked well. We don’t otherwise use grapher though so we only used it in one case to get a user’s permissions which is the most requested data in our app.

2 Likes

Ah yes, so @florianbienefelt I belive a combination of grapher and redis oplog from more or less achieves your desired outcome perhaps.

I have used neither, but they look like they have a “cult following”, pun intended :wink:

4 Likes

The issue with grapher caching is that it only works for named queries, we want to be able to run any Mongo method (with or without grapher) to be cached for the duration of a method!

And yes, regarding kadira, this module does some very interesting things to the DB that we want to reuse:

Florian, I think you could use this elegantly: https://github.com/graphql/dataloader.
In conjunction with a pre_method or after_method that clears it. This is the solution facebook came up with for doing these sort of caches in a GraphQL query.

Hackish: https://github.com/leonidez/method-hooks
Or if you still use: https://github.com/cult-of-coders/mutations then it’s even easier.

That’s what I would do if I were in your place.

1 Like

Yep, still using cultofcoders:mutations, great package!

DataLoader seems to be exactly what I’m looking for, but it seems like I’d have to rewrite all my backend code to use the DataLoader class, which is a bit of a pain. And DataLoader has a promise interface only, whereas all regular meteor backend code is “synchronous”. I could write a nice wrapper that uses fibers, but I’m unsure about the other issues this could create.

Do you have an easy migration path with a clean pattern in mind, that I can’t think of?

EDIT: To add a bit more clarity, I see a way to do this in a way that is like magic, where all your code remains the same, but boom, every 2nd query to the same data is cached!

This article that explains how to use DataLoader with MongoDB and sift is great inspiration!

1 Like

My approach would be different. I would first try to identify what is expensive for me, and slowly start using the Data Loader for those, honestly. I don’t care much about performance in general, I’d rather pay servers than human time, they’re much cheaper. :smiley:

DataLoader and Sift can be a solution for sure. But for what you are requesting, it’s very hard to do magic stuff, you’ll head into things like you query something that has extra fields, or you would want to query something that is absolutely fresh from db, headaches after headaches. Use DataLoader and Meteor.wrapAsync it shouldn’t be a problem for what you need.

1 Like

@florianbienefelt dataloader has been built with a stateless usecase in mind where the cache lives only during the lifecycle of a single graphql query request, in order to solve a specific problem around how graphql opens up the possibility to request the same data live over and over.

Furthermore, the whole paradigm is server side whereas this whole time, what you’re describing to me sounded like you need this both on the server and on the client. But then again, this time it begins to sound like graphql and apollo client with built in client side caching coupled with dataloader to solve the server side. And even on top of that, you can further push the limits with a redis cache to sever the bond with the database on subsequent queries.

Nonetheless, magical caching sounds quite like a can of worms to me. Do you have specific use cases where defaulting to a cache-first mechanism is indeed what you need?

Edit: what’s bothering you about the promise interface or dataloader? you should be able to await them and they would align perfectly with the rest of your sync-style code.

Well no, I was talking about server-side only, and for the duration of one single meteor method (a single lifecycle like you say), so DataLoader is exactly what I want.

The specific use case is exactly what I wrote above, a method that checks permissions hundreds of times during a single run.

So the promise based interface annoys me because Ideally I don’t want to rewrite all my backend code to use async await and pollute it with a DataLoader class.

What I’m thinking about right now, is that I’ll cache only findOne requests that have a _id (and maybe $in with a single array item) as their selector, which is the vast majority of our DB requests.

Also, the DataLoader class is really simple, so I might just use it as inspiration!

And to avoid making lots of things fail unexpectedly, this could be opt-in on a per-method basis.

Hmm, so if everything is happening inside a method and on the server side, and you’re fine with opt-in, why not try aggregations?

Dataloader was required because a generic graphql server has no easy way of knowing what combinations of queries would exist within any one request, hence rapid firing of queries for the same object.

People developing for rest endpoints have typically been optimizing this on their queries.

Methods are more analogous to rest endpoints in this case.

So why not try aggregations to minimize the number of roundtrips to the database server and allow it to do what it is best at. Meanwhile, saving precious memory footprint on your app server because this type of caching limits your vertical scalability a lot.

What do you think?

What I like to do is to have each application server observe (not observe changes) a collection and populate a cache.

If it’s a big collection, some LRU logic may be useful. (The individual caches on each application server will then diverge.)

This is useful for collections with far more reads than writes, of course.

2 Likes

I don’t see how you could write an aggregation for the sample query I provided above, did you have something in mind?

Also, if you only cache things on a per-method basis, and always empty that cache when the method is over, how would it impact your memory?

Well here it is: epotek:method-cache.

It currently caches all finds that use only _id, and invalidates cache very liberally based on updates.

It works well in my tests, but I can’t get it to work with more complex queries such as when using grapher.

Result: In a method that fetches the same document 1000 times, you get a 10x performance increase.

4 Likes

Why not just use a mongo document that contains only what the user needs? mongodb is, in many sense, a cache. If you have to be aware when things are mutated anyway, why not just bake off a document that contains, e.g., all the user’s “allowed” todos in a single document?

Well because permissions are more complex than just saying yes or no. Our permissions system lets you decide on a very granular level what a user can and cannot do, such as update, remove, linking it with other DB documents, seeing more or less of the data.

I find DB query caching like DataLoader to be a very elegant solution, I just have to find the right way to make it work seamlessly with all our other packages.

As an initial test, one of our queries fetches roughly 1000 documents, with a 50% cache hit rate, so that’s pretty promising.

2 Likes

Hello,
This is a very interested thread. We are also querying the DB to check the rights of users in a very intensive way. We decided to try caching this. We ended up with something simple. In case someone need it :

On the server :

let cache = {};
RKCache = {
    set: function(key, value) {
        if (Meteor.isDevelopment) console.log('caching > key : ' + key + ' -> ', value);
        cache[key] = {value, expires: new Date().getTime() + 5*1000}; // 5 seconds expiration time
    },
    get: function(key) {
        if (typeof cache[key] === 'undefined') return;
        if (typeof cache[key].value === 'undefined') return;
        if (cache[key].expires < new Date().getTime()) {
            if (Meteor.isDevelopment) console.log('Cache expired');
            delete cache[key];
            return;
        }
        if (Meteor.isDevelopment) console.log('using cached value : ', cache[key].value);
        return cache[key].value;
    },
};

RKCache is a global object available everywhere in our app
Now we can use it on the server in any methods or functions :

const getIdsThatUserCanAccess = function (userId) {
  if (cache = RKCache.get(userId + '-getIdsThatUserCanAccess')) return cache;
  const query = {
   // complex query that takes time to perform but not change very often
  }
  const res = Docs.find(query, {fields: {_id: 1}}).map(function (el) {
    return el._id;
  });
  RKCache.set(userId + '-getIdsThatUserCanAccess', res);
  return res;
};
1 Like