Questions About Meteor.defer() On The Server

Years into Meteor development and I’m still not totally clear on what all Meteor.defer() does and offers when used on the server.

If I understand correctly, it can be used to defer a block of code to run later, asynchronously, as a performance optimization. And whatever is in that block of code can’t return a value to the caller without async/await, which would defeat the purpose of deferring it in the first place, since the call would then effectively be synchronous again.

So… here’s an example of an issue I have and trying to understand if/how Meteor.defer() will help:

  1. I have an app that has lots and lots of user critical methods firing constantly. They need to return as quickly as possible to the client. The “pressure” or frequency of these calls varies, but they are continuous user interactions always firing.
  2. There’s another method called cacheHugeDataSet that any client can call that causes a lot of Mongo hits and processing loops on the server. This does not need to complete quickly or return a value directly to the client. It can be delayed.
  3. The problem is if cacheHugeDataSet gets called in the middle of #1, all the calls happening in #1 get delayed significantly. I can see it happening on Meteor APM.

So my questions are the following:

  • Will using Meteor.defer() in the method code for cacheHugeDataSet make it magically run once most/all of the calls in #1 finish, or once the pressure/frequency of those calls lowers? What’s the threshold for when the server allows deferred code to actually run? I don’t understand the Node.js event loop logic that powers what Meteor.defer() actually does.
  • When the cacheHugeDataSet heavy code inside the Meteor.defer() actually runs, does it then block everything until it’s done? Like once it’s actually allowed to run does it then take over? Or does it kindly step back into the background if “higher priority” non-deferred methods get called?
  • If not, does it make sense/is it possible to break large chunks of heavy code into multiple Meteor.defer() blocks so they don’t block as much once they finally run?

I’m basically trying to handle some heavier-duty code that doesn’t need to return to the client directly (there are other ways to do that, e.g. ready events fired back to the client that then call cache-fetcher methods) with as minimal an impact as possible, without breaking it out into a microservice running on its own server. That’s kind of overkill for one or two heavy functions.

Without deferring it, I’m finding it has a huge impact on all the other high frequency, quick returning method calls that all the other clients are calling. Like everything runs fast, then a cacheHugeDataSet fires, and everything chokes for a few seconds, then returns to normal.

1 Like

Meteor.defer(f) is pretty much the same as setTimeout(f, 0) except that f is wrapped in a fiber. It is even more similar to Meteor.setTimeout(f, 0), since Meteor.setTimeout also wraps the callback in a fiber.

Here is a possible implementation:

import Fibers from 'fibers';

// Schedule f to run on a later turn of the event loop, wrapped in a
// new fiber so it can use Meteor's synchronous-style APIs.
Meteor.defer = function(f) {
  setTimeout(() => {
    Fibers(f).run();
  }, 0);
};

This isn’t precisely what Meteor does: Meteor uses setImmediate instead of setTimeout, injects error handling logic, attaches metadata to the spawned fiber, and some other stuff. But from a usage perspective, you can pretty much just think of Meteor.defer(f) as short-hand for Meteor.setTimeout(f, 0).
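To see that scheduling behavior in plain Node (no Meteor, no fibers), here’s a minimal sketch, with a hypothetical `defer` standing in for `Meteor.defer`:

```javascript
// Minimal plain-Node sketch: a deferred callback only runs after the
// current synchronous code has finished and the event loop turns over.
const order = [];

function defer(f) {
  // Stand-in for Meteor.defer: schedule f on a later event-loop turn.
  setImmediate(f);
}

defer(() => order.push('heavy work'));
order.push('quick method return');

// By the next turn of the event loop, the deferred callback has run:
setImmediate(() => {
  console.log(order); // [ 'quick method return', 'heavy work' ]
});
```

So deferring doesn’t assign a priority; it only pushes the work past whatever is already running synchronously on the current turn.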

Have you called this.unblock() inside of the cacheHugeDataSet Meteor method? If not, it’s worth a shot. If that Meteor method is making a lot of database calls it will be yielding its fiber a lot, so other things will have a chance to run as long as the method is unblocked. Putting the heavy code in a Meteor.defer would have a similar effect because subsequent method calls wouldn’t have to wait long for cacheHugeDataSet to complete (since all the real work is being deferred).
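As a toy model of what this.unblock() changes (this is not Meteor’s actual scheduler, just an illustration of per-client method queueing; runQueue and the method objects are made up for this sketch):

```javascript
// Toy model: a blocked method must finish before the next method from the
// same client starts; an unblocked method lets the queue advance and
// finishes on its own later.
function runQueue(methods) {
  const log = [];
  const pending = [];
  methods.forEach(m => {
    log.push(`start ${m.name}`);
    if (m.unblock) {
      pending.push(m);           // finish later; let the next method start now
    } else {
      log.push(`end ${m.name}`); // blocked: must end before the next starts
    }
  });
  pending.forEach(m => log.push(`end ${m.name}`));
  return log;
}

console.log(runQueue([
  { name: 'cacheHugeDataSet', unblock: true },
  { name: 'quickMethod' },
]));
// [ 'start cacheHugeDataSet', 'start quickMethod',
//   'end quickMethod', 'end cacheHugeDataSet' ]
```

Without unblock, `end cacheHugeDataSet` would have to appear before `start quickMethod` in that log.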

However, keep in mind that if you’ve unblocked the method or deferred execution then you could have overlapping, simultaneous calls to cacheHugeDataSet. So be careful!

No matter what, if cacheHugeDataSet is hogging the CPU, stuff is going to slow down. If you want, you could put sleeps (using Meteor._sleepForMs(ms)) in the loop to spread execution out over time.

I sometimes use this function to do something similar:

let throttle = function(f, allocation = 0.5) {
  if (allocation <= 0 || allocation > 1) {
    throw new Error(`CPU allocation ${allocation * 100}% is invalid. Must be in the range (0, 1].`);
  }

  return function() {
    let tic = new Date();
    let val = f.apply(this, arguments);
    let toc = new Date();
    let duration = toc - tic;
    let delay = duration * (1 - allocation) / allocation;

    // Sleep long enough that f's share of elapsed time stays at or
    // below the requested allocation.
    if (delay > 0) {
      Meteor.wrapAsync(done => Meteor.setTimeout(done, delay))();
    }
    return val;
  };
};

Calling throttle(f, allocation) will return a function that calls f and then sleeps for some percentage of the amount of time f took to run. For example, if allocation is .5 and calling f takes 100ms, then it’d call f and then sleep for an additional 100ms after f has returned. You might be able to use this approach inside some of the loops in cacheHugeDataSet.

7 Likes

I make regular use of Meteor.defer(Meteor.bindEnvironment(() => {....})). Some of my methods perform a quick db update and then trigger an email to another user. I wrap the Email.send() within the above so the user doesn’t have to wait for the email to be sent.

  Another strategy I use to throttle CPU/DB-intensive server methods is to break the work into chunks and have the method execute only one chunk at a time; after each chunk completes, the method returns a status object to the client with a progress counter and a nextChunkIndex. The client then waits 100 ms or so and calls the same method again with the nextChunkIndex. A benefit of this is that the client can display a % progress to the user.
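A bare-bones sketch of that pattern in plain JS (names like runChunk and CHUNK_SIZE are made up for illustration; in Meteor, runChunk would be the method body, and the client would call the method again with nextChunkIndex after a short delay):

```javascript
// Process one fixed-size chunk per call and report where to resume.
const CHUNK_SIZE = 100;

function runChunk(items, chunkIndex) {
  const start = chunkIndex * CHUNK_SIZE;
  const chunk = items.slice(start, start + CHUNK_SIZE);

  chunk.forEach(item => {
    /* ...heavy per-item work goes here... */
  });

  const processed = Math.min(start + chunk.length, items.length);
  return {
    progress: processed / items.length,  // e.g. 0.4 → "40%" on the client
    nextChunkIndex: processed < items.length ? chunkIndex + 1 : null,
  };
}
```

The client keeps calling until nextChunkIndex comes back null, which also gives the server a breather between chunks for other methods to run.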

I have a couple of export functions in my app, and rather than fetch all data within the past x months, the client fetches 2 days of logs at a time, displaying a progress counter to the user, and then assembles all the data and generates the export file client-side.

Even if it’s a little slower, users like a progress indicator rather than an indeterminate spinner. Plus this gives plenty of opportunity for other clients to call quick methods.

Not all operations can be easily chunked though.

3 Likes

Copy all. Good stuff. Some definite ideas here.

this.unblock() only helps unblock methods called by the same client, right? That’s not the issue here. The issue is cacheHugeDataSet causing the wait times of all clients’ methods to spike while it barges its way into an already busy server. So by definition, while I’m sure it wouldn’t hurt, I don’t think this.unblock() would help either.

Hmm… I never thought about Meteor._sleepForMs(ms). That’s almost like a synchronous Meteor.defer(), because it actually keeps your code in order, right? Whereas I would think multiple Meteor.defer() blocks could run your code out of order, since it’s throwing the blocks somewhere into the future of the event loop? And _sleepForMs frees up the event loop in the meantime?

It definitely seems like my cacheHugeDataSet needs to be broken up in whatever way to give priority back to all the waiting, numerous calls to #1.

Has anyone ever made a prioritized Meteor method package? Does something like that exist? The ideal scenario would be to have cacheHugeDataSet hang out and do nothing on the server until the CPU was low enough, or the frequency of the methods in #1 dropped below a certain threshold, etc.

1 Like

We use jobs/queue in cases like this.

2 Likes

For such data processing we took a totally different approach. Our angle is that Meteor is a webapp where both client and server deal with the immediate needs of an interactive application. Anything beyond those objectives needs to be dealt with outside of Meteor. This is particularly true because of the single-threaded/blocking nature of Node.js (worker threads aside).

We have therefore created a rather simple microservice architecture.

  • There is the Meteor server (potentially in multiple instances) pushing down asynchronous tasks to services via a message broker (Apache Kafka in our case).
  • The services are implemented mostly in Node.js, except in one case which is a highly concurrent Spring Boot (java) application.
  • Each microservice can be deployed in any number of instances on any number of servers, so scalability of these services is a given by the virtue of this architecture.
  • Each service can be addressed via a dedicated topic, and every message (task) is received and executed by exactly one instance of that service – this again didn’t have to be implemented, you’re just getting this by using Kafka as a message broker.
  • No more long, time or CPU consuming operations in Meteor! Nada.

Now, the important question remains: what if the operation needs to be semi-asynchronous? Meaning it should run asynchronously, but ultimately there needs to be some signal, or even a progress indicator, displayed on the client.

Our solution in such cases is schematically as follows:

  • the message sent to the service contains a trackingId
  • when the service starts processing the task, it creates a document with that id and adds a timestamp
  • when the task is finished, the service updates the document with a property that signals completion.
  • alternatively, if a progress indicator is needed on the client, a long running service can repeatedly update the document and set the progress metric.
  • the client subscribes to that document and hence receives live updates of either just start/end of the task or with every progress step.

Easy 🙂
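In plain JS, the tracking-document lifecycle described above might look something like this sketch (update() is a stand-in for a Mongo update on the collection the client subscribes to; all names here are hypothetical):

```javascript
// In-memory stand-in for the tracking collection.
const docs = new Map();

function update(trackingId, fields) {
  // In Meteor this would be Collection.update; the client's subscription
  // would push each change down live.
  docs.set(trackingId, { ...(docs.get(trackingId) || {}), ...fields });
}

function runTask(trackingId, steps) {
  update(trackingId, { startedAt: Date.now(), progress: 0 });
  steps.forEach((step, i) => {
    step(); // one unit of the long-running work
    update(trackingId, { progress: (i + 1) / steps.length }); // live % for the client
  });
  update(trackingId, { completedAt: Date.now() }); // signals completion
}
```

The client only ever watches the document: start, per-step progress, and completion all arrive as reactive updates.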

3 Likes

Care to elaborate briefly on what packages or patterns you use?

And indeed @peterfkruger. I was wanting to avoid (another) microservice for these couple of functions, but that may be the real solution; I could probably offload a lot of heavier functions to it. It also seems like hitting a tiny nail with a giant hammer when it’s fine for the cacheHugeDataSet function to hang out in the background and/or execute in smaller pieces over time (e.g. over five to ten seconds versus ASAP).

1 Like

We were using Meteor Jobs, but the client component was huge. We moved to bee-queue and ended up unloading all third-party processing to it, and we’re now using Bull.

There was also meteor stevejobs, but I’m not sure of its status.

1 Like

stevejobs is still alive and well, but hasn’t been updated much recently. stevejobs has (in my opinion) a severe inefficiency problem which after a lengthy debate with the author is pretty much a WONTFIX. stevejobs uses regular polling of the job queue for every job type, so if you have lots of job types it is doing lots of polling all the time. For that reason I use my own fork/rewrite which uses a single observer on the entire job queue, so is much more efficient.

Neither stevejobs nor my own fork can delay jobs based on current server load though. They are just based on a schedule. They can be used to chunk big jobs and run each chunk at intervals if you want.

1 Like

In your case the microservice can be as easy as an AWS Lambda. This would execute the heavy code in cacheHugeDataSet. You would…

  • move what’s in cacheHugeDataSet today into a Lambda
  • invoke this.unblock()
  • trigger the Lambda using HTTP.call(...), or even return a Promise from axios or similar.

Your method would simply idle in I/O wait, so it definitely won’t be blocking the event loop, and because of this.unblock() it wouldn’t be blocking the user critical methods (firing constantly) either. The benefit of this approach is that you get scalability with respect to cacheHugeDataSet: any number of users could call it simultaneously, and there is no way they could block each other, not even if they do CPU intensive operations. As a result, you could get away with much less scaling in Galaxy (if that’s where you deploy to). Lambda autoscaling is probably much cheaper than Galaxy autoscaling.
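Schematically, the method becomes a thin fire-and-forget trigger (plain JS sketch; invokeLambda is a stand-in for the HTTP.call/axios request to your Lambda endpoint, and both names are made up):

```javascript
// The heavy work now lives behind invokeLambda; the method just kicks it
// off and returns immediately.
function invokeLambda(payload) {
  // In the real app this would be an HTTP request to the Lambda's URL.
  return new Promise(resolve => setImmediate(() => resolve({ ok: true, payload })));
}

function cacheHugeDataSet(args) {
  // In the real Meteor method, this.unblock() would go here.
  invokeLambda(args).catch(err => console.error('lambda failed', err));
  return 'queued'; // the client gets an answer right away
}
```

Completion can then be signaled back to the client via a tracked document or a follow-up method, rather than the method’s return value.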

Outsourcing your long running, potentially even CPU intensive code into a scalable auxiliary system is therefore far more than what this.unblock() or Meteor.defer can give you.

Now, what I have described about our solution with Apache Kafka is clearly not for everyone. We like to host and deploy our own stuff in our own way, but dealing with root servers, VMs, system updates and upgrades, systemd services and other sysadmin tasks is for most people simply too much, I fully understand that. But apparently there are good and viable solutions out there to leverage simple microservices directly from Meteor, Lambda being one of them.

1 Like

Question related to this post, can anyone explain the difference between async and wait times on Meteor APM? I don’t think I’ve ever fully understood this:

2 Likes

  • db - Time spent on database activities, including read and write operations.
  • http - Time spent on processing HTTP requests (accessed with Meteor’s HTTP package)
  • compute - Time spent on CPU-intensive tasks inside a method (e.g. time spent sorting and calculating some value).
  • async - Time spent on async activities, especially with NPM modules.
  • email - Time spent sending emails.
  • wait - Time the method spent waiting to be processed. This metric is important because methods from a single client are processed sequentially, and so a method can sometimes idle in the queue waiting to be processed.
1 Like

Just as a remark, the term “async” in a response time breakdown is still kind of fuzzy.

So presumably (and unless I’m mistaken), if there is an API that returns a Promise and its result is accessed using await, it will qualify as async time.

In that Promise (or async function, which is the same) there can be just an I/O operation that lasts 5 sec while the CPU core isn’t doing much of anything, or there can be a 0.1 sec I/O followed by a heavy calculation of 4.9 sec that uses 100% CPU, and both might appear as “async 5 sec response time”.

I don’t know Kadira at all; it’s just my naïve expectation that I wouldn’t want to see “async” used as an umbrella over what could be anywhere from near 0% CPU to near 100%, or something in between. I/O wait in particular seems important enough to be attributed separately.

1 Like

I’ve been working on improving traces in Monti APM.

Kadira/Monti APM add an async event to the trace when the fiber running the method or publication is yielded. In Monti APM I use this description for async events:

Async events show when a Fiber is yielded. This happens when:
- await is used
- Fibers.yield() is called
- A function is called that was wrapped with Meteor.wrapAsync or Meteor.bindEnvironment

With Monti APM you can use the eventStackTrace option to see what line in your code or a package caused the event.

The Monti APM agent records nested events to show what happened during the top-level events. Here is an example trace in Monti APM that shows what happened during an async event:

We can see that of the 1,127ms spent in the async event, 1,098ms were spent on compute and the remainder on the db.

At this time the nested events don’t affect the graphs (all of the time is attributed to their parent event), and there are situations where the agent decides not to store the nested events, but I plan to change both soon.

4 Likes

That level of details looks awesome!

1 Like