Suggestions for performing non-blocking server functions (Importers/API interaction)?

Spyridon · October 5, 2021, 11:01pm

Hello, I’m updating one of our older apps and wondering whether there are any new solutions, or any progress, in this area for Meteor. Node is typically single-threaded, and our app needs to run routines that interact with a large number of 3rd party APIs, as well as importers.

One requirement is we have to do a nightly update cycling through thousands of API calls, and this needs to be performed on multiple 3rd party services.

In addition, sometimes users import large amounts of information, so we would need a routine that imports them one at a time.

I know there’s this package, vsivsi/meteor-job-collection (a persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere), but it doesn’t seem to be supported anymore and I’m hesitant to rely on it.

In the past, the solutions seemed to be either the job-collection package, or making separate apps connected to the same database for the heavier routines. Just wondering if there are any better solutions nowadays?

wildhart · October 6, 2021, 12:14am

I’m all for microservices and have even written a job scheduler myself (although it only uses the same thread), but in reality you should find that all of these operations are non-blocking by default.

Calling external APIs - every time you’re waiting for a response, your thread is free to handle incoming requests.

Importing large amounts of information - as long as you add them to the database one at a time, each time you read or write to the db the thread is free to handle incoming requests. If this data is coming from the client then I make the client send the data to the server in small batches which reduces the load and memory usage on the server - then you can also show a nice progress bar to the user.
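To make the batching idea concrete, here is a minimal sketch in plain JavaScript. It is only an illustration: `toBatches`, `importInBatches`, `sendBatch` and `onProgress` are hypothetical names, and `sendBatch` stands in for whatever call actually sends a batch to the server.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send the batches one after another and report progress after each one.
// `sendBatch` is a hypothetical async function (e.g. a wrapper around a
// method call); `onProgress` receives a fraction between 0 and 1.
async function importInBatches(items, size, sendBatch, onProgress) {
  const batches = toBatches(items, size);
  let sent = 0;
  for (const batch of batches) {
    await sendBatch(batch);
    sent += batch.length;
    onProgress(sent / items.length);
  }
  return batches.length;
}
```

The progress fraction is what you would feed into the progress bar; the server only ever holds one batch in memory at a time.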

For processor-heavy operations, sometimes just slowing them down a bit can help, by inserting thread-friendly delays within the loops:
await new Promise(resolve => setTimeout(resolve, 100));

or, since we’re using Meteor, this does pretty much the same thing and also works if you’re not in an async function:
Meteor._sleepForMs(100);

Of course, it also depends on how much RAM these operations are going to consume and if you need the processes to be able to survive a reboot…
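As a rough sketch of the delay trick in a loop, here is a plain Node version that pauses after every chunk of items rather than after every single one. The function name, `chunkSize` and `pauseMs` are made up for illustration, and the right values would depend on your workload.

```javascript
// Resolve after `ms` milliseconds, letting the event loop run meanwhile.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run the synchronous `work` function on every item, pausing for `pauseMs`
// after each `chunkSize` items so that other callbacks get a turn.
async function processWithPauses(items, work, chunkSize = 100, pauseMs = 10) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(work(items[i]));
    const endOfChunk = (i + 1) % chunkSize === 0;
    if (endOfChunk && i + 1 < items.length) {
      await sleep(pauseMs); // no pause after the final item
    }
  }
  return results;
}
```

The whole job takes longer overall, but requests arriving in the meantime are served during the pauses.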

Spyridon · October 6, 2021, 4:12pm

Thanks for the reply!

So a little more explanation, regarding the API calls themselves, I understand they’re waiting for the responses once the calls are made. But the part I’m more concerned with is the loop.

As an example, let’s say at midnight I had to check a collection for a list of documents and make an API call for each document. Since this isn’t triggered by an async function, wouldn’t a server side loop in this case typically be blocking? In that new article - Background jobs | Galaxy Docs - the second paragraph makes it sound like this is definitely still the case.

Or is there a way of writing the loop where this wouldn’t be the case?

And in any case, would the job scheduler above resolve this issue?

wildhart · October 6, 2021, 5:23pm

Despite the lack of async/await or .then(), many built-in Meteor functions are asynchronous and therefore non-blocking. They use something called “Fibers” under the hood, which are an alternative method of unblocking the thread while an external time-consuming task (querying the db, waiting for an HTTP call, etc.) is happening.

So even though it looks like you’re writing blocking synchronous code, it’s actually not. Some examples:

// now
const docs = myCollection.find({...}).fetch();
// sometime in the future

Let’s say that was a big query, or your database is far away from your server on a slow connection and that line of code took 10s to execute. As soon as the request from your server to the db is sent, your thread is free to do other stuff - even though you haven’t used await and this is not an async function. The only time this call blocks the thread is when the data is received back from the database and loaded into memory - if it’s a lot of data then this can take time.

To prove this to yourself, try this:

const handle = Meteor.setInterval(() => console.log("I'm still running!", new Date()), 10);
console.log("Getting data...");

const docs = myCollection.find({ a really big query }).fetch();

// sometime in the future...
console.log("Got data", docs.length);
Meteor.clearInterval(handle);

You will see the console logs still happening while the query is running, and they might not fire towards the end while the data is being loaded into memory.

Of course, for a big query this is preferable:

const handle = Meteor.setInterval(() => console.log("I'm still running!", new Date()), 10);
console.log("Getting data one at a time...");

myCollection.find({ a really big query }).forEach(doc => {
    console.log("Got a document", new Date());
});

console.log("Finished!");
Meteor.clearInterval(handle);

Here you will see the "Got a document" logs interspersed with "I'm still running!", with "Finished!" at the very end.

All this time your thread is also free to handle incoming requests from your users, make other database requests, and return results to the users so they probably won’t even notice any extra delay.

If you’re making http calls to an external API within that forEach callback, then each of those API calls is also magically unblocked by Meteor’s HTTP methods:

console.log("Getting data one at a time...");

myCollection.find({ a really big query }).forEach(doc => {
    console.log("Got a document", new Date());
    try {
        const result = HTTP.get(`https://external.api/collection/${doc.externalId}`);
        console.log("    Got a result", new Date())
    } catch (error) {
        console.log("    That didn't work!", doc.externalId, error);
    }
});

console.log("Finished!");

If you were using a 3rd party library which did require promises or async/await, you could do:

console.log("Getting data one at a time...");

myCollection.find({ a really big query }).forEach(async doc => {
    console.log("Got a document", new Date());
    try {
        const result = await thirdPartyLibrary.get(`https://external.api/collection/${doc.externalId}`);
        console.log("    Got a result", new Date());
    } catch (error) {
        console.log("    That didn't work!", doc.externalId, error);
    }
});

console.log("Finished!");
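One caveat with the async callback version: forEach does not wait for the promise an async callback returns, so the calls are all started without waiting for each other and "Finished!" can be logged before any result has arrived. The sketch below shows the difference in plain JavaScript, using an array of ids in place of the cursor; `fakeGet` is a hypothetical stand-in for the promise-based library.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a promise-based API client.
async function fakeGet(id, log) {
  await sleep(5);
  log.push(`result ${id}`);
}

// forEach ignores the promises returned by an async callback, so this
// function logs "finished" before any of the calls have completed.
function withForEach(ids, log) {
  ids.forEach(async (id) => {
    await fakeGet(id, log);
  });
  log.push("finished");
}

// for...of inside an async function waits for each call in turn.
async function withForOf(ids, log) {
  for (const id of ids) {
    try {
      await fakeGet(id, log);
    } catch (error) {
      log.push(`failed ${id}`);
    }
  }
  log.push("finished");
}
```

If the calls must run one at a time, or something has to happen only after all of them are done, the for...of form is the safer pattern.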

Or, if you had an asynchronous 3rd party library which required an (error, result) callback, you could use Meteor.wrapAsync:

import { library } from 'thirdpartylibrary';

// wrap library.methodWithCallback(apiKey, docId, (error, result) => {}) so it can be called in synchronous style:
const wrappedMethod = Meteor.wrapAsync(library.methodWithCallback, library);

console.log("Getting data one at a time...");

myCollection.find({ a really big query }).forEach(doc => {
    console.log("Got a document", new Date());
    try {
        const result = wrappedMethod(MY_API_KEY, doc.externalId);
        console.log("    Got a result", new Date());
    } catch (error) {
        console.log("    That didn't work!", doc.externalId, error);
    }
});

console.log("Finished!");

This loop is actually doing very little and will not block your thread very much. The only time the thread is being blocked is when each doc and result is being loaded into memory.
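For comparison, outside of Meteor's Fibers the closest plain Node equivalent I know of is util.promisify, which turns an (error, result) callback function into one that returns a promise. This is only a sketch: the `library` object below is a hypothetical stand-in, not a real package.

```javascript
const { promisify } = require("util");

// Hypothetical callback-style library: methodWithCallback(apiKey, docId, cb).
const library = {
  prefix: "doc-",
  methodWithCallback(apiKey, docId, callback) {
    setTimeout(() => {
      if (!apiKey) return callback(new Error("missing API key"));
      callback(null, this.prefix + docId);
    }, 5);
  },
};

// Bind first so that `this` inside the method still refers to `library`,
// which is the role of the second argument to Meteor.wrapAsync.
const wrappedMethod = promisify(library.methodWithCallback.bind(library));
```

The wrapped function is then used with await (or .then()) instead of being called in synchronous style.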

Magic! Thanks Meteor, you’re awesome!

wildhart · October 6, 2021, 5:28pm

Regarding the new article on background jobs, this is the bit that’s concerning you:

if you perform long-running tasks in the same containers that you are processing your connected users actions they are going to compete using the event-loop

This is only true if the ‘long running’ task is actually consuming your own CPU, i.e. in your own code, such as a loop performing lots of calculations or transforming lots of data. If that long-running task is a loop which is mostly waiting for the db or an external API and doing very little of its own processing, then your event-loop is mostly free.
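A small plain Node sketch of that distinction, with assumptions spelled out: the `setImmediate` callback stands in for "some other request waiting to be handled", `busyLoop` for CPU-bound work in your own code, and `sleep` for time spent waiting on the db or an API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// CPU-bound work: a synchronous loop that never yields to the event loop.
function busyLoop(iterations) {
  let sum = 0;
  for (let i = 0; i < iterations; i++) sum += i % 7;
  return sum;
}

async function demo() {
  const observed = {};
  let otherWorkRan = false;
  setImmediate(() => { otherWorkRan = true; }); // stands in for a user request

  busyLoop(1e6);
  observed.afterBusyLoop = otherWorkRan; // false: the callback had to wait

  await sleep(50); // waiting, like a db query or an API call
  observed.afterWaiting = otherWorkRan; // true: it ran while we were waiting
  return observed;
}
```

The queued callback cannot run until the synchronous loop finishes, but it does run while the function is merely waiting.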

E.g. I have a function which runs every 4 hours (scheduled with wildhart:jobs) checking users’ subscriptions, sending subscription reminder emails via a mailgun api, or making automatic charges to the customer’s credit card via a braintree api. The braintree and mailgun apis are slow but that doesn’t block my thread, and I only fetch each customer one-at-a-time using customers.find(...).forEach(customer => {}).

Spyridon · October 6, 2021, 11:56pm

Thanks very much for such a thorough answer!

I was aware of Meteor’s fibers for handling methods, but I wasn’t sure how functions directly on the server were handled. I’ll keep this in mind!

Question regarding that: I don’t see anything in the API docs for Meteor._sleepForMs, is there anywhere I could read about it?

wildhart · October 7, 2021, 12:56am

The “_” in “_sleepForMs()” tells us that the function is not part of the official API. I’m not sure where I learned about it from. It’s here in the code though: meteor/fiber_helpers.js at 365604c91020db2c36b491e7ea332ed730492994 · meteor/meteor · GitHub

Bear in mind that in future Meteor releases that function could disappear or break…

nschwarz · October 12, 2021, 9:17pm

Hello there!

If your server is multi-core, I made a package to offload work onto other cores (it has a scheduling feature).
You can check it out here: nschwarz:cluster.
It should be a good fit for your app.