How to run heavy startup methods in a separate thread?

elie · April 26, 2015, 3:59pm

My app has some heavy work that needs to happen once or twice a week.

If the app shuts down for some reason and doesn’t perform the task when it was scheduled, it will do the work on startup. The problem is that while it’s doing this work, no one can access the website. Until now it’s been in a Meteor.startup block, but I’ve also tried putting it in a setTimeout with a 1ms timeout, and calling a method on startup and then using this.unblock(), but none of these approaches work.

Once this heavy function is running, the site is inaccessible. Any ideas of a way around this? Or how I can run the work in a new “thread”?

elie · April 26, 2015, 5:50pm

I have no idea how to solve this problem. I’ve also tried using the second solution that uses Futures here: http://www.meteorpedia.com/read/Async_on_server but no matter what I do, it’s impossible to load a page while the server is running the function. Is this possible using Meteor? (The first solution on the above page no longer works it seems).

benjick · April 26, 2015, 6:11pm

Just curious, what are you doing that’s so heavy on the system?

elie · April 26, 2015, 6:42pm

It’s a fantasy football game. Each week, a player’s lineup and substitutes are saved before the games kick off.
The code for each team looks something like this:

function saveGameweekForTeam(team, gameweek) {
    var history = team.history || [];

    history[gameweek] = {
        lineup: team.lineup,
        subs: team.subs
    };

    Teams.update(team._id, {
        $set: {
            history: history
        }
    });
}

history gets bigger as the season progresses and also includes more data, such as scores. It is surprising that running this method about 2000 times takes so long: well over 10 minutes on my laptop, maybe even 30.

What’s interesting is that if I call a heavy method like this from the client, it doesn’t stop the server from working and this.unblock works correctly, but when it isn’t a client call, the server blocks up completely.

benjick · April 26, 2015, 6:54pm

Where are you hosting your app?

jchristman · April 27, 2015, 2:43am

So I don’t know quite how you are calling this, but if you are doing it as a Meteor.method, the explanation (or a solution to your problem) could be read from the Meteor docs:

If you include a callback function as the last argument (which can’t be an argument to the method, since functions aren’t serializable), the method will run asynchronously: it will return nothing in particular and will not throw an exception. When the method is complete (which may or may not happen before Meteor.call returns), the callback will be called with two arguments: error and result. If an error was thrown, then error will be the exception object. Otherwise, error will be undefined and the return value (possibly undefined) will be in result.

Therefore, I believe that if you declare your function like this:

Meteor.methods({
    saveGameweekForTeam: function(team, gameweek) {
        // Code here
    }
});

and then call it like this from the server or the client:

Meteor.call('saveGameweekForTeam', team, gameweek, function(err, result) { });

it should be asynchronous, regardless of what you are doing in the function. You don’t even need to do anything with the callback; just include it.

mrzafod · April 27, 2015, 8:16am

First, you should use the MongoDB positional operator: it lets you update your history array atomically instead of overwriting it.
Second, it is not clear how you call the saveGameweekForTeam method.
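
The thread never shows the modifier that was eventually used, so the following is only a guess at what a targeted update could look like. It assumes the `team` and `gameweek` shapes from the snippet above and uses dot notation with a numeric index (strictly speaking not the `$` positional operator) to write a single history entry. `buildGameweekModifier` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a modifier that writes one gameweek entry
// instead of sending the entire history array back to the database.
function buildGameweekModifier(team, gameweek) {
  var set = {};
  // The key becomes e.g. 'history.3', which targets index 3 of the array
  set['history.' + gameweek] = {
    lineup: team.lineup,
    subs: team.subs
  };
  return { $set: set };
}

// Intended use inside the app (assumption, not run here):
// Teams.update(team._id, buildGameweekModifier(team, gameweek));
```

One caveat with this approach: if a team document has no `history` field yet, MongoDB creates an embedded document rather than an array for a dotted path, so the field would need to be initialised to an array first.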

elie · April 27, 2015, 8:52am

I run it once for each team. The code is something like
Teams.find().forEach(saveGameweekForTeam);

I’ll look into the positional operator and I’m hoping the suggestion for making the method call async works as expected. Thanks

mrzafod · April 27, 2015, 11:35am

Happy if it was helpful! But I think you should optimize the method call: doing find().forEach() is not a very nice solution. Maybe it would be better to implement some kind of scheduler, and unblock the method call?
One more thing, if you do not mind: it is better to pass a callback to Teams.update, as it prevents db write overflow. It should look like this:

var updateTeam = function (teamId, callback) {
    Teams.update(teamId, {...}, callback);
};

var teamsIds = Teams.find({...}).map(function (teamDocument) {
    return teamDocument._id;
});

var processTeams = function (err, res) {
    // If the previous update failed, throw and stop further execution
    if (err) {
        throw err;
    }

    // No error: update the next team, if there is one
    var currentTeam = teamsIds.pop();
    if (currentTeam) {
        // Call update; once it has finished,
        // processTeams is called again
        updateTeam(currentTeam, processTeams);
    }
};
processTeams(null);
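
For comparison, here is a minimal sketch of the same "do a little, then continue later" idea in plain Node.js, assuming the heavy work is a synchronous loop that can be split up. `processInBatches` and its parameters are hypothetical names, not Meteor APIs. Inside Meteor the scheduling call would probably need to be `Meteor.defer`, or the callback wrapped in `Meteor.bindEnvironment`, so that collection calls still run in a Fiber.

```javascript
// Hypothetical helper: runs `worker` on every item, but only `batchSize`
// items per turn of the event loop, so other callbacks can run in between.
function processInBatches(items, batchSize, worker, done) {
  var index = 0;

  function runBatch() {
    var end = Math.min(index + batchSize, items.length);
    for (; index < end; index++) {
      worker(items[index]);
    }
    if (index < items.length) {
      setImmediate(runBatch); // yield, then continue with the next batch
    } else if (done) {
      done(); // every item has been processed
    }
  }

  // Schedule the first batch too, so the caller returns immediately
  setImmediate(runBatch);
}
```

A smaller `batchSize` gives incoming requests more chances to be served, at the cost of a longer total run time.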

elie · April 27, 2015, 1:59pm

Using the positional operator definitely sped things up a lot. Must be at least 10 times faster now, probably even more. Thank you

What do you mean by “db write overflow”? And how does your method improve performance/avoid these problems?
Doesn’t using Teams.find().forEach(...) have the advantage of not having to fetch all teams at once?
(If I use map, I’ll also have to return team.lineup and team.subs btw).

mrzafod · April 28, 2015, 5:09am

You can fetch only the _id with:

var teamsIds = Teams.find({...}, {fields: {_id: 1}}).map(function(teamDocument) {
    return teamDocument._id;
});

This piece of code fetches the _ids of the teams. I am just trying to explain that it is better to pass a callback to Teams.update: without it, the write operation blocks your code. I am only advising how to refactor your code to make it faster. You could fetch lineup, subs and anything else for each team; I am only showing the outline.

lai · April 29, 2015, 12:25pm

You can make it even faster if you use rawCollection(), which was recently added in Meteor 1.0.4 and lets you use the rest of the JS MongoDB API, not just the Meteor-wrapped parts:

var batch = Teams.rawCollection().initializeUnorderedBulkOp(); 
// There's also an Ordered version of it, but if the order doesn't matter, just use Unordered, because it should be faster (I think)

// Queues all the operations
_.each(teamIds, function (id) {
  batch.find({ /* your criteria */ }).updateOne( /* your update modifier */);
});

// Executes all the operations in one go
batch.execute(function (err, result) {
  if (err) {
    console.log('Something bad happened', err);
  } else {
    console.log('Updated', result.nModified, 'team(s).');
  }
});

Here’s the MongoDB API docs:
http://mongodb.github.io/node-mongodb-native/api-generated/unordered.html
http://mongodb.github.io/node-mongodb-native/api-generated/ordered.html

elie · April 29, 2015, 12:43pm

Interesting. Any limits on performing a batch update? Does it matter how big the update is?

lai · April 29, 2015, 1:14pm

There should be no limit. But I’m not sure. The most I’ve ever bulk-written was around 500k documents with no issues.

deanius · April 30, 2015, 2:52pm

Kudos to everyone for swarming on the real problem, but I felt like an answer to the original question deserved a posting in a thread of this title as well:

Here’s a package implementing Workers (JS background jobs) that exists for the reason that @elie inquired about:

stevenhornung · April 30, 2015, 2:56pm

Another fantasy sports site here. We have to do a lot of daily updates that can be fairly taxing as well. We’ve found a lot of benefit in pulling these kinds of heavy functions out to another project/server that only does these back-office routines, so that they never put a load on the web app.

You can just spin up a free-tier EC2 instance for the jobs. Just a suggestion!
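
A lighter-weight variation on the same idea, assuming the job can run on the same machine as the web app, is to start it in a separate Node.js process so it gets its own event loop. This is only a sketch using Node's built-in `child_process` module; `runInChildProcess` is a hypothetical helper name, and a real job would more likely live in its own script file than in an inline string.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical helper: runs a JavaScript snippet in a separate Node process
// and reports its exit code and stdout through a Node-style callback.
function runInChildProcess(script, callback) {
  var child = spawn(process.execPath, ['-e', script], {
    stdio: ['ignore', 'pipe', 'inherit']
  });
  var output = '';
  var finished = false;

  // Make sure the callback fires only once, even if both events occur
  function finish(err, result) {
    if (finished) { return; }
    finished = true;
    callback(err, result);
  }

  child.stdout.on('data', function (chunk) {
    output += chunk;
  });
  child.on('error', finish);
  child.on('close', function (code) {
    finish(null, { code: code, output: output });
  });
}
```

The parent process stays responsive while the child runs, because the two only communicate through the pipe and the exit event.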

lai · May 1, 2015, 2:09am

Yes, I would also highly suggest putting that in a totally separate server to do that job. If you use Digital Ocean, you could probably create an image with your functionality all set up, and then use the Digital Ocean API to create the server to run the job and then tear the server down after it’s done, that way you save money hehe.

Also, another package that’s really great for jobs is Job Collection.

elie · May 1, 2015, 2:58pm

Thanks a lot for all the great replies