Heavy computations in Meteor

elie · March 12, 2016, 5:47pm

Hi,

My app has to update the scores of 15,000 teams every week. It takes a while to go through that many teams. To stop the cursor from timing out, I process 500 teams at a time. While this is running, however, the server doing the work becomes unresponsive, and problems can occur if it has to do anything else.

Is there a recommended way of doing something like this with Meteor? I know Meteor/Node.js isn’t really the tool for heavy computations like this. Ideally I’d just like to launch a new thread (or multiple threads), update all the MongoDB documents that need to be updated, and it shouldn’t affect anything else in the app. Any suggestions?

lassombra · March 12, 2016, 7:01pm

This might be of some use to you: http://neilk.net/blog/2013/04/30/why-you-should-use-nodejs-for-CPU-bound-tasks/

In it they explain a way of spinning up a worker thread which will run on a separate core and not block the main event loop, letting your server stay responsive.

elie · March 12, 2016, 7:16pm

Have you tried any of these approaches with Meteor yourself?

lassombra · March 12, 2016, 7:20pm

I have not. In my case, I put the “heavy” computations into a separate microservice and used meteorhacks:cluster to coordinate communication between the services, with all of them sharing a Mongo database.

elie · March 12, 2016, 7:34pm

I also use multiple instances to perform these updates. Each instance is
like a new thread. But it’s expensive running a whole new meteor server
just to perform once-a-week tasks that take an hour.

lassombra · March 12, 2016, 8:05pm

Yeah, and in my environment I have more like 5-6 times an hour tasks that take a few minutes.

For that reason, I suspect that the worker stuff is what you need.

abhiaiyer · March 12, 2016, 9:00pm

Rate limit this process. We use meteor sleep to do so.

elie · March 12, 2016, 9:20pm

What do you mean? How does meteor sleep help? Not sure what you mean by
rate limiting here either. Ideally I’d like the update to happen as quickly
as possible and not slow things down by using sleep in between updates.

abhiaiyer · March 13, 2016, 2:32am

Well, good luck then. This isn’t really a Meteor problem but rather a MongoDB and server problem. If you can’t move this job to another service, then you have to slow it down for the sake of your app.
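
As a rough illustration of the rate limiting suggested here, a minimal sketch follows. It assumes a plain Promise-based timer rather than any Meteor sleep helper, and `chunk`, `sleep`, `processInBatches`, and `processBatch` are hypothetical names, not part of Meteor or of anyone's code in this thread.

```javascript
// Hypothetical helper: split a list (e.g. team ids) into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Pause without blocking: while the timer is pending, the event loop
// is free to serve other requests.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// processBatch is a placeholder for the real scoring code (assumption).
async function processInBatches(items, size, pauseMs, processBatch) {
  let done = 0;
  for (const batch of chunk(items, size)) {
    await processBatch(batch);
    done += batch.length;
    await sleep(pauseMs); // yield to the event loop between batches
  }
  return done;
}
```

With 15,000 teams in batches of 500 there are 30 pauses, so a 100 ms pause would add roughly 3 seconds to the whole run. Any synchronous work inside a batch still blocks while it runs, so this only bounds how long other requests have to wait.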

lassombra · March 13, 2016, 2:44am

The only way you are going to get it running as fast as possible without interfering with the main event loop is to spin it off into its own thread. I recommend you read that blog post again. It will take some doing to get a worker going; you’ll have to learn more about Node itself, as this is not something that Meteor does for you (one of the few things Meteor doesn’t do for you).

emmostrom · March 13, 2016, 4:21am

See if this works for you:

http://www.webtempest.com/meteor-background-jobs-tutorial

elie · March 13, 2016, 5:44am

Thanks. That looks just like what I need, and seems very easy to use too.

elie · March 13, 2016, 6:26am

Seems to call jobs on startup which is quite annoying. Open issue here: https://github.com/Differential/meteor-workers/issues/9

The project as a whole seems dead. Not touched for months.

elie · March 13, 2016, 6:29am

But this led me in the right direction with lots of options here: http://stackoverflow.com/questions/11703010/background-tasks-in-meteor

And this package is popular, active and its admin demo looks really cool:
https://atmospherejs.com/vsivsi/job-collection

necmettin · March 13, 2016, 2:35pm

I’m in a similar situation. Not yet 15k teams but getting there. Also, one cron running every hour and one running every minute. After careful consideration, including the projected userbase, I decided to go with a separate server that connects to the same database.

This way, I will always be sure that the customer-facing app will only be handling customers.

On a side-note, the cursor should not time out for 15k teams. If it is timing out, you need to set some indexes on those collections. Indexes speed things up considerably.

elie · March 13, 2016, 5:00pm

Pretty sure it should be timing out. It takes about an hour to calculate scores for all teams. That’s about 4 teams a second. But it’s a slightly complicated computation, that involves fetching documents from another collection along the way.

I might be able to optimise a bit by fetching less from the other collection.

I also use another server, but recently I’ve been running into some issues when this server has to do multiple things at once. And it’s sort of annoying launching a new $5 server for each of these tasks.

necmettin · March 13, 2016, 5:30pm

I don’t understand. I’m having my crons run in a separate, non-Meteor process. Downloading XMLs etc. Are you calling a Meteor URL to do these?

elie · March 13, 2016, 10:36pm

I do everything in Meteor. Downloading data, updating scores, etc. Just all this happens in a non-user facing instance of the app.

necmettin · March 14, 2016, 11:31am

What I’m trying to get at is this: Doing everything in Meteor is not the same as calling a URL.

If you do not use a URL to run your background jobs, you won’t time out. Even when you call a URL in your Meteor app, you can respond to the request but keep the process working. Whatever you are trying to achieve will run in the background:

this.response.end("\nCompleted.")
/* keeps working after you answer the URL request */
/* code here will run */

For example, in my app, I download a few hundred JSON files, save them to Mongo. I have to download the JSON files in a non-Meteor process (because I have to do some hack-y things in CasperJS), but the rest is handled in Meteor. I call a URL in Meteor, but the URL keeps working after it returns.

This is my setup anyways.
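
The respond-first pattern described above can be sketched with Node's built-in `http` module instead of a Meteor route. This is an assumption for illustration only: `handler` and `runJob` are hypothetical names, and the timer stands in for the real download-and-save work.

```javascript
const http = require("http");

// Placeholder for the long-running job (assumption): reports back when done.
function runJob(onDone) {
  setTimeout(() => onDone("job finished"), 50);
}

// Answer the request first, then start the work. The caller is not kept
// waiting, so the request itself cannot time out because of the job.
function handler(req, res) {
  res.end("\nCompleted.");
  runJob((msg) => console.log(msg)); // keeps running after the response
}

// To serve it, attach the handler to a server (port 3000 is arbitrary):
// http.createServer(handler).listen(3000);
```

The job still runs in the same process, so CPU-heavy work would continue to compete with other requests. This only removes the request timeout.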

emmostrom · March 14, 2016, 3:27pm

Dug through the source for both projects (Meteor Workers and Meteor Job Collection)

Meteor Workers is using cluster.fork to create a new process. It is an exact copy of your parent process, except it isn’t listening on the Meteor port, so it won’t be handling client connections (that’s a good thing). Since it is an exact copy, it runs through all the same setup and prints out the same messages. This seems a little weird and uses a little more memory than a simple new empty process, but the benefit is that anything you write that would work in the main process will now work in the child process. Had they used child_process.fork instead of cluster, it would have taken less memory and not printed the duplicate messages, but it would have required a lot more setup to tell the empty child process what code to run.

Meteor Job Collection looks cool, but it appears to run all the jobs inside a Fiber, which means any long-running code will block the other fibers (like the main Meteor process) and you will end up back in the same situation you started with.