My app has to calculate the scores of 15,000 teams every week. It takes a while to go through that many teams. To stop the cursor from timing out, I process 500 teams at a time. While this is happening, however, the server doing the work is unresponsive, and problems can occur if it has to do anything else.
Is there a recommended way of doing something like this with Meteor? I know Meteor/Node.js isn’t really the tool for heavy computations like this. Ideally I’d just like to launch a new thread (or multiple threads), update all the MongoDB documents that need to be updated, and it shouldn’t affect anything else in the app. Any suggestions?
This might be of some use to you: http://neilk.net/blog/2013/04/30/why-you-should-use-nodejs-for-CPU-bound-tasks/
In it they explain a way of spinning up a worker thread which will run on a separate core and not block the main event loop, letting your server stay responsive.
Have you tried any of these approaches with Meteor yourself?
I have not. In my case, I put my “heavy” computations into a separate microservice and used meteorhacks:cluster to coordinate communication between the services, which also share a Mongo database.
I also use multiple instances to perform these updates. Each instance is like a new thread. But it’s expensive running a whole new Meteor server just to perform once-a-week tasks that take an hour.
Yeah, and in my environment I have tasks that run more like 5-6 times an hour and take a few minutes each.
For that reason, I suspect that the worker stuff is what you need.
Rate limit this process. We use meteor sleep to do so.
What do you mean? How does meteor sleep help? I’m not sure what you mean by rate limiting here either. Ideally I’d like the update to happen as quickly as possible and not slow things down by sleeping in between updates.
Well, good luck then. This isn’t really a Meteor problem but rather a MongoDB and server problem. If you can’t move this job to another service, then you have to slow it down for the sake of your app.
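To make the rate-limiting suggestion concrete, here is a minimal sketch in plain Node.js rather than Meteor. It assumes “meteor sleep” amounts to pausing between batches. `toBatches`, `sleep`, `processRateLimited` and the `updateBatch` callback are hypothetical names, not part of any Meteor API. The point of the pause is that the event loop is free to serve other requests while the timer is pending.

```javascript
// Split a list into consecutive batches of at most `size` items.
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Promise-based pause. While the timer is pending, the event loop
// can handle other work (client requests, other jobs).
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Process every batch in order, pausing after each one.
// `updateBatch` is a hypothetical callback that would compute and
// store the scores for one batch of teams.
async function processRateLimited(items, size, pauseMs, updateBatch) {
  let processed = 0;
  for (const batch of toBatches(items, size)) {
    await updateBatch(batch);
    processed += batch.length;
    await sleep(pauseMs);
  }
  return processed;
}
```

With 15,000 teams in batches of 500, a 100 ms pause means 30 pauses, or about 3 seconds in total, so the slowdown can be small next to an hour-long job. The work inside each batch still blocks while it runs, so smaller batches give the server more frequent breathing room.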
The only way you are going to get it running as fast as possible without interfering with the main event loop is to spin it off into its own thread. I recommend you read that blog post again. It will take some doing to get a worker going, and you’ll have to learn more about Node itself, as this is not something that Meteor does for you (one of the few things Meteor doesn’t do for you).
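As a rough illustration of the idea, and not the approach from the blog post, here is a sketch that uses Node’s built-in `worker_threads` module. It assumes a Node version that ships that module (Node 12 or later). The `scoreInWorker` function, the shape of the team objects, and the scoring formula are all made up for the example. The worker source is kept inline so the sketch fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Code that runs inside the worker thread. The "score" here is a
// placeholder: the sum of each team's points.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const scores = workerData.teams.map((team) => ({
    id: team.id,
    score: team.points.reduce((sum, p) => sum + p, 0),
  }));
  parentPort.postMessage(scores);
`;

// Run the scoring off the main thread and resolve with the results.
// The main event loop stays free while the worker computes.
function scoreInWorker(teams) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true, // treat the first argument as source code, not a path
      workerData: { teams },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

In a real app the worker would presumably need its own MongoDB connection to fetch teams and write scores, since the data passed in and out is copied between threads.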
Thanks. That looks just like what I need, and seems very easy to use too.
Seems to call jobs on startup which is quite annoying. Open issue here: https://github.com/Differential/meteor-workers/issues/9
The project as a whole seems dead. Not touched for months.
But this led me in the right direction with lots of options here: http://stackoverflow.com/questions/11703010/background-tasks-in-meteor
And this package is popular, active and its admin demo looks really cool:
I’m in a similar situation. Not yet 15k teams but getting there. Also, one cron running every hour and one running every minute. After careful consideration, including the projected userbase, I decided to go with a separate server that connects to the same database.
This way, I will always be sure that the customer-facing app will only be handling customers.
On a side-note, the cursor should not time out for 15k teams. If it is timing out, you need to set some indexes on those collections. Indexes speed things up considerably.
Pretty sure it should be timing out. It takes about an hour to calculate scores for all teams. That’s about 4 teams a second. But it’s a slightly complicated computation, that involves fetching documents from another collection along the way.
I might be able to optimise a bit by fetching less from the other collection.
I also use another server, but recently I’ve been running into some issues when this server has to do multiple things at once. And it’s sort of annoying launching a new $5 server for each of these tasks.
I don’t understand. I’m having my crons run in a separate, non-Meteor process. Downloading XMLs etc. Are you calling a Meteor URL to do these?
I do everything in Meteor. Downloading data, updating scores, etc. Just all this happens in a non-user facing instance of the app.
What I’m trying to get at is this: Doing everything in Meteor is not the same as calling a URL.
If you do not use a URL to run your background jobs, you won’t time out. Even when you call a URL in your Meteor app, you can respond to the request but keep the process working. Whatever you are trying to achieve will run in the background:
/* keeps working after you answer the URL request */
/* code here will run */
For example, in my app, I download a few hundred JSON files, save them to Mongo. I have to download the JSON files in a non-Meteor process (because I have to do some hack-y things in CasperJS), but the rest is handled in Meteor. I call a URL in Meteor, but the URL keeps working after it returns.
This is my setup anyways.
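A minimal sketch of that “answer the URL, then keep working” pattern, written against Node’s plain `http` module as a stand-in for whatever server-side route the Meteor app actually uses. `handleTrigger`, `runScoreJob` and the `jobsFinished` counter are hypothetical names for illustration.

```javascript
const http = require('http');

// Counts completed jobs so the effect is observable.
let jobsFinished = 0;

// Placeholder for the real background work (downloads, score updates).
function runScoreJob() {
  jobsFinished += 1;
}

// Answer the request right away, then start the job on a later
// turn of the event loop so the caller never waits for it.
function handleTrigger(req, res) {
  res.writeHead(202, { 'Content-Type': 'text/plain' });
  res.end('job accepted\n');
  setImmediate(runScoreJob);
}

const server = http.createServer(handleTrigger);
// server.listen(3000); // uncomment to serve; the port is arbitrary
```

This only keeps the HTTP request from timing out. The job still runs on the same event loop, so CPU-heavy work will still make the server unresponsive while it runs.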
Dug through the source for both projects (Meteor Workers and Meteor Job Collection)
Meteor Workers is using cluster.fork to create a new process. It is an exact copy of your parent process, except it isn’t listening on the Meteor port, so it won’t be handling client connections (that’s a good thing). Since it is an exact copy, it runs through all the same setup and prints out the same messages. This seems a little weird and uses a little more memory than a simple new empty process, but the benefit is that anything you write that would work in the main process will now work in the child process. Had they used child_process.fork instead of cluster, it would have taken less memory and not printed the duplicate messages, but it would have required a lot more setup to tell the empty child process what code to run.
Meteor Job Collection looks cool, but it appears to run all the jobs inside a Fiber, which means any long-running code will block the other fibers (like the main Meteor process) and you will end up back in the same situation you started with.