Background jobs in Galaxy or other multi-process environments

macrozone · April 27, 2017, 10:55am

We often have background jobs in our app, e.g. sending an email when certain collections change, running aggregations, or preparing exports.

This is easy with Meteor:

Companies.find().observe({
  added() {
    // send email, trigger some aggregation, ...
  }
});

But once you are in a multi-process environment, you can no longer guarantee that this work is done only once: every process registers the same observer, so the work is done multiple times (once per process).

We solved this by running a dedicated Meteor process that has a specific environment variable set:

if (Meteor.isDevelopment || process.env.BACKGROUND_JOBS_ENABLED) {
  jobs(); // starts all observers etc.; should only run on one instance
}

Because these jobs might be CPU-bound, this process does not answer web requests and is used only for these background jobs. This guarantees that no long-running job blocks client requests.
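
One detail about the check above: environment variables are always strings, so BACKGROUND_JOBS_ENABLED=false would still be truthy in a plain `if`. A minimal sketch of a stricter check follows; the helper name `backgroundJobsEnabled` and the accepted values are my own assumptions, not part of Meteor or Galaxy.

```javascript
// Hypothetical helper (not a Meteor API): decide whether this process
// should start the background jobs.
// Environment variables are strings, so "false" or "0" would pass a plain
// truthiness check; only an explicit allow-list of values enables jobs here.
function backgroundJobsEnabled(env, isDevelopment) {
  if (isDevelopment) return true; // same shortcut as in the original check
  const raw = String(env.BACKGROUND_JOBS_ENABLED || '').trim().toLowerCase();
  return raw === '1' || raw === 'true' || raw === 'yes';
}

// Possible usage in a Meteor server file (jobs() is the function from above):
// if (backgroundJobsEnabled(process.env, Meteor.isDevelopment)) {
//   jobs();
// }
```

With this, a web-only instance can set the variable to "false" or leave it unset, and only the dedicated instance sets it to "true".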

This is a really easy setup, although it has some overhead (most of the code is not used in the background process).

However, on Galaxy you don't have any influence on that. So my question is: how should you deal with this on Galaxy?

sbr464 · April 27, 2017, 12:28pm

We use a message queue. The tasks that we need completed are posted to the public API URL of our servers. Since Galaxy does load balancing, whichever server is available takes the message. The queue resends a message if there is a failure or timeout. Servers that aren't busy can also check in to see whether any messages are available. We have also moved some of these tasks to Webtask/Lambda-style workflows.
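
To illustrate the resend-on-failure behaviour, here is a minimal in-memory sketch of at-least-once delivery with a visibility timeout. It is only a model of the idea, not the queue product we use; the class name `SketchQueue` and its methods are made up for this example, and a real queue would persist messages and deliver them over HTTP.

```javascript
// In-memory model of a queue that redelivers unacknowledged messages.
// Hypothetical names throughout; time is passed in explicitly (in ms)
// so the behaviour is easy to follow.
class SketchQueue {
  constructor(visibilityTimeoutMs) {
    this.visibilityTimeoutMs = visibilityTimeoutMs;
    this.messages = []; // entries: { id, body, invisibleUntil, attempts }
    this.nextId = 1;
  }

  // Producer side: enqueue a task and return its id.
  send(body) {
    const id = this.nextId++;
    this.messages.push({ id, body, invisibleUntil: 0, attempts: 0 });
    return id;
  }

  // Worker side: take the first visible message and hide it from other
  // workers until the visibility timeout has passed.
  receive(now) {
    const msg = this.messages.find((m) => m.invisibleUntil <= now);
    if (!msg) return null;
    msg.invisibleUntil = now + this.visibilityTimeoutMs;
    msg.attempts += 1;
    return { id: msg.id, body: msg.body, attempts: msg.attempts };
  }

  // Worker side: acknowledge success so the message is not delivered again.
  ack(id) {
    this.messages = this.messages.filter((m) => m.id !== id);
  }
}
```

A worker that crashes or times out never calls ack(), so the message becomes visible again after the timeout and another server picks it up. Because a message can therefore arrive more than once, the task handlers should be safe to run twice.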