Meteor Job Collection long-running jobs cause server crash and restart ("Exited from signal: SIGBUS", Meteor v1.4.4.3)


#1

Hello, I've noticed that some of our long-running jobs using Meteor Job Collection, deployed on Galaxy, were making our app crash and restart constantly (with no error log from Galaxy). When testing locally for extended periods (a job running for 24 hours), we also see Meteor crash with "Exited from signal: SIGBUS". If anyone has experienced this type of issue and has any insights, I would really appreciate it!

These long-running jobs basically make HTTP requests to a third-party service; we capture the data, clean it, and insert/update it into Mongo.

Some examples (I can't share the actual code due to my IP agreement):

Creating the job:

const job = new Job(myJobs, 'longRunningJob', { kit: kitUser });
job.depends(previousJob);
job.priority('normal')
  .retry({ retries: myJobs.forever, wait: 15 * 1000, backoff: 'exponential' })
  .save();
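To get a feel for what this retry policy does over a long run, the sketch below lists the delays it would produce. It assumes that backoff: 'exponential' makes the n-th retry wait wait * 2^(n - 1) ms; that formula is an assumption here, so check the job-collection README for the exact behaviour (it may differ or add jitter). retryDelaysMs is a hypothetical helper, not part of job-collection.

```javascript
// Hypothetical helper: delays (ms) before each of the first `count` retries,
// ASSUMING exponential backoff doubles the wait on every retry:
//   delay(n) = waitMs * 2^(n - 1), for n = 1, 2, ...
function retryDelaysMs(waitMs, count) {
  const delays = [];
  for (let n = 1; n <= count; n++) {
    delays.push(waitMs * Math.pow(2, n - 1));
  }
  return delays;
}

// Same base wait as in the job above: 15 * 1000 ms.
const delays = retryDelaysMs(15 * 1000, 10);
let totalMs = 0;
delays.forEach((ms, i) => {
  totalMs += ms;
  console.log(`retry ${i + 1}: wait ${(ms / 60000).toFixed(2)} min, ` +
              `cumulative ${(totalMs / 3600000).toFixed(2)} h`);
});
```

Under that assumption the 10th retry alone waits 128 minutes, and because retries is myJobs.forever, a job that keeps failing is never given up on; it is re-run at ever longer intervals.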

The job worker:

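// Worker for 'longRunningJob' jobs; workTimeout is in milliseconds (30 s here).
const jobQueue = myJobs.processJobs('longRunningJob', { workTimeout: 30000 },
  Meteor.bindEnvironment(function (job, cb) {
    const kit = job.data.kit; // payload stored when the job was created

    longRunningFunc(kit, function (err) {
      if (err) {
        job.log(`Calling longRunningFunc failed with error: ${err}`,
          { level: 'warning' });
        console.log(err);
        job.fail('' + err);
      } else {
        job.done();
      }
      cb(); // tell processJobs this worker is ready for another job
    });
  }));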
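One detail worth ruling out in the worker above: if longRunningFunc throws synchronously, neither job.fail() nor cb() is reached, and if it invokes its callback twice, job.done()/job.fail() and cb() run twice. The sketch below is a hypothetical guard (makeSafeWorker is not a job-collection API) that makes the completion path run exactly once; it is exercised against a mock job object, and job.done(), job.fail() and the worker's cb are the only job-collection pieces it relies on.

```javascript
// Hypothetical guard for a processJobs worker. `work(data, callback)` stands
// in for something like longRunningFunc; the wrapper makes sure exactly one
// of job.done()/job.fail() is called and that cb() runs exactly once.
function makeSafeWorker(work) {
  return function (job, cb) {
    let finished = false;

    function finish(err) {
      if (finished) return; // ignore a second callback from `work`
      finished = true;
      if (err) {
        job.fail('' + err);
      } else {
        job.done();
      }
      cb(); // release this worker slot
    }

    try {
      work(job.data, finish);
    } catch (err) {
      finish(err); // a synchronous throw fails the job instead of hanging the slot
    }
  };
}

// Minimal mock of the parts of a job that the wrapper touches.
function mockJob(data) {
  return {
    data,
    calls: [],
    done() { this.calls.push('done'); },
    fail(msg) { this.calls.push('fail:' + msg); },
  };
}

let cbCount = 0;
const worker = makeSafeWorker((data, done) => {
  if (!data.kit) throw new Error('missing kit');
  done();
  done(); // duplicate callback, ignored by the guard
});

const ok = mockJob({ kit: 'kitUser' });
worker(ok, () => cbCount++);
const bad = mockJob({});
worker(bad, () => cbCount++);
console.log(ok.calls, bad.calls, cbCount);
```

In the Meteor code this would presumably be wired up as Meteor.bindEnvironment(makeSafeWorker((data, done) => longRunningFunc(data.kit, done))), moving the job.log call into finish if you still want it.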

longRunningFunc gets the data, cleans it, and inserts it into Mongo.
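One thing worth checking is whether longRunningFunc holds the whole dataset in memory before writing it. A minimal sketch of the alternative: process one page at a time, so only one page is held at once, with a progress hook after each page. All names here are hypothetical; mapping onProgress to job.progress() is an assumption, and whether progress calls keep the 30 s workTimeout from expiring should be confirmed against the job-collection docs.

```javascript
// Hypothetical page-by-page version of a job like longRunningFunc.
// fetchPage, cleanRecord, saveBatch and onProgress are placeholders you would
// supply; none of them is a Meteor or job-collection API. In Meteor server
// code, HTTP.get without a callback runs synchronously (within a fiber), so a
// plain synchronous loop like this is a natural fit there.
function processInPages({ fetchPage, cleanRecord, saveBatch, onProgress }) {
  let pages = 0;
  let saved = 0;
  for (;;) {
    const raw = fetchPage(pages); // one page of third-party data, or null when done
    if (!raw || raw.length === 0) break;

    // Clean only this page; records that clean to null are dropped.
    const cleaned = raw.map(cleanRecord).filter((r) => r !== null);
    saveBatch(cleaned); // e.g. inserts/upserts for this page only
    saved += cleaned.length;
    pages += 1;

    // Hook for reporting progress, e.g. via job.progress() (assumption).
    if (onProgress) onProgress(pages, saved);
  }
  return { pages, saved };
}

// Example run with in-memory stand-ins for the HTTP source and Mongo.
const source = [
  [{ id: 1, name: ' a ' }, { id: 2, name: '' }],
  [{ id: 3, name: 'b' }],
];
const store = [];
const result = processInPages({
  fetchPage: (i) => source[i] || null,
  cleanRecord: (r) => (r.name.trim() ? { id: r.id, name: r.name.trim() } : null),
  saveBatch: (batch) => store.push(...batch),
  onProgress: (page, saved) => console.log(`page ${page}: ${saved} records saved`),
});
console.log(result);
```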

Should we update to 1.5.2? We had some issues with 1.5.1, so we had to go back to 1.4.4.3, which was our stable version.
Could it be a memory leak caused by the callbacks?
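To test the memory-leak hypothesis, it helps to watch the process's memory while a job runs: rss or heapUsed rising steadily across a run points to data being retained, while a flat profile makes a leak less likely. A minimal sketch using Node's process.memoryUsage(); the helper names and the 60-second interval are assumptions.

```javascript
// Snapshot of this process's memory in MB (rounded to 0.1 MB).
// process.memoryUsage() is a standard Node.js API.
function memorySnapshotMb() {
  const m = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / (1024 * 1024)) * 10) / 10;
  return { rss: toMb(m.rss), heapTotal: toMb(m.heapTotal), heapUsed: toMb(m.heapUsed) };
}

// Hypothetical helper: log a snapshot every `intervalMs` until stopped.
function startMemoryLogger(intervalMs, log = console.log) {
  const timer = setInterval(() => log('memory (MB)', memorySnapshotMb()), intervalMs);
  if (timer.unref) timer.unref(); // don't keep the process alive just for logging
  return () => clearInterval(timer);
}

// Example: log every 60 s while the job runs; call stop() when it finishes.
const stop = startMemoryLogger(60 * 1000);
console.log('initial memory (MB)', memorySnapshotMb());
```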

Thanks guys! Appreciate the help!

Thread to Meteor Job Collection Issue


#2

To add further information to Luis’s post, this is the only error message we’re getting on Galaxy:

w86sn  2017-09-11 08:37:49+08:00  /app/run.sh: line 25: 9 Killed node ${NODE_OPTIONS:-} main.js
tzc1s  2017-09-12 07:38:49+08:00  /app/run.sh: line 25: 8 Killed node ${NODE_OPTIONS:-} main.js

There is currently no load on the server; we just have a daily job that starts every 23 hours and leads to that crash (as you can see, it crashed twice within the same 23-hour period).
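Two things in these log lines are worth noting. "Killed" is what bash prints when a child process ends because of SIGKILL, which is consistent with (though not proof of) a container memory limit being enforced. Also, run.sh starts the app as node ${NODE_OPTIONS:-} main.js, so flags placed in NODE_OPTIONS reach node, if that variable can be set in your Galaxy environment. One experiment is to cap V8's heap below the container's memory with --max-old-space-size, so that runaway heap growth ends in an explicit V8 out-of-memory error in the log rather than a silent kill. A minimal sketch for picking a value, assuming you know the container's memory in MB; the 75% ratio and the 1024 MB example are arbitrary assumptions, not Galaxy guidance.

```javascript
const v8 = require('v8');

// Hypothetical helper: suggest a --max-old-space-size value (MB) that leaves
// headroom below the container's memory for non-heap memory (buffers, native
// code, stacks). The default 0.75 ratio is an assumption, not a recommendation.
function suggestMaxOldSpaceMb(containerMb, ratio = 0.75) {
  if (!(containerMb > 0) || !(ratio > 0 && ratio < 1)) {
    throw new RangeError('containerMb must be > 0 and ratio in (0, 1)');
  }
  return Math.floor(containerMb * ratio);
}

// Current V8 heap ceiling for this process, in MB.
function currentHeapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

// Example: a container with 1024 MB of memory (hypothetical size).
const suggested = suggestMaxOldSpaceMb(1024);
console.log('current V8 heap limit (MB):', currentHeapLimitMb());
console.log('NODE_OPTIONS="--max-old-space-size=' + suggested + '"');
```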