Introducing Steve Jobs 3 - now with repeatable jobs, debugging UI, and

msavin · June 24, 2018, 11:20am

Steve Jobs
“It Just Works”

Version 3.0 • See on GitHub • See on Atmosphere • Read Documentation • MIT License

Meteor evolved application development by going from past-time to real-time. The Steve Jobs package takes the next leap forward by letting your application run in future-time!

The Steve Jobs package allows you to schedule tasks for a future date, in a way that is friendly to the Meteor framework and developer. It comes with a wide set of tools to help you get creative, while automating the things that you shouldn’t need to think about.

The Most Requested Feature: Repeating Jobs

The most requested feature was also the trickiest to implement, due to how the rest of the queue was built. I didn’t want to hack in a completely new set of features on top of what already exists. Fortunately, there is a really simple way to achieve this effect while staying aligned with how everything else works. The idea is to enable jobs to replicate themselves before they are resolved.

Jobs.register({
    "syncData": function () {
        var self = this;
        var data = getDataFromSomewhere();

        if (data) {
            var doc = MyCollection.insert(data);

            if (doc) {
                self.replicate({
                    in: {
                        minutes: 5
                    }
                })

                self.success(doc)
            } else {
                self.reschedule({
                    in: {
                        seconds: 30
                    }
                })
            }
        } else {
            self.reschedule({
                in: {
                    minutes: 5
                }
            })
        }
    }
})

The job above will try to get data through some means, and then:

If it receives the data as expected, it will try to insert it into a MongoDB collection
- If the insert is successful, it will replicate the job and tell it to run in 5 minutes, and then mark the original as a success
- If the insert is not successful, the job will reschedule itself to run again in 30 seconds
If it does not receive the data as expected, it will reschedule itself to run again in 5 minutes

this.replicate basically replicates the job document and the necessary data, while allowing you to set a new configuration for it. This enables you to repeat a job as many times as you wish, and to dynamically set the conditions for how or when it should happen.

It’s important to use this.replicate instead of this.reschedule to repeat a job because each job records its history. If you reschedule a job too many times, it’s document would become huge, and that might have consequences. Additionally, using this.replicate makes it easier for your clear resolved job documents in the future and thus free up storage.

For jobs that that run very frequently, you can also use the new this.remove feature to remove the document from the database rather than just marking it as complete.

More Tools for Designing Your Jobs

When you register a job with Jobs.register, you can access a wide array of tools to make sure the job runs exactly the way you want it to:

this.document - access the cached document of the current job
this.set & this.get - persistent state management for the current job
this.success - mark the job as successful
this.failure - mark the job as having failed
this.reschedule - tell the job to run again later
this.replicate - replicate the job with the same arguments
this.remove - remove the job from the collection

Every job must be resolved with this.success, this.failure, this.reschedule, or this.remove, otherwise the queue will log an error and stop running. This is to ensure that a job does not end up looping infinitely.

Each job can hold its own state thanks ot this.set and this.get - meaning if you experience an interruption, you can get the job to pick up where it left off. It can also be used to display things like progress bars.

Additionally, this.failure is automatically called inside of a try try catch block when the code has an error. If your code works fine and the job “failed” for reasons other than code execution, I suggest using this.reschedule instead.

New Configuration Options

Jobs.configure now allows you to customize three more core functions of the Jobs package: setServerId, getDate and log.

With getServerId, you can now specify the mechanism for generating a unique server ID. By default, this uses Random.id(), but after @gary-menzel’s suggestion, it made sense to open this function for customization so that it can be integrated with the server ID that your hosting service may assign.

With getDate, you can now specify how a new Date object should be initialized. By default, the function will return new Date(). With this option, you can, for example, return a Date object that has a future date, to create a “time travel” effect. Thanks to the person who suggested this.

With log, you can configure how your application should log items. By default, the function will use console.log.

Smarter MongoDB Behavior

Of all the pleasures that MongoDB offers, peace of mind is not one of them. Or maybe that’s it?

First, it can take a bit of time for the writes to be reflected in the reads, and that could make jobs run twice. This was resolved by adding an extra condition to the MongoDB queries: the document must meet this criteria, and its _id must not be that of the job that had just run.

Second, it turns out that MongoDB’s upsert function may not be so reliable - if you run a few upserts at the same time, MongoDB might just insert all the documents. This is probably related to the first issue. This created a problem with the dominator function, as the queue might get confused as to which server is active. This has been resolved by making the serverId field unique.

What’s Next?

As is - the Steve Jobs package does its job, and works seamlessly with Meteor.

I’m excited about transactions coming MongoDB 4.0. Along with Write Concerns, Retryable Writes, and the new storage engine, this can be used to make the queue really reliable. For example:

In some jobs, you might need to run two database operations, such this.replicate and this.success, to successfully resolve the job. It would be nice if the two can be combined to assure that both actions happened successfully.
It could also be helpful in designing a mechanism to keep track of many jobs running across many servers.

MongoDB 4.0 is coming this summer, so I will evaluate then whether to keep evolving the project or to simply maintain what it does now. Hopefully, now that MDG has perfected the build tool, it can give its MongoDB packages a couple of upgrades.

The idea is, this could grow into a reliable queue that can run many jobs at once and scale horizontally. It would not be the fastest solution, but it may be so reliable, scalable and developer friendly, that speed would seem overrated. If that were achieved, the next step would be to build an interface and REST API to let anyone run this as a standalone service.

With that said, your help can go a long way! I’m looking for help with the testing strategy, and for scaling the queue to run many jobs at once. If you can help with this, or if you have a different angle, do reach out!

diaconutheodor · June 25, 2018, 10:20am

https://github.com/msavin/SteveJobs..meteor.jobs.scheduler.queue.background.tasks/tree/master/tests does not have automated tests, still won’t reliably use this in any production app. However, I love the initiative! We really really really need a good job mgmt service, this is looking super promising and I absolutely love the retro looking UI.

Please write some good tests! & good job so far.

msavin · June 28, 2018, 10:44am

Try it out - its being downloaded actively and only one minor bug so far

Tests are for amateurs I’m kidding, but I don’t know the first thing about them.

I’m still considering several changes to the package (mainly around creating a collection for each job instead of keeping them in one collection - I suspect it would have long-term performance benefits). Maybe once that’s over someone can help with the testing strategy.

vuhrmeister · June 29, 2018, 6:33pm

Great work, thx. I’m already using it a while, also in production. So far it does it job.

Did you ever thought about using Redis instead of MongoDB?

arggh · July 1, 2018, 8:31am

This package is very well named

msavin · July 6, 2018, 11:32am

I have but it didn’t seem that appealing. Redis persistence and reliability is not that great, and MongoDB just provides a lot more out of the box for that and other things.

In the future, if the package were really optimized, we could query for jobs ahead of time, and keep them an in-memory cache on the server. This can get interesting with Change Streams as well - if a job is canceled on the last minute, the server can be notified.

If performance is really needed, MongoDB also has a memory-based storage engine that can be used.

However, I’m no expert here, and I’m open to being persuaded otherwise.

msavin · January 13, 2019, 3:04pm

Quick update for the v3.1 release:

the internal process which controls which server is active, has been optimized for performance and now automatically purges internal data
new configuration options, most importantly, autoRetry for failed jobs, which is on by default
support for singular jobs, which means the job is only added if there is no pending or failed job with the same arguments
support for unique jobs, which means the job is only added if there is no job with the same arguments

I’m now using the package for hndetox.com, and its working as expected.