Go multi-core with nschwarz:cluster

nschwarz · January 5, 2021, 12:19pm

You can now go Multi-core with the package nschwarz:cluster.

It’s a package for Meteor server which gives you a worker pool to handle heavy jobs on other cores.
It has both a persistent MongoDb bound queue and an in-memory queue.
It has event listeners to handle errors and job results.
It can handle both sync and async jobs.

It’s production ready, and may be used for the tree-shaking feature of Meteor@2.0.

update@1.2.0
It now supports scheduled and recurring tasks.

The documentation is available here: https://atmospherejs.com/nschwarz/cluster

peterfkruger · January 5, 2021, 5:38pm

If a MongoDB bound queue is used in conjunction with multiple Meteor instances (of the same app), will this cluster scale out to instances other than where the task was created?

nschwarz · January 5, 2021, 6:06pm

You can safely use the MongoDB queue between different instances (since it’s on the same database).

If you have multiple apps thought I would suggest to run the Cluster instance on a dedicated app that will handle all the tasks executions (this is what I’m doing in production right now).

Mainly I have 4 apps : 3 which only use the queue to add tasks, 1 dedicated to the worker queue using the Cluster instance.

peterfkruger · January 5, 2021, 6:30pm

Thanks, that sounds very promising! Is it also possible to run multiple cluster instances which would share the load of tasks among themselves?

nschwarz · January 5, 2021, 6:43pm

Yes you should be able to do so (not tested yet) ! but don’t forget to set different port values for each cluster otherwise you will have some BINDERROR.

peterfkruger · January 5, 2021, 7:07pm

In this yet untested scenario how would you avoid race conditions? A race condition could occur when multiple cluster instances would pick the same “next” task from the queue in MongoDB, oblivious to each other’s presence. MongoDB’s findOneAndUpdate pops into mind.

While we’re at it, did you think of other types of queues? E.g. redis has a wonderful pub/sub mechanism for message distribution. Apache Kafka is also laid out for the same purpose. If either would be used, you wouldn’t need to program any queue yourself, and there would be no race condition either.

nschwarz · January 5, 2021, 7:49pm

As it is right now, there’s a small window for a race condition since :

tasks are pulled together from the db depending on how much worker are available.
each task is locked as soon as it’s sent to the worker.

The race would be in between of these events.

We could use findAndModify to remove this gap (findOneAndUpdate was deprecated).
I can make the change tonight.

We could go for an external solution such as in-memory queues like Redis, but the purpose of this package is to avoid installing extra-meteor stuff on top (to keep things simple).

But still we could add an Adapter mechanism to this package for those who wants to implement external queues.

Also Redis is in-memory unlike mongoDb so it may impact performances depending on how much data you put in your tasks.

As of Kafka, I don’t see the need here since MongoDb is well enough capable of handling a task queue. Kafka is great, but it’s really overkill to only handle a job queue.

peterfkruger · January 5, 2021, 8:18pm

As with Redis, you can use RDB or AOF for persistence, both are configured in a minute. See Redis persistence.

Kafka is probably an overkill for this job queue, I agree; but redis is lightweight, and, given the redis oplog feature, already in place in many environments.

As to the race condition – if there is any chance, it is just the matter of time until it occurs.

MongoDB; findOneAndUpdate isn’t deprecated, findAndModify is. See here.

nschwarz · January 5, 2021, 8:40pm

Yes, my mistake about findAndModify I misread the doc !
I didn’t thought of multiple cluster running at the same time when I made the package, It’s a minor change, I’ll push it tonight.

MongoDB also have an oplog as far as I know (I don’t really know the difference between the two).
Redis is lightweight but is still an external component to Meteor. If you wish to implement a Redis adapter feel free to put a PR together .

The MongoDb queue is working great and fits my needs and (I think) will do too on most of Meteor apps requiring multi-core.

peterfkruger · January 5, 2021, 10:03pm

So anyway, brilliant work, thanks a bunch – this feature was very much needed by many! A big kudos to you!

nschwarz · January 5, 2021, 10:17pm

Your welcome !
thanks for your input !

I’ve just released 1.1.2 with findOneAndUpdate, it should be good for multi-cluster instances to run without race condition

peterfkruger · January 5, 2021, 10:28pm

That’s great news. With multiple cluster instances leveraging worker threads we have scaling up + (theoretically near infinite) scaling out for long running tasks, which is way more than what we had before.

ivo · January 27, 2021, 1:30pm

Hi @nschwarz,

Is this also an alternative to https://github.com/msavin/SteveJobs and that can be used with multiple instances and/or multi core ? We are using Steve Jobs but somehow our cpu usage are going quite high, trying to find out why but looking for alternative solution meanwhile. Especially as when we scaled horizontally using clever-cloud with multiple instance steve-jobs was not working anymore.

Can it handle lot of jobs run within short deadline (like trigger in 15seconds, 20 seconds, and lot of jobs triggered more or less simultaneously?)

Thanks in advance

nschwarz · January 27, 2021, 2:06pm

hi @ivo,

Yes :

you can run recurring / scheduled tasks with nschwarz:cluster
you can safely use it across multiple instances (if they share the same database obviously).
it is a better alternative than SteveJobs because each task will run on an other core, so your app performances won’t be impacted by the task.
It can handle a lot of jobs within a short deadline but il will depend on your server capacities (allocated cores) and of course, the task itself.

But be aware :

the jobs will start with a delay if all of your cores have not finished the previous tasks.
the jobs will start with a small delay if your cluster refresh rate is too large.

If the tasks are small and are created at the same time, you can also batch the work in a single task instead of many

ivo · January 27, 2021, 2:14pm

Thanks for the clarification. We may consider it in the near future at my company. Some question if it would solve our specific case:

We have multiple users playing a quiz game together. Each question is timed. The first one to access the question will trigger a Job to say if all the team members didn’t reply to the question within the given time, we’ll submit the team answer automatically.
We have a tracker on the db to update the users about the status of their team (who has answered to which question), if everybody has answered, go to next question, etc…

Right now with a single instance on Clever Cloud (we’re french based company and wanted a french hosting company due to our client) it seems to work with steveJobs but somehow we quickly have CPU usage issue on the server (didn’t use a very strong instance, we’re working on monitoring and load testing now to clarify the issue)

We tried horizontal scaling out of curiosity and we could see that even if the instances are connected to the same instance, if 2 users of a same team are using two different instances then the reactivity of the tracker is not as immediate (up to 5/10 sec delay) which for our use case is unacceptable. I’m not sure if using your library would solve this particular issue ?

nschwarz · January 27, 2021, 2:50pm

I don’t know exactly how you’re managing your timers but it is manageable to do so with my package :

you should put a low latency refresh rate on the cluster (like 200ms).
you should have only 1 recuring task that update all the quiz statuses at the same time based on the task data, something like :

   const now = new Date()
   Quizz.update({ finished: false, $or: [
     { currentQuestionDueDtate: { $lte: now }},
     { currentQuestionAnswered: true },
   ]},
   { $set: { $inc: { questionNumber: 1 }}},
   { multi: true })

one task that sets the finished flag to true on finished quizz.

All depends on your implementation

ivo · January 27, 2021, 3:39pm

Thanks, we’re trying to dig deep to understand what caused our issue last time and will see if there is a need to move to your package, but I like it very much so we’ll see soon

ivo · January 27, 2021, 3:53pm

@nschwarz To answer your original question, right now our approach is the following:

First team member to arrive to the question triggers a Jobs that will trigger a function at the end of the timer for this question. If all the team members reply before the end of the timer, the Job is set to completed, if not, the Job will trigger the complete function and complete itself.

Problem is when lot of teams play together we are triggering one job per question per team. Each job for a team will be completed for a team before going to the next so there is no more than 20 open jobs (max number of team) open together. It seemed like a fair approach to our problem but we ran into issues. However we’re not sure yet if the issue came from the Jobs or some methods we’re using. We just started investigating it using MontiAPM and checking all of our methods and subs. We got some wait time which I guess is not good.

The SteveJobs library seemed like a great fit for us and I don’t see why it couldn’t handle 20 jobs at a time, but if yours may be more stable we would consider it, just have to see how to integrate it in our flow.