Best way to run processing heavy task inside of meteor

kellertobi · October 21, 2016, 4:25pm

We are implementing a notification system where users get notifications based on dates, seconds since first visit, or seconds since a specific action.

When a new user enters the system (or new content is created), we precalculate the notifications for each and every user, because one goal is to give the admins a list, at any time, of which notifications have already been sent and which will be delivered when.

the sending itself is done by a python program but the scheduling (and if possible also the rescheduling) should be done inside of meteor because if the python server is not available the information would be lost.

what is the best way to run such a task inside of the meteor server without blocking the whole server?

sashko · October 21, 2016, 5:36pm

If this is a CPU-heavy task, it will block the server since Node is single-threaded. If by “calculate” you mean that you have to read a lot of data from the DB, then it shouldn’t be a problem because that read will be non-blocking.

I guess it depends on what kind of work it is, and what you mean by “blocking” - operations in Meteor never block the whole server unless there is something crazy going on.

kellertobi · October 21, 2016, 6:25pm

I iterate over around 400 database entries times 2000 database entries (content x users) every time the admin changes some content. Each time I need to do a string comparison and an update of the corresponding notification.

I thought using fibers would make node multi-threaded?

sashko · October 21, 2016, 6:28pm

Negative - Fibers are coroutines for Node, which still run in one thread but allow you to yield to other routines without writing callbacks.

I don’t know if iterating over that much will ever really scale if you envision having more than 2000 users or more than 400 items of content… but given the size of that it seems best to use some kind of job queue and do it in a separate process. So Meteor sets a flag in the database that some work needs to be done, and later the python server comes along and does the work in its own thread, which won’t slow down your web server.
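
To make the flag-in-the-database idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `jobs` array stands in for a Mongo collection, and `requestReschedule` and `claimNextJob` are hypothetical names, not part of Meteor or any package. With a real collection the claim step would need to be a single atomic update so that two workers cannot take the same job.

```javascript
// In-memory stand-in for a collection of job documents (assumption for illustration).
const jobs = [];

// Called by the web server: only records that work is needed, does no heavy work itself.
function requestReschedule(contentId, now) {
  // Reuse a job that is still waiting, so repeated admin edits do not pile up.
  const pending = jobs.find(function (job) {
    return job.contentId === contentId && job.status === 'waiting';
  });
  if (pending) return pending;

  const job = { contentId: contentId, status: 'waiting', createdAt: now };
  jobs.push(job);
  return job;
}

// Called by the separate worker process: takes the oldest waiting job, if any.
function claimNextJob(workerId) {
  const job = jobs.find(function (candidate) {
    return candidate.status === 'waiting';
  });
  if (!job) return null;

  job.status = 'running';
  job.workerId = workerId;
  return job;
}
```

The web server would only ever call `requestReschedule`; the worker polls `claimNextJob` and does the expensive recalculation on its own time.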

kellertobi · October 21, 2016, 6:39pm

This is how it is done at the moment, except that the python server gets its information from a REST API.

arggh · October 21, 2016, 8:53pm

I recommend you check this out: https://github.com/vsivsi/meteor-job-collection/

kellertobi · October 21, 2016, 9:17pm

Thanks, this package is awesome but will probably not fit my needs.

The notifications are e-mail reminders for our content inside of a module. We have several modes for how content is unlocked (and an email is sent):

depending on the date (easiest) but only for future dates (so no dates in the past will be resent)
depending on when the user started the module (e.g. days after the first visit of the user on the module)
depending on viewing another content in the same module (e.g. hours after the user viewed content1 he will be reminded that he can now watch content2)
instant notifications (e.g. password change)
instant notifications (e.g. direct messaging)
summarized notification twice a day (e.g. all comments that the user has not read)

the only thing that isn’t used that frequently is the date-based scheduling (which is the easiest and on its own the least resource hungry)

we also need to keep track that:

the admin knows at any time what mails already have been sent
the admin knows what mails already have been scheduled for the future
the user does not get an email for the same content twice, even if the admin changes the publishing settings for it.

our biggest module currently has around 200 contents and 300 users (and around 2000 users by the end of the year)

the recalculation currently takes around 12-16 minutes on an Amazon AWS EC2 micro machine (1 GB RAM, 1 CPU core) with a database machine that has 2 GB RAM, 8 GB SSD swap and 2 CPUs.

When a new user enters the system in this module it takes around 5 seconds to calculate the contents for him.

I think all the database network traffic is what slows this down the most. Would it perhaps be possible to run the calculations as a script in the database directly? Then we could trigger the db call in meteor and only the db server would have to do the work.

arggh · October 22, 2016, 1:52pm

Sorry if I misunderstood your reply, but meteor-job-collection gives you:

a way to schedule jobs to take place either immediately or at any given time in the future
a way to attach any data (and as much as you like) to a job
a way to create as many different job processors as you like
a way to create a job processor as complex as you wish (or can manage)
a way to have as many different types of jobs as you like
a way to create these jobs from anywhere
a way to process these jobs anywhere
…they even provide you with a ready-made plug-n-play admin tool that displays all the jobs with status and metadata

What’s the missing ingredient?

kellertobi · October 27, 2016, 10:50am

Why it does not fit my needs:

It is not the scheduled jobs themselves (which are executed in Python due to library support) but the scheduling of the jobs that is calculation-heavy.
We have already created our own scheduler, as the running of a job has more dependencies than just time (e.g. the time has come and a compatible job has not yet run for the user).

This post was not about the scheduler itself, but about running potentially blocking calls inside of Meteor without blocking the whole app.

janat08 · May 31, 2017, 8:33pm

Move work out of Meteor’s single threaded event-loop

This is one of the features listed in the package. So I guess this then uses co-routine or something.

kellertobi · June 11, 2017, 3:22pm

well, not only the scheduled jobs themselves are heavy load, but the scheduling is also heavy load. does that package help for this problem also?

janat08 · June 11, 2017, 3:55pm

It doesn’t use co-routines; co-routines don’t change the fact that the language has a single event loop. The author of the package takes questions on GitHub.

janat08 · June 12, 2017, 8:40am

Although I’m guessing you might have to keep a record that indicates how far behind schedule you are, and how important it is, and then use it to create an offset.

kellertobi · June 22, 2017, 3:20pm

Ok, I cannot find how to actually run the code inside of meteor. I created the jobcollection and want to run e.g.:

blockCpuFor = (ms) ->
	now = (new Date).getTime()
	result = 0
	while true
		result += Math.random() * Math.random()
		if (new Date).getTime() > now + ms
			return
	return

as this is the best way to simulate a heavy computation.

The actual computation does something like

users = db.users.find()
contents = db.contents.find()

users.forEach (user) ->
	contents.rewind()
	contents.forEach (content) ->
		emailDate = user.startedDate + content.secondsAfterStart
		db.emails.insert({
			user: user._id,
			content: content._id,
			sendAt: emailDate
		})

Spyridon · June 22, 2017, 3:32pm

Well, one piece of advice I have if you must run some heavy code in a forEach loop, is add a bit of a delay on it in order to optimize it without bogging down your server too much.

We have some routines on our software that are necessary for validating some data we have on an interval. We use methods similar to this:

var i = 1;
var users = db.users.find({});
users.forEach(function (user) {
    (function (i) {
        Meteor.setTimeout(function () {

            // your routine here

        }, i * 1000);
    }(i));
    i++;
});

This will add a delay so you don’t lock up your server and/or the database too badly.
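
A variation on the same idea, as a rough sketch under assumptions: instead of one timer per user, work through the list in slices and hand control back to the event loop between slices. `processInChunks` and its parameters are made-up names. Plain `setImmediate` is the default scheduler; inside Meteor the deferred function would probably need to be wrapped (for example with `Meteor.bindEnvironment`) before it touches collections, and that part is not shown.

```javascript
// Hypothetical helper: handle `items` slice by slice so other requests can run in between.
// `schedule` is injectable; it defaults to setImmediate, which yields to the event loop.
function processInChunks(items, chunkSize, handleItem, done, schedule) {
  const defer = schedule || setImmediate;
  let index = 0;

  function runSlice() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index], index);
    }
    if (index < items.length) {
      // More work left: queue the next slice instead of continuing right away.
      defer(runSlice);
    } else if (done) {
      done();
    }
  }

  runSlice();
}
```

The total CPU time stays the same; the difference is that no single stretch of work holds the event loop for longer than one slice.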

My other piece of advice is to look at the package “mikowals:batch-insert”. This will allow you to easily insert multiple items to your database with one command. This is much more optimized than rapid firing a bunch of inserts to the database.
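
The batching itself is simple to sketch. The helper names below are hypothetical, and `insertMany` is passed in as a parameter so the sketch does not assume any particular package API; in practice it would be whatever bulk insert call you have available.

```javascript
// Split an array of documents into arrays of at most `batchSize` documents.
function toBatches(docs, batchSize) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Hand each batch to the injected bulk insert function and return how many docs were sent.
function insertInBatches(docs, batchSize, insertMany) {
  let inserted = 0;
  toBatches(docs, batchSize).forEach(function (batch) {
    insertMany(batch);
    inserted += batch.length;
  });
  return inserted;
}
```

For the notification case this means collecting the email documents in memory first and sending them to the database in a few large calls instead of one call per user/content pair.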

Running heavy tasks is something you should avoid if at all possible, but if you must do it, this is the way I would do it.

kellertobi · June 22, 2017, 3:40pm

the method I wrote here is optimized for readability on the forum. In the end, each user has a different start date and each content has a different delay. I’d probably do some parts of it in mongodb itself (and hope that this will not block meteor), but it would be nice to have an option to run heavy-load work directly inside of meteor, e.g. in a separate thread.

SkyRooms · June 22, 2017, 4:10pm

I’m running www.StarCommanderOnline.com which is an… MMO for Meteor!

Every one second, I have the server run over all active players. That is, who is online within 10 minutes.

You need to use a special SetInterval

Meteor.setInterval(function(){
	HEARTBEAT();
},1000);

I have 1600 user accounts registered, tops online was like 100. Make sure your DB is well indexed and you’re running Redis oplog, or true oplog.

kellertobi · June 22, 2017, 5:34pm

We have around 5000 (and more) users that would get a single set of contents scheduled. the rescheduling of all contents for all users that get the set of contents needs to be done when one administrative user changes the order of one of the contents in the set. at the moment we have around 4000 of such sets.

Last year we decided to do the implementation of the scheduling and the sending of the notifications in python and now figured that the split codebase is kind of hard to maintain.

I just tried a simple query like the one above (actually with a lot more find parameters), but it seems like cursor iteration is blocking/non-IO. When I run the code above 100 times, no one on my meteor server can do anything for around 30 seconds.

so as long as there is no way to use the same codebase and just run a function in another thread inside of meteor, meteor is not capable of running processor heavy tasks without blocking the server for everyone else (like the guy from MDG stated above)

baris · June 23, 2017, 12:32pm

What about creating and deploying another meteor server that uses the same database, preferably on a different physical server? Trigger your long-running job asynchronously with server-to-server DDP.
Even on the same server it will have a different node process, so it will slow your server but won’t totally block your application.
Also something like https://github.com/mikowals/pause-publish may help if updated records are published.
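
A separate Node process does not have to be a full second Meteor app. The sketch below is plain Node under stated assumptions: the heavy part is a pure function over data sent as JSON, `scheduleInChildProcess` and the field names `startedAt` (milliseconds) and `secondsAfterStart` are illustrative, and the child has no access to Meteor collections, so the parent still has to read the input and write the results.

```javascript
const { execFile } = require('child_process');

// Program text for the child process. It reads JSON from stdin,
// builds one row per user/content pair, and prints the rows as JSON.
const CHILD_SOURCE = `
  let raw = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { raw += chunk; });
  process.stdin.on('end', () => {
    const input = JSON.parse(raw);
    const rows = [];
    for (const user of input.users) {
      for (const content of input.contents) {
        rows.push({
          user: user._id,
          content: content._id,
          sendAt: user.startedAt + content.secondsAfterStart * 1000
        });
      }
    }
    process.stdout.write(JSON.stringify(rows));
  });
`;

// Runs the calculation in another Node process; the parent's event loop stays free.
function scheduleInChildProcess(input, callback) {
  const child = execFile(
    process.execPath,
    ['-e', CHILD_SOURCE],
    { maxBuffer: 64 * 1024 * 1024 },
    function (error, stdout) {
      if (error) return callback(error);
      callback(null, JSON.parse(stdout));
    }
  );
  child.stdin.end(JSON.stringify(input));
}
```

The price is serializing the input and output, so this only pays off when the calculation is heavier than the copying.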

kellertobi · June 23, 2017, 12:47pm

The idea was to be able to use the same codebase for both our main app and the notification scheduler. The plan was to be able to just call a function with a function as parameter that calculates the notifications in another thread. But if we nonetheless need to maintain another codebase, we could use a language that supports threads (such as the thing we developed last year.)

as long as no one comes up with a solution that enables us to use threads as described above (or similar), the discussion is over for me, sry.