Meteor Scaling - Redis Oplog [Status: Prod ready]

okada · October 28, 2016, 5:58pm

just in case you have not watched yet…

ramez · October 30, 2016, 4:16am

Thanks for taking the initiative. As mentioned by many here, this is the biggest pain point. I initially thought about using RethinkDB, but (aside from the fact that they crashed recently) their engine was immature in some aspects (e.g. indexing) not to mention their query language.

Your post got me thinking. The reason oplog does not scale well is that ALL Meteor apps (on that DB of course) have to watch ALL the oplog (to compound the issue, if you don’t have the id of that collection in your query, a second mongo call may have to be made to get it). What if we had an intermediate firmware, that watched the oplog (i.e. only one oplog monitor per cluster of meteor instances) and we simply registered to event handlers on it. In other words we distribute the oplog monitoring. Your solution is probably similar to that.

Would we still need Redis here (trying to avoid adding more db’s / data stores)?

We could even build this monitor in Meteor itself … again, just thinking, maybe I am missing something.

diaconutheodor · October 30, 2016, 7:23am

What if we had an intermediate firmware, that watched the oplog (i.e. only one oplog monitor per cluster of meteor instances) and we simply registered to event handlers on it. In other words we distribute the oplog monitoring. Your solution is probably similar to that.

I see your point, but I believe it’s far more complex to implement, because that central place that reads the oplog needs to be aware of all the “active” subscribed queries from all the users, and it still needs to process everything. Won’t scale!

I’ve been thinking about a solution and studied the way they did it with oplog. It won’t work the same with Redis, because this time we will not listen to all changes and try to make sense of them; this time it will be more specific. I chose Redis for now because it is mature, fast, and used by many huge corporations. Later we may need to abstract it, but first let’s make it work.

Anyway, good news: I have structured the main components and worked out exactly how it would work, and it looks very promising.

alawi · October 30, 2016, 4:45pm

@diaconutheodor great initiative! Just to understand better how this would work, would you mind sharing your initial thoughts on what might be stored in Redis? i.e. if I make an update to a collection, what will be the sequence of events until the minimongo is updated?

ramez · October 30, 2016, 5:24pm

Thanks @diaconutheodor for your reply

I believe our solutions are similar, except you are using Redis as the engine that listens to the oplog and pushes cursor updates to clients.

Let me illustrate. Right now, say we have 100,000 live users on 10 instances. Each instance is listening to the oplog for its 10,000 users (x number of reactive cursors). If we had 2 monitors that listen to the oplog, and each serves 5 instances, that means much less load on the oplog.

Each monitor would sift through the data once for its 5 instances and send the appropriate data down via cursors to its instances. You have just reduced your oplog load dramatically as each oplog entry gets scanned twice as opposed to 100,000 times.

Now, I do think you are doing the same thing with Redis. My concern is in adding another DB layer, more risk and complexity to maintain. Maybe I am wrong, just wondering …

diaconutheodor · October 30, 2016, 5:35pm

@alawi you asked how this works in my view, it’s super barebones:
Mongo.Collection.update|insert|remove -> push to redis
publications listen to redis -> push to client
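
The two-step flow above can be sketched roughly as follows. This is only an illustration, not the package's code: a Node `EventEmitter` stands in for Redis pub/sub so the snippet runs without a server, and `insertAndPublish` and `publish` are hypothetical names invented for the sketch.

```javascript
const { EventEmitter } = require('events');

// Assumption: an in-process EventEmitter stands in for Redis pub/sub.
const bus = new EventEmitter();

// Hypothetical write path: apply the mutation, then announce it on a channel.
function insertAndPublish(store, collectionName, doc) {
  store.set(doc._id, doc);
  bus.emit(collectionName, { event: 'insert', doc });
}

// Hypothetical publication: listens on the collection channel and forwards
// only the documents that match its query to one client via `send`.
function publish(collectionName, matches, send) {
  const listener = (message) => {
    if (matches(message.doc)) send(message);
  };
  bus.on(collectionName, listener);
  return () => bus.off(collectionName, listener); // stop handle
}

const tasks = new Map();
const received = [];
const stop = publish('tasks', (doc) => doc.owner === 'alice', (m) => received.push(m));

insertAndPublish(tasks, 'tasks', { _id: 't1', owner: 'alice', title: 'Write tests' });
insertAndPublish(tasks, 'tasks', { _id: 't2', owner: 'bob', title: 'Ship it' });
stop();

console.log(received.map((m) => m.doc._id)); // only alice's task was forwarded
```

Note that nothing is stored on the bus; it only carries change notifications, and the store remains the source of truth.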

@ramez
Those “2” monitors that listen to oplog, are required to do the following:

Maintain a list of all the active subscriptions in the system (from all 5 nodes)
When anything is inserted/updated/removed they have to scan through all active subscriptions, see if it’s related, and update them
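
To make the cost of those two duties concrete, here is a rough sketch of such a monitor. `OplogMonitor` and its methods are hypothetical, made up for illustration; the point is only that every incoming entry is checked against every registered subscription.

```javascript
// Hypothetical central monitor: one registry of every active subscription
// across all the nodes it serves.
class OplogMonitor {
  constructor() {
    this.subscriptions = new Map(); // id -> { collection, matches, notify }
  }

  register(id, collection, matches, notify) {
    this.subscriptions.set(id, { collection, matches, notify });
  }

  unregister(id) {
    this.subscriptions.delete(id);
  }

  // Called once per oplog entry; returns how many subscriptions were checked.
  handle(entry) {
    let checked = 0;
    for (const sub of this.subscriptions.values()) {
      checked += 1;
      if (sub.collection === entry.collection && sub.matches(entry.doc)) {
        sub.notify(entry);
      }
    }
    return checked;
  }
}

const monitor = new OplogMonitor();
const notified = [];
monitor.register('s1', 'tasks', (d) => d.owner === 'alice', () => notified.push('s1'));
monitor.register('s2', 'tasks', (d) => d.owner === 'bob', () => notified.push('s2'));
monitor.register('s3', 'users', () => true, () => notified.push('s3'));

const checkedCount = monitor.handle({ collection: 'tasks', doc: { _id: 't1', owner: 'alice' } });
console.log(checkedCount, notified);
```

The work per entry grows with the total number of subscriptions on all nodes, which is the scaling concern raised here.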

What you are doing this way, abstracting the oplog reader, is a good idea, and it may work, but it’s totally different from my idea, that’s all, and it’s much harder to implement.

In my view, oplog will be removed completely, there will be no more need for oplog tailing at ALL, the app will publish changes to redis.

Regarding additional risk and complexity: we are now talking about scaling up, so of course you will need different layers to maintain as you scale horizontally/vertically, and I will disregard this as a concern. Plus, it’s super easy to install Redis locally, and Redis can work very well via an internal network.

ramez · October 30, 2016, 6:19pm

I think I get it now: MongoDB will be the static copy of Redis. So you are counting on Redis for reactivity and on Mongo for on-disk storage. Look forward to the result. Makes me wonder, if you can replicate mongo-like syntax, why do we need to keep mongo then?

okada · October 30, 2016, 7:38pm

Note that the current solution (oplog tailing) works for changes made out of band (changes made outside the app) whereas your solution would not.

diaconutheodor · October 30, 2016, 7:50pm

@okada

Yes, of course I am aware. However, if I make changes outside the app, I can still emulate the change behavior in Redis. Most changes in the database are done by the app, right? Also, this gives us the option to disable reactivity for large batch inserts, from inside or outside the app. I think it’s a good exchange.
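
A minimal sketch of both ideas, under stated assumptions: the `reactive` option and the `notifyOutOfBand` helper are hypothetical names, and an `EventEmitter` again stands in for Redis.

```javascript
const { EventEmitter } = require('events');

// Assumption: an in-process EventEmitter stands in for Redis pub/sub.
const channel = new EventEmitter();

// Hypothetical write helper; passing `reactive: false` skips the notification,
// which is what a large batch insert would use.
function insertDoc(store, doc, options = {}) {
  const { reactive = true } = options;
  store.set(doc._id, doc);
  if (reactive) channel.emit('change', { event: 'insert', id: doc._id });
}

// Hypothetical hook for a script that wrote to the database directly and
// wants to emulate the change notification afterwards.
function notifyOutOfBand(event, id) {
  channel.emit('change', { event, id });
}

const seen = [];
channel.on('change', (m) => seen.push(`${m.event}:${m.id}`));

const items = new Map();
for (let i = 0; i < 1000; i += 1) {
  insertDoc(items, { _id: `bulk${i}` }, { reactive: false }); // silent batch
}
insertDoc(items, { _id: 'a' });     // normal reactive write
notifyOutOfBand('update', 'bulk7'); // change made outside the app

console.log(seen);
```

Only the reactive write and the manual notification reach listeners; the thousand batch inserts never touch the channel.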

@ramez

You still don’t get it, sorry if I’m not very clear. I’m not storing anything in Redis! I’m just using it for the pub/sub system, because it’s fast and super stable, and maybe we could find a use for storing some values in there as well. We shall see about that.

okada · October 30, 2016, 11:30pm

Yes, I believe an API to notify out of band changes would be a good solution. Looking forward to seeing what you come up with.

ramez · October 31, 2016, 12:45am

Right, but the ‘monitor’ solution would, as there is no firmware in between. Need to digest some more the advantages of parachuting Redis into it.

diaconutheodor · October 31, 2016, 4:16am

@ramez what you plan on doing is very hard. Imagine, you’ll have to keep a cache on those “oplog” readers for all the queries that all your users have. I see a lot of problems with that approach, that’s all; it seems overly complex and very tied to how Meteor works.

My thoughts put on paper: here’s where I described some improvements that it can bring, and also provided a way to fine-tune the reactivity process.

Edit:
It’s really not that hard to implement. Meteor already did the hard work of diffing; we just have to introduce a few more tweaks and make it stable.

Edit 2:
Just realized that abstracting the reactivity to Redis can enable us to make ANY data source reactive. For example, we could make a REST API reactive if it lets us register a webhook that it hits when data changes: we send that data to Redis, and the publication updates the live data.
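
Roughly, the webhook idea could look like the sketch below. Everything in it is an assumption made for illustration: the payload shape (`resource`, `id`, `data`), the `rest::` channel prefix, and the `handleWebhook` function; an `EventEmitter` stands in for Redis.

```javascript
const { EventEmitter } = require('events');

// Assumption: an in-process EventEmitter stands in for Redis pub/sub.
const redisStandIn = new EventEmitter();

// Hypothetical webhook handler: the external API POSTs a JSON body whenever
// one of its resources changes, and we relay it onto a channel.
function handleWebhook(rawBody) {
  const payload = JSON.parse(rawBody);
  redisStandIn.emit(`rest::${payload.resource}`, payload);
}

// Hypothetical publication side: keeps a live view of "orders" up to date by
// merging each incoming change into the current document.
const liveOrders = new Map();
redisStandIn.on('rest::orders', ({ id, data }) => {
  liveOrders.set(id, { ...(liveOrders.get(id) || {}), ...data });
});

handleWebhook(JSON.stringify({ resource: 'orders', id: 'o1', data: { status: 'paid' } }));
handleWebhook(JSON.stringify({ resource: 'orders', id: 'o1', data: { total: 40 } }));

console.log(liveOrders.get('o1'));
```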

diaconutheodor · October 31, 2016, 6:41am

I need an opinion from a Meteor Developer that worked on this (Oplog & Reactivity). 5 minutes of his time may help me reduce days until first beta release.

@sashko can you help with that please? Cheers.

diaconutheodor · October 31, 2016, 11:08am

Making progress on this: managed to get a naive version of remove, update, and insert working with the cursor + Redis.

Now I will need to tackle fields, limit, skip: a shit-load of different scenarios.

Then handle _id: String or {_id: {$in: ids}} for instant change detection via separate channels.
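
One way to picture the separate channels: derive the channel list from the selector, so that a query by id listens only on per-document channels. The `collection::id` naming and the `channelsFor` function are assumptions for this sketch.

```javascript
// Hypothetical helper: returns the channels a publication should listen on.
// Selectors that pin down _id get per-document channels; anything else falls
// back to the collection-wide channel.
function channelsFor(collectionName, selector = {}) {
  const id = selector._id;
  if (typeof id === 'string') {
    return [`${collectionName}::${id}`];
  }
  if (id && typeof id === 'object' && Array.isArray(id.$in)) {
    return id.$in.map((one) => `${collectionName}::${one}`);
  }
  return [collectionName];
}

console.log(channelsFor('tasks', { _id: 'a1' }));
console.log(channelsFor('tasks', { _id: { $in: ['a1', 'b2'] } }));
console.log(channelsFor('tasks', { owner: 'alice' }));
```

A publication on a per-document channel never sees messages about other documents, so it has no matching work to do for them.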

And also find a way to re-use a publication if it has the same selector and options?
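
A possible approach to the re-use question, sketched under assumptions: key each observer by collection name plus a key-order-independent serialization of selector and options, and reference-count it. `stableStringify` and `acquireObserver` are hypothetical names.

```javascript
// Deterministic serialization: object keys are sorted so that two equal
// queries written in a different key order produce the same string.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(stableStringify).join(',') + ']';
  }
  if (value && typeof value === 'object') {
    const parts = Object.keys(value)
      .sort()
      .map((k) => JSON.stringify(k) + ':' + stableStringify(value[k]));
    return '{' + parts.join(',') + '}';
  }
  return JSON.stringify(value);
}

const observers = new Map(); // key -> { key, refCount }

// Hypothetical lookup: returns the shared observer for this query, creating
// it on first use and counting how many publications rely on it.
function acquireObserver(collectionName, selector, options) {
  const key = [collectionName, stableStringify(selector), stableStringify(options)].join('|');
  let entry = observers.get(key);
  if (!entry) {
    entry = { key, refCount: 0 };
    observers.set(key, entry);
  }
  entry.refCount += 1;
  return entry;
}

const first = acquireObserver('tasks', { owner: 'x', done: false }, { limit: 10 });
const second = acquireObserver('tasks', { done: false, owner: 'x' }, { limit: 10 });
const third = acquireObserver('tasks', { done: false, owner: 'x' }, { limit: 20 });

console.log(first === second, first.refCount, observers.size);
```

A matching release function would decrement `refCount` and drop the entry at zero; it is left out to keep the sketch short.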

I wish I had MDG’s resources on this one, but making good progress, and boy redis is fast as hell. After I’m done with this, I’ll write some real use-case scenarios and mimic a very loaded app with a lot of traffic, see what performs best, whichever wins I will rest in peace afterwards

diaconutheodor · October 31, 2016, 7:59pm

https://github.com/cult-of-coders/redis-oplog this is where I’ll post the code. Feel free to add issues as ideas / things to take care of etc.

PoC is there. Did some tests, it worked super fast, even for 10,000 inserts in the db. It just acts like nothing has happened.

efrancis · November 1, 2016, 12:28am

I’ve been working on something with tons and tons of concurrent reads/writes, and I’ve been beating my head against a wall trying to figure out how to simulate the functionality of a publication in Meteor without using mergebox or oplog, because of the terrible CPU spikes. None of my solutions have been very pretty. If you can get this to a point where it can be dropped into existing projects and it functions using the same API, I’ll get it into my project ASAP and let you know what issues I run into! It’s not in production, so testing it out wouldn’t be a problem.

diaconutheodor · November 1, 2016, 7:56am

Hi @efrancis, if you are dealing with a lot of concurrent writes, be careful: MongoDB may not be your cup of tea, or it may be, but with a cache like Redis and a consumer.

What I’m doing right now is reinventing their mergebox (sadly). We’ll also have the ability to make updates that are non-reactive (not published to Redis), and to namespace the reactivity. That’s going to finally make chat applications or games in Meteor achievable.

Some interesting things I found:
https://github.com/peerlibrary/meteor-control-mergebox — But it makes some things impossible.

diaconutheodor · November 1, 2016, 12:51pm

Update:

Guys I’m so excited about this. I managed to fully grasp everything that needs to be done, indeed there are many many hidden facets of this, lots of use-cases and scenarios. This is not just a weekend pet project as I initially thought, it’s a bit more than that.

Here’s how crazy some of the usecases are:

mz103 · November 1, 2016, 7:24pm

If this ends up being as simple as connecting a Redis database and a Meteor application, enabling it to scale well, then I can see this being a serious hit among developers in the Meteor community.

diaconutheodor · November 1, 2016, 8:53pm

This is how it WILL end up. The specs are already made; it’s just a matter of time now. I want to write it with patience and care, and with a shit-load of tests. Before we hit the first stable release, I would like everyone with a Meteor app to plug it in and see if they get into trouble.

The next step of its evolution would be to open reactivity to basically anything, via Redis. Even for things you don’t really want to store in the db, like “user is typing…” or, I don’t know, you are “dragging something” and someone sees the dragging live, but it will only save on “drag stop”. You get the idea.

Cheers!