Meteor Scaling - Redis Oplog [Status: Prod ready]

aadams · October 28, 2016, 3:59pm

Sounds amazing, who are you and where did you come from??

efrancis · October 28, 2016, 4:17pm

If you were to start the effort (assuming MDG hasn’t already; I haven’t seen any code yet), I’m sure you’d get some help from community members and, if it gained enough traction, from MDG themselves. Go for it! This is pretty much unanimously requested; horizontal scaling is Meteor’s worst downfall right now, IMO.

aadams · October 28, 2016, 4:25pm

Absolutely right. This initiative would be awesome. It’s nice to have shiny new things for the Meteor classic community again.

diaconutheodor · October 28, 2016, 4:41pm

I appreciate any help!

I have already begun sketching out my ideas, drinking my coffee, and reviewing Meteor’s code, trying to create a schematic of what they did so I won’t reinvent the wheel and can pick up some ideas. The deeper I go, the more impressed I am by how they managed to pull it off with oplog.

What I know for a fact is that this is possible. I can’t promise a date, but I will begin once I have a solid implementation plan, and I will post updates here.

okada · October 28, 2016, 5:58pm

just in case you have not watched yet…

ramez · October 30, 2016, 4:16am

@diaconutheodor,

Thanks for taking the initiative. As mentioned by many here, this is the biggest pain point. I initially thought about using RethinkDB, but (aside from the fact that they crashed recently) their engine was immature in some aspects (e.g. indexing) not to mention their query language.

Your post got me thinking. The reason oplog does not scale well is that ALL Meteor apps (on that DB, of course) have to watch ALL of the oplog (to compound the issue, if you don’t have the id of that collection in your query, a second Mongo call may have to be made to get it). What if we had an intermediate middleware layer that watched the oplog (i.e. only one oplog monitor per cluster of Meteor instances), and we simply registered event handlers on it? In other words, we distribute the oplog monitoring. Your solution is probably similar to that.

Would we still need Redis here (trying to avoid adding more db’s / data stores)?

We could even build this monitor in Meteor itself … again, just thinking, maybe I am missing something.

diaconutheodor · October 30, 2016, 7:23am

What if we had an intermediate middleware layer that watched the oplog (i.e. only one oplog monitor per cluster of Meteor instances), and we simply registered event handlers on it? In other words, we distribute the oplog monitoring. Your solution is probably similar to that.

I see your point, but I believe it’s far more complex to implement, because that central place that reads the oplog needs to be aware of all the “active” subscribed queries from all the users, and it still needs to process everything. It won’t scale!

I’ve been thinking about a solution. I studied the way they did it with oplog; it won’t work the same way with Redis, because this time we will not listen to all changes and try to make sense of them. This time it will be more specific. I chose Redis for now because it is mature, fast, and used by many huge corporations. Later we may need to abstract it, but first let’s make it work.

Anyway, good news: I have structured the main components and worked out exactly how it would work. It looks very promising.

alawi · October 30, 2016, 4:45pm

@diaconutheodor great initiative! Just to understand better how this would work, would you mind sharing your initial thoughts on what might be stored in Redis? I.e. if I make an update to a collection, what’ll be the sequence of events until Minimongo is updated?

ramez · October 30, 2016, 5:24pm

Thanks @diaconutheodor for your reply

I believe our solutions are similar, except you are using Redis as the engine that listens to the oplog and triggers or pushes cursor updates to clients.

Let me illustrate. Right now, say we have 100,000 live users on 10 instances. Each instance is listening to the oplog for its 10,000 users (x number of reactive cursors). If we had 2 monitors that listen to the oplog, and each serves 5 instances, that means much less load on the oplog.

Each monitor would sift through the data once for its 5 instances and send the appropriate data down via cursors to its instances. You have just reduced your oplog load dramatically, as each oplog entry gets scanned twice as opposed to 100,000 times.

Now, I do think you are doing the same thing with Redis. My concern is in adding another DB layer, more risk and complexity to maintain. Maybe I am wrong, just wondering …

diaconutheodor · October 30, 2016, 5:35pm

@alawi you asked how this works. In my view, it’s super barebones:
Mongo.Collection.update|insert|remove -> push to redis
publications listen to redis -> push to client
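
A minimal sketch of that two-step flow, under loud assumptions: Node’s EventEmitter stands in for Redis pub/sub, a Map stands in for the MongoDB collection, and `insertAndPublish` and `publish` are hypothetical names, not part of Meteor or of any package mentioned in this thread.

```javascript
// Assumption: EventEmitter is only a stand-in for a Redis pub/sub connection.
const { EventEmitter } = require('events');

const bus = new EventEmitter();

// Hypothetical write wrapper: apply the write, then announce it on a channel.
function insertAndPublish(store, collectionName, doc) {
  store.set(doc._id, doc); // stands in for the MongoDB insert
  bus.emit(collectionName, { event: 'insert', doc });
}

// Hypothetical publication: listens on the collection channel and forwards
// only the documents this client is interested in.
function publish(collectionName, matches, sendToClient) {
  const handler = (message) => {
    if (matches(message.doc)) sendToClient(message);
  };
  bus.on(collectionName, handler);
  return () => bus.off(collectionName, handler); // call to stop listening
}

const tasks = new Map();
const received = [];
const stop = publish('tasks', (doc) => doc.owner === 'alice', (msg) => received.push(msg));

insertAndPublish(tasks, 'tasks', { _id: '1', owner: 'alice', title: 'write docs' });
insertAndPublish(tasks, 'tasks', { _id: '2', owner: 'bob', title: 'review' });
stop();
insertAndPublish(tasks, 'tasks', { _id: '3', owner: 'alice', title: 'after stop' });

console.log(received.map((m) => m.doc._id)); // [ '1' ]
```

Nothing is read back from the bus as stored state here; it only carries change notifications, while the documents themselves live in the store.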

@ramez
Those “2” monitors that listen to oplog are required to do the following:

Maintain a list of all the active subscriptions in the system (from all 5 nodes)
When anything is inserted/updated/removed, they have to scan through all active subscriptions, see if the change is related, and update them
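
To make those two duties concrete, here is a rough sketch of such a central monitor. `OplogMonitor`, its methods, and the entry shape are all hypothetical; the point is only that every change is checked against every registered subscription.

```javascript
// Hypothetical central monitor: one oplog reader serving several app nodes.
class OplogMonitor {
  constructor() {
    // Duty 1: keep every active subscription from every node.
    this.subscriptions = new Map(); // subscriptionId -> { nodeId, collection, matches }
    this.checks = 0; // number of subscription checks performed so far
  }

  register(subscriptionId, nodeId, collection, matches) {
    this.subscriptions.set(subscriptionId, { nodeId, collection, matches });
  }

  unregister(subscriptionId) {
    this.subscriptions.delete(subscriptionId);
  }

  // Duty 2: for each change, scan all subscriptions and collect affected nodes.
  handle(entry) {
    const nodes = new Set();
    for (const sub of this.subscriptions.values()) {
      this.checks += 1;
      if (sub.collection === entry.collection && sub.matches(entry.doc)) {
        nodes.add(sub.nodeId);
      }
    }
    return [...nodes];
  }
}

const monitor = new OplogMonitor();
monitor.register('s1', 'node-1', 'tasks', (doc) => doc.owner === 'alice');
monitor.register('s2', 'node-2', 'tasks', (doc) => doc.done === true);
monitor.register('s3', 'node-3', 'users', () => true);

const notified = monitor.handle({ collection: 'tasks', doc: { owner: 'alice', done: false } });
console.log(notified, monitor.checks); // [ 'node-1' ] 3
```

In this naive form the work per change grows with the total number of subscriptions across all nodes, which is the scaling concern raised here.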

What you are doing this way, abstracting the oplog reader, is a good idea and may work, but it’s totally different from my idea and much harder to implement.

In my view, oplog will be removed completely: there will be no more need for oplog tailing at ALL; the app will publish changes to Redis.

Regarding additional risk and complexity: we are now talking about scaling up, and of course you will need different layers to maintain as you scale horizontally/vertically, so I will disregard this as a concern. Plus, it’s super easy to install Redis locally, and Redis can work very well over an internal network.

ramez · October 30, 2016, 6:19pm

I think I get it now: MongoDB will be the static copy of Redis. So you are counting on Redis for reactivity and on Mongo for on-disk storage. Looking forward to the result. It makes me wonder: if you can replicate Mongo-like syntax, why do we need to keep Mongo then?

okada · October 30, 2016, 7:38pm

Note that the current solution (oplog tailing) works for changes made out of band (changes made outside the app) whereas your solution would not.

diaconutheodor · October 30, 2016, 7:50pm

@okada

Yes, of course I am aware. However, if I make changes outside the app, I can still emulate the change behavior in Redis. Most changes in the database are done by the app, right? Also, this gives us the option to disable reactivity for large batch inserts, from inside or outside the app. I think it’s a good exchange.

@ramez

You still don’t get it; sorry if I’m not being very clear. I’m not storing anything in Redis! I’m just using it for the pub/sub system, because it’s fast and super stable, and maybe we could find a use for storing some values in there as well. We shall see about that.

okada · October 30, 2016, 11:30pm

Yes, I believe an API to notify out-of-band changes would be a good solution. Looking forward to seeing what you come up with.
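
One possible shape for such a notification API, purely as an illustration: `notifyExternalChange`, the message fields, and the `source` tag are hypothetical, and EventEmitter again stands in for Redis pub/sub.

```javascript
// Assumption: EventEmitter is only a stand-in for a Redis pub/sub connection.
const { EventEmitter } = require('events');

const bus = new EventEmitter();
const ALLOWED_EVENTS = ['insert', 'update', 'remove'];

// Hypothetical helper that a migration script or admin tool would call
// right after writing to the database directly, outside the app.
function notifyExternalChange(collectionName, event, docId) {
  if (!ALLOWED_EVENTS.includes(event)) {
    throw new Error(`Unknown event type: ${event}`);
  }
  // Serialized to a string, since a pub/sub channel carries plain messages.
  const message = JSON.stringify({ event, docId, source: 'external' });
  bus.emit(collectionName, message);
  return message;
}

const seen = [];
bus.on('tasks', (raw) => seen.push(JSON.parse(raw)));

notifyExternalChange('tasks', 'update', 'abc123');
console.log(seen); // [ { event: 'update', docId: 'abc123', source: 'external' } ]
```

A batch job that wants no reactivity would simply skip the call, which matches the idea of disabling reactivity for large inserts.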

ramez · October 31, 2016, 12:45am

Right, but the ‘monitor’ solution would, as there is no middleware in between. I need to digest the advantages of parachuting Redis into it some more.

diaconutheodor · October 31, 2016, 4:16am

@ramez what you plan on doing is very hard. Imagine: you’ll have to keep a cache on those “oplog” readers for all the queries that all your users have. I see a lot of problems with that approach, that’s all. It seems overly complex and very tied to how Meteor works.

I put my thoughts on paper: here’s where I described some improvements that it can bring, and also provided a way to fine-tune the reactivity process.

Edit:
It’s really not that hard to implement. Meteor already did the hard job of diffing; we just have to introduce a few more tweaks and make it stable.

Edit 2:
Just realized that this approach, abstracting the reactivity to Redis, can enable us to make ANY data source reactive. For example, we could make a REST API reactive if it lets us register a webhook that it hits when data changes: we send that data to Redis, and the publication updates the live data.
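
A small sketch of the two ends of that idea, with the Redis hop left out: translating a webhook body into a change message, and applying that message to the data set a publication keeps live. The webhook fields (`action`, `resource`) and both function names are assumptions made up for this example.

```javascript
// Hypothetical webhook body: { action: 'created' | 'updated' | 'deleted', resource: {...} }.
function webhookToChange(body) {
  const eventByAction = { created: 'insert', updated: 'update', deleted: 'remove' };
  const event = eventByAction[body.action];
  if (!event) throw new Error(`Unsupported action: ${body.action}`);
  return { event, doc: body.resource };
}

// Apply one change message to the live data set (a Map keyed by document id).
function applyChange(liveData, change) {
  if (change.event === 'remove') {
    liveData.delete(change.doc.id);
  } else {
    // Insert and update both merge the incoming fields over what we have.
    const previous = liveData.get(change.doc.id) || {};
    liveData.set(change.doc.id, { ...previous, ...change.doc });
  }
  return liveData;
}

const liveData = new Map();
const hooks = [
  { action: 'created', resource: { id: 7, status: 'open' } },
  { action: 'updated', resource: { id: 7, status: 'closed' } },
  { action: 'created', resource: { id: 8, status: 'open' } },
  { action: 'deleted', resource: { id: 8 } },
];
for (const body of hooks) applyChange(liveData, webhookToChange(body));

console.log([...liveData.values()]); // [ { id: 7, status: 'closed' } ]
```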

diaconutheodor · October 31, 2016, 6:41am

I need an opinion from a Meteor developer who worked on this (Oplog & Reactivity). Five minutes of their time could save me days before the first beta release.

@sashko can you help with that, please? Cheers.

diaconutheodor · October 31, 2016, 11:08am

Making progress on this: I managed to use the cursor + Redis for naive remove, update, and insert.

Now I need to tackle fields, limit, and skip: a shit-load of different scenarios.

Then handle _id: String or {_id: {$in: [ids]}} for instant change detection via separate channels.
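
One way the separate channels could be picked, as a sketch only: the `collection::id` naming and both helper names are assumptions for illustration, not a description of the finished package.

```javascript
// Hypothetical naming: one channel per collection plus one per document.
function channelsForSelector(collectionName, selector) {
  const id = selector && selector._id;
  if (typeof id === 'string') {
    return [`${collectionName}::${id}`]; // single-document subscription
  }
  if (id && Array.isArray(id.$in)) {
    return id.$in.map((one) => `${collectionName}::${one}`); // one channel per id
  }
  return [collectionName]; // anything else listens collection-wide
}

// A write announces itself on the collection channel and the document channel.
function channelsForWrite(collectionName, docId) {
  return [collectionName, `${collectionName}::${docId}`];
}

console.log(channelsForSelector('tasks', { _id: 'a1' })); // [ 'tasks::a1' ]
console.log(channelsForSelector('tasks', { _id: { $in: ['a1', 'b2'] } })); // [ 'tasks::a1', 'tasks::b2' ]
console.log(channelsForSelector('tasks', { owner: 'alice' })); // [ 'tasks' ]
console.log(channelsForWrite('tasks', 'a1')); // [ 'tasks', 'tasks::a1' ]
```

A subscription by _id then only hears about writes to its own documents and can skip the selector matching step.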

And also find a way to re-use a publication if it has the same selector and options?
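
A possible approach to that re-use question, sketched with hypothetical names (`stableStringify`, `ObserverPool`): derive a key from collection, selector, and options with sorted object keys, and reference-count one shared observer per key.

```javascript
// Serialize with sorted keys so {a: 1, b: 2} and {b: 2, a: 1} give the same key.
// Handles plain JSON-like values only (no functions, dates, or regexes).
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical pool: one shared entry per distinct (collection, selector, options).
class ObserverPool {
  constructor() {
    this.observers = new Map(); // key -> { key, refCount }
  }

  acquire(collectionName, selector, options) {
    const key = stableStringify([collectionName, selector, options]);
    let entry = this.observers.get(key);
    if (!entry) {
      entry = { key, refCount: 0 };
      this.observers.set(key, entry);
    }
    entry.refCount += 1;
    return entry;
  }

  release(entry) {
    entry.refCount -= 1;
    if (entry.refCount === 0) this.observers.delete(entry.key); // last user gone
  }
}

const pool = new ObserverPool();
const first = pool.acquire('tasks', { owner: 'alice', done: false }, { limit: 10 });
const second = pool.acquire('tasks', { done: false, owner: 'alice' }, { limit: 10 });
const third = pool.acquire('tasks', { owner: 'alice', done: false }, { limit: 20 });

console.log(first === second, first.refCount, pool.observers.size); // true 2 2
```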

I wish I had MDG’s resources on this one, but I’m making good progress, and boy, Redis is fast as hell. After I’m done with this, I’ll write some real use-case scenarios and mimic a very loaded app with a lot of traffic to see what performs best. Whichever wins, I will rest in peace afterwards.

diaconutheodor · October 31, 2016, 7:59pm

https://github.com/cult-of-coders/redis-oplog is where I’ll post the code. Feel free to add issues with ideas, things to take care of, etc.

PoC is there. Did some tests, it worked super fast, even for 10,000 inserts in the db. It just acts like nothing has happened.

efrancis · November 1, 2016, 12:28am

I’ve been working on something with tons and tons of concurrent reads/writes, and I’ve been beating my head against a wall trying to figure out how to simulate the functionality of a publication in Meteor without using mergebox or oplog, because of the terrible CPU spikes. None of my solutions have been very pretty. If you can get this to a point where it can be dropped into an existing project and it functions using the same API, I’ll get it into my project ASAP and let you know what issues I run into! It’s not in production, so testing it out wouldn’t be a problem.