Meteor Scaling - Redis Oplog [Status: Prod ready]

I need an opinion from a Meteor developer who worked on this (Oplog & Reactivity). Five minutes of their time may save me days until the first beta release.

@sashko, can you help with that please? :smiley: Cheers.


Making progress on this: I managed to use the cursor + Redis for the naive cases of remove, update, and insert.

Now I will need to tackle fields, limit, and skip: a shit-load of different scenarios.

Then handle _id: String or {_id: {$in: ids}} selectors for instant change detection via separate channels.

And also find a way to re-use a publication if it has the same selector and options.

I wish I had MDG's resources on this one, but I'm making good progress, and boy, Redis is fast as hell. After I'm done with this, I'll write some real use-case scenarios and mimic a very loaded app with a lot of traffic to see what performs best; whichever wins, I will rest in peace afterwards.


https://github.com/cult-of-coders/redis-oplog is where I'll post the code. Feel free to add issues as ideas / things to take care of, etc.

The PoC is there. I did some tests and it worked super fast, even for 10,000 inserts into the db. It just acts like nothing happened.

I've been working on something with tons and tons of concurrent reads/writes, and I've been beating my head against a wall trying to figure out how to simulate the functionality of a Meteor publication without using mergebox or oplog, because of the terrible CPU spikes; none of my solutions have been very pretty. If you can get this to a point where it can be dropped into an existing project and it functions using the same API, I'll get it into my project ASAP and let you know what issues I run into! It's not in production, so testing it out wouldn't be a problem.

Hi @efrancis, if you are dealing with a lot of concurrent writes, be careful: MongoDB may not be your cup of tea. Or it may be, but with a cache like Redis and a consumer in front of it.

What I'm doing right now is reinventing their mergebox (sadly). We'll also have the ability to make updates that are non-reactive (not published to Redis), as well as to namespace the reactivity. That's going to finally make chat applications or games achievable in Meteor.

Some interesting things I found:
https://github.com/peerlibrary/meteor-control-mergebox (but it makes some things impossible).

Update:

Guys, I'm so excited about this. I managed to fully grasp everything that needs to be done; indeed, there are many, many hidden facets to this, lots of use cases and scenarios. This is not just a weekend pet project as I initially thought; it's a bit more than that.

Here's how crazy some of the use cases are:


If this ends up being as simple as connecting a Redis database to a Meteor application and having it scale well, then I can see this being a serious hit among developers in the Meteor community.


This is how it WILL end up. The specs are already made; it's just a matter of time now :slight_smile: I want to write it with patience and care, and with a shit-load of tests. Before we hit the first stable release, I would like everyone with a Meteor app to plug it in and see if they run into trouble.

The next step in its evolution would be to open up reactivity to basically anything via Redis, even for things you don't really want to store in the db :slight_smile: like "user is typing..." or, say, you are "dragging something" and someone sees the dragging live, but it only saves on "drag stop". You get the idea.

Cheers!


This is a great solution.
We are in the final steps of releasing a big project on Meteor, and hearing about a performance boost is very interesting for us.
Meteor is great and the Apollo stack is a good solution, but if we get some tools for powering up the current Meteor data layer, we save many years of effort invested in legacy Meteor packages and projects.

I have some suggestions:
1. The possibility to use the Redis oplog for some collections and keep the rest working as they are.
2. Good documentation, even from the beginning, to let others test the library.
3. Asking the MDG team to help; this would help MDG keep its legacy work and customers.

I will test the solution as soon as there is a documented beta version.
I hope this gives the legacy Meteor data layer a chance to live on.

I assume all of this is if you don't care about mergebox?

I actually fail to see how this will solve the oplog bottleneck. The hard part of scaling the oplog is that each server needs to check, for each operation, whether any of the subscribed queries are affected. With Redis as pub/sub this is still the case, or what am I missing?

That's a good point. I think a better solution would be to implement the oplog in minimongo and publish only the subset of oplog entries that match a given query to the client. This way you don't need to manage the cache (or the cache would be much smaller) or perform diffs to find out how the query changed. In some cases you could publish more entries than needed for better performance.

I haven't had any problems with scaling Meteor publications so far, but if I did, I would try the above solution.

How many concurrent users and subscriptions do you have, and how are you hosting your app?

@babnik63 thanks for the suggestions; we are also in the process of launching an app that's going to handle a lot of requests per second.

Yes, we have built some tools for powering up the data layer; that's why "Grapher" appeared.

  1. This is already in the plan and the current specs.
  2. Agreed.
  3. I don't think they care; I asked, no answer. Let them focus on making what they have super stable. That's what I really care about from Meteor at this stage: fast build times + stability.

@seba

So, here's the flow:
Mutation -> Publish message to Redis
Publication -> Subscribe to Redis messages

The default implementation will listen only to its concerned collection, not all of them. This is the first improvement: instead of listening for data from all collections, you only listen for data from yours (network bandwidth + CPU improvement).
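To make that concrete, here's a rough sketch of the flow using the node `redis` client (the channel names and payload shape are just my illustration, not the package's actual internals):

```js
const redis = require('redis');

const publisher = redis.createClient();
const subscriber = redis.createClient();

// Mutation side: after the MongoDB write succeeds, announce the change
// on a channel named after the collection.
function notifyChange(collectionName, event, doc) {
  publisher.publish(collectionName, JSON.stringify({ event, doc }));
}

// Publication side: listen only to the collection this publication serves.
subscriber.subscribe('users');
subscriber.on('message', (channel, message) => {
  const { event, doc } = JSON.parse(message);
  // Re-check this publication's cursor against `doc` and send
  // added/changed/removed over DDP as appropriate.
});
```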

Next, since publishing to Redis is controlled in the app, you can disable it => large batch updates/inserts without a care in the world.
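In the API, that could look something like this (the `pushToRedis` option name is just a sketch of what I have in mind, nothing committed yet):

```js
// Hypothetical flag to skip publishing this mutation to Redis entirely,
// so a large migration doesn't trigger any reactivity work.
Posts.update(
  { migrated: { $ne: true } },
  { $set: { migrated: true } },
  { multi: true, pushToRedis: false } // assumed option name
);
```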

Next, dedicated channels for filters by id or ids. We will publish to something like "users::$_id" and, of course, listen to that. This can lead to instant pub/sub for elements fetched by id, making them crazy fast.
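Sketched out (reusing the `publisher`/`subscriber` clients from the sketch above, still illustrative):

```js
// Publish a change for a single document on "collection::id"...
function notifyById(collectionName, _id, event, fields) {
  publisher.publish(`${collectionName}::${_id}`, JSON.stringify({ event, fields }));
}

// ...so a publication by _id listens to exactly one channel
// instead of the whole collection's firehose.
// (userId comes from the subscription's arguments.)
subscriber.subscribe(`users::${userId}`);
```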

Next, namespacing. You have a chat app: you have threads, and each thread has messages. When you insert a message, you can specify the namespace "thread-$id". And when you subscribe to all messages in a thread, you specify the namespace "thread-$id" again. This way you can build a live-chat app with ease. "thread-$id" is like a separate reactivity channel that can be customized.
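In hypothetical usage (assuming the mutation methods and `find` get extended to accept a `namespace` option; none of this is final):

```js
const threadId = 'T1'; // hypothetical thread id

// The mutation publishes into the "thread-<id>" reactivity channel...
Messages.insert(
  { threadId, text: 'Hello!' },
  { namespace: `thread-${threadId}` } // assumed option
);

// ...and the publication listens on the same channel.
Meteor.publish('thread.messages', function (threadId) {
  return Messages.find(
    { threadId },
    { namespace: `thread-${threadId}` } // assumed option
  );
});
```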

@mpowaga

Implement the oplog in minimongo? Publish only a subset of oplog entries? I don't need to manage the cache?

Man, maybe you have a good idea, but I really could not understand anything from it; write up a spec in a Google Drive doc.

Well, just by knowing how the oplog works, it's clear that it's not scalable and that at some point it will explode. So the question "how do we scale Meteor publications?" becomes the question "how do we remove the oplog?"

Hope I made myself a bit clearer. Cheers!

Thanks for the clarification. Just a couple more questions.
By the way, don't get me wrong: I'm very excited to see people working on improving oplog performance.
I'm just trying to learn what it'd mean for an application like mine, and where I can possibly help with this.

  1. If I look at my application, I barely have batch inserts (except for the occasional database schema migration). All of my collections have at least one reactive publication, and all of them have publications both by simple Mongo id and by more complex selectors. In this scenario (which I believe is pretty typical for Meteor applications), would there still be a benefit?
    Maybe if I split my application into more microservices, I might benefit more from your proposed approach.

  2. Let's say you have a subscription to a simple publication purely by ID, and a more complex one that also matches the same record as the simple one. Is the total number of messages that Meteor needs to process now larger if you update a record in that collection?

  3. Does the fact that you have multiple channels mean the updates can now arrive out of sync?
    I mean, if you have 2 update operations going in, might they get sent to other clients out of order?

  4. There's no mergebox anymore, right? So clients might get updates to the same record twice?
    You can already have publications that avoid the mergebox, like the folks at rocket.chat do. Did you experiment with that and, if so, why did that not suffice?

  5. The namespace is conceptually a simple query before you do a more complex one, right? This I might actually benefit from.
    However, maybe this could be made even more flexible using normal oplog tailing, if you could have multiple queries on the same publication, where each one gets more complex. This way you might funnel messages to the right subscriptions more efficiently than today...
    E.g.:
    Collection.find([{threadId: x}, {users: {$elemMatch: {...}}}], options); Then again, maybe a simple namespace suffices in most of the use cases.


Yeah, I think it will be hard to minimize the number of oplog messages that need to be processed. The approach described here might work well for specific use cases, but it won't do much in the general case (complex queries over most/all of your collections). Well, maybe if you split your app into microservices.

I don't have any performance metrics, but intuitively I'd think that if you can't reduce the number of messages, you need to reduce the amount of time it takes to process each message.
I described an approach I was just thinking of in point 5 above. The idea basically boils down to this: if you have a complex query, iteratively filter out more and more oplog messages with queries of gradually increasing complexity.
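To illustrate, here's a sketch of such a funnel (my own illustration, assuming Minimongo's Matcher for the final stage; nothing like this is implemented):

```js
// Cheap predicates run first; the expensive full matcher only runs on
// messages that survive the earlier stages.
const watchedThreadId = 'thread-42'; // hypothetical subscription parameter
const fullMatcher = new Minimongo.Matcher({
  threadId: watchedThreadId,
  users: { $elemMatch: { active: true } },
});

const stages = [
  (msg) => msg.collection === 'messages',          // right collection?
  (msg) => msg.doc.threadId === watchedThreadId,   // cheap equality test
  (msg) => fullMatcher.documentMatches(msg.doc).result, // full query
];

const shouldProcess = (msg) => stages.every((stage) => stage(msg));
```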

If I look at my application, I barely have batch inserts (except for the occasional database schema migration). All of my collections have at least one reactive publication, and all of them have publications both by simple Mongo id and by more complex selectors. In this scenario (which I believe is pretty typical for Meteor applications), would there still be a benefit?

This is where it actually shines: when you publish by id or by $in: ids, it pushes to different channels and is only aware of modifications for those ids. This is already implemented; it's called the "direct channels" or "dedicated" approach. This is the fastest way to listen to changes.

Let's say you have a subscription to a simple publication purely by ID, and a more complex one that also matches the same record as the simple one. Is the total number of messages that Meteor needs to process now larger if you update a record in that collection?

Meteor will process it twice: very fast for the dedicated one, but for the more complex one it will still need to check whether the record matches the complex query, especially in cases with limit, sort, and skip.

Does the fact that you have multiple channels mean the updates can now arrive out of sync?
I mean, if you have 2 update operations going in, might they get sent to other clients out of order?

Yes, it might :slight_smile: I need to study Redis and see what it does, because it will really depend on how its pub/sub system works. However, no matter in what order they arrive, the end result will be correct.

There's no mergebox anymore, right? So clients might get updates to the same record twice?
You can already have publications that avoid the mergebox, like the folks at rocket.chat do. Did you experiment with that and, if so, why did that not suffice?

Yes, there is a mergebox that stores the client's image on the server; without it, some things are impossible. However, there is a plan to re-use the same observer for identical publications (same filters and options). Regarding data stored on the server: come on, RAM is cheap. I always prefer something that consumes more RAM rather than CPU :D, and to consume 1 GB of RAM you'll need a LOT of data.

Update: actually, you just gave me a superb idea. I could store only the current ids in the server's image :smiley: That might just work.
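Conceptually something like this (just a sketch of the idea):

```js
// Keep only the set of currently published _ids per subscription,
// instead of a full copy of every published document.
const publishedIds = new Set();

function onAdded(doc) { publishedIds.add(doc._id); }
function onRemoved(_id) { publishedIds.delete(_id); }
function isPublished(_id) { return publishedIds.has(_id); }
```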

The namespace is conceptually a simple query before you do a more complex one, right? This I might actually benefit from.

It's not a query, it's a namespace: when you do inserts/updates/removes, you can specify in which namespace to publish. Doing this, the change no longer pollutes the main collection namespace; therefore, changes like that cannot be "heard" by a regular subscription. But we can make it so you can do that as well, sending the change both to the main collection namespace and to your dedicated one :slight_smile:

Thanks for replying so fast.

I know, but I was just thinking out loud for my own use case. I say it's conceptually the same because the idea is to reduce the time spent processing database updates by doing quick, coarse-grained pre-filtering before the actual filter (the query). If you do this with a query, you'd still listen to each message, but you might improve performance enough that it doesn't matter, while still maintaining some properties that are guaranteed today (ordering and single delivery, specifically). Sure, there's an upper limit. But we're not all Facebooks and Googles.

In case you're wondering: I use Meteor to control display devices. They run a device agent that subscribes to state updates, and DDP messages are translated into non-idempotent device API calls. For this, I need updates to arrive in order and only once. So I look at it from this specific angle, but this might not be necessary for your average web UI app.

If you do this with a query, you'd still listen to each message, but you might improve performance enough

Not true, and that's the thing: you only listen to updates that are done on that given namespace; other updates don't even reach the publish function.

Check Usage:

Yeah, but I was talking about my idea, with multiple queries that are processed iteratively.

With the Redis namespace approach, the message would never reach you. With the multiple-query approach, you'd still receive each message, but you might be able to process them fast enough that it doesn't matter.

In case you're wondering: I use Meteor to control display devices. They run a device agent that subscribes to state updates, and DDP messages are translated into non-idempotent device API calls. For this, I need updates to arrive in order and only once. So I look at it from this specific angle, but this might not be necessary for your average web UI app.

What do you mean, only once? If I make 2 updates to something of your concern, why is it a problem if it arrives twice? Does the current oplog behave differently?