I need an opinion from a Meteor Developer that worked on this (Oplog & Reactivity). 5 minutes of his time may help me reduce days until first beta release.
@sashko can you help with that, please? Cheers.
Making progress on this, managed to use the cursor + redis for naive, remove, update, insert.
Now I will need to tackle fields, limit, skip: a shit-load of different scenarios.
Then handle _id: String
or {_id: {$in: ids}}
for instant change detection via separate channels.
And also find a way to re-use a publication if it has the same selector and options?
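As a sketch of what those separate channels could look like (the helper name and the "collection::id" channel format below are assumptions for illustration, not the package's actual API), a selector by _id or {$in: ids} maps straight to dedicated channel names:

```javascript
// Hypothetical helper: derive the Redis channels a publication should listen
// to from its Mongo selector. The "collection::id" naming is assumed.
function channelsForSelector(collectionName, selector) {
  if (typeof selector._id === 'string') {
    // find({_id: 'abc'}) -> listen only on 'users::abc'
    return [collectionName + '::' + selector._id];
  }
  if (selector._id && Array.isArray(selector._id.$in)) {
    // find({_id: {$in: [...]}}) -> one dedicated channel per id
    return selector._id.$in.map(id => collectionName + '::' + id);
  }
  // Anything else falls back to the collection-wide channel.
  return [collectionName];
}

console.log(channelsForSelector('users', { _id: 'abc' }));               // ['users::abc']
console.log(channelsForSelector('users', { _id: { $in: ['a', 'b'] } })); // ['users::a', 'users::b']
console.log(channelsForSelector('posts', { authorId: 'x' }));            // ['posts']
```

A mutation on a single document would then publish only to its own per-id channel, so subscriptions by id never process unrelated writes.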
I wish I had MDG's resources on this one, but making good progress, and boy, Redis is fast as hell. After I'm done with this, I'll write some real use-case scenarios and mimic a very loaded app with a lot of traffic, see what performs best; whichever wins, I will rest in peace afterwards.
https://github.com/cult-of-coders/redis-oplog this is where I'll post the code. Feel free to add issues as ideas / things to take care of etc.
PoC is there. Did some tests, it worked super fast, even for 10,000 inserts in the db. It just acts like nothing has happened.
I've been working on something with tons and tons of concurrent reads/writes, and I've been beating my head against a wall trying to figure out how to simulate the functionality of a publication in Meteor without using mergebox or oplog, because of the terrible CPU spikes; none of my solutions have been very pretty. If you can get this to a point where it can be dropped into an existing project and it functions using the same API, I'll get it into my project ASAP and let you know what issues I run into! It's not in production, so testing it out wouldn't be a problem.
Hi @efrancis, if you are dealing with a lot of concurrent writes, be careful: MongoDB may not be your cup of tea, or it may be, but only with a cache like Redis and a consumer in front of it.
What I'm doing right now is reinventing their mergebox (sadly). We'll also have the ability to make updates that are non-reactive (not published to Redis), and also to namespace the reactivity. That's going to finally make chat applications or games in Meteor achievable.
Some interesting things I found:
https://github.com/peerlibrary/meteor-control-mergebox - but it makes some things impossible.
Update:
Guys, I'm so excited about this. I managed to fully grasp everything that needs to be done; indeed there are many, many hidden facets of this, lots of use-cases and scenarios. This is not just a weekend pet project as I initially thought, it's a bit more than that.
Here's how crazy some of the use-cases are:
If this ends up being as simple as connecting a Redis database and a Meteor application, enabling it to scale well, then I can see this being a serious hit among developers in the Meteor community.
This is how it WILL end up. The specs are already made; it's just a matter of time now. I want to write it with patience and with care, and with a shit-load of tests. Before we hit the first stable release, I would like everyone with a Meteor app to plug it in and see if they get into trouble.
The next step of its evolution would be to open reactivity to basically anything, via Redis. Even for things you don't really want to store in the db, like "user is typing…", or, I don't know, you are "dragging something" and someone sees the dragging live, but it will only save on "drag stop". You get the idea.
Cheers!
This is a great solution
We are in the final steps of releasing a big project on Meteor, and hearing about a performance boost is very interesting for us.
Meteor is great and Apollostack is a good solution, but if we have some tools for powering up the current Meteor data layer, we save many years of effort on Meteor legacy packages and projects.
I have some suggestions:
1- possibility to use redis oplog for some collections and keep the rest at their working state
2- good documentation even from beginning to let others to test the library
3- asking MDG team to help, this will help MDG keep its legacy works and customers
I will test the solution as soon as a beta, documented version is available.
I hope this gives the Meteor legacy data layer a chance to live.
I assume all of this is if you don't care about mergebox?
I actually fail to see how this will solve the oplog bottleneck? I mean, the hard part of scaling oplog is that each server needs to check for each operation if any of the subscribed queries are affected. With redis as pubsub, this is still the case, or what am I missing?
That's a good point. I think a better solution would be to implement oplog in minimongo and publish only the subset of oplog entries that match a given query to the client. This way you don't need to manage the cache (or the cache would be much smaller) and perform diffs to find out how the query changed. In some cases you could publish more entries than needed for better performance.
I didn't have any problems with scaling Meteor publications so far, but if I did, I would try the above solution.
How many concurrent users and subscriptions do you have, and how are you hosting your app?
@babnik63 thanks for the suggestions; we are also in the process of launching an app that's going to have a lot of requests/s.
Yes, we got some tools for powering up the data layer; that's why "Grapher" appeared.
So, here's the flow:
Mutation -> Publish Message To Redis
Publication -> Subscribe to Redis Messages
The default implementation will listen only to the collection it concerns, not all of them. This is the first improvement: instead of listening for data from all collections, you only listen for data from yours (network bandwidth + CPU improvement).
Next, since the publishing to Redis is controlled in the App, you can disable it => large batch updates/inserts without a care in the world.
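Since the app itself decides when to publish to Redis, opting out per mutation is just a flag check before the publish call. A minimal runnable sketch; the option name pushToRedis and the two stubs are assumptions for illustration, not the real API:

```javascript
// Stand-ins for the real Mongo write and Redis publish, so the flow is runnable.
let redisMessages = 0;
const applyToMongo = () => {};
const publishToRedis = () => { redisMessages += 1; };

// Hypothetical wrapper around a collection update: the write always happens,
// but publishing the change to Redis can be skipped per call.
function update(collection, selector, modifier, options = {}) {
  applyToMongo(collection, selector, modifier);
  if (options.pushToRedis === false) return; // batch jobs stay non-reactive
  publishToRedis(collection, { event: 'update', selector, modifier });
}

update('items', { _id: 'a' }, { $set: { x: 1 } });                         // reactive
update('items', {}, { $set: { migrated: true } }, { pushToRedis: false }); // silent
console.log(redisMessages); // 1
```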
Next, dedicated channels for filters by id or ids. We will publish to something like "users::$_id", and of course listen to that; this can lead to instant pub/subs for elements by ids, making them crazy fast.
Next, namespacing. You have a chat app; you have threads, and each thread has messages. When you insert a message you could specify the namespace "thread-$id". And when you "subscribe" to all messages in a thread, you specify the namespace "thread-$id" again. This way you can do a live-chat app with ease. "thread-$id" is like a separate reactivity channel that can be customized.
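The namespacing idea can be illustrated with an in-memory stand-in for Redis pub/sub (all names below are illustrative, not the real API): a mutation tagged with "thread-$id" publishes only to that channel, so subscriptions on other threads, or on the main collection channel, never even see it.

```javascript
// Minimal in-memory pub/sub, standing in for Redis to show the routing only.
const subscribers = new Map(); // channel -> array of handlers

function subscribe(channel, handler) {
  if (!subscribers.has(channel)) subscribers.set(channel, []);
  subscribers.get(channel).push(handler);
}

function publish(channel, message) {
  (subscribers.get(channel) || []).forEach(h => h(message));
}

// The chat publication for thread 42 listens on its namespace only.
const received = [];
subscribe('thread-42', msg => received.push(msg));

// Inserting a message into thread 42 publishes to that namespace, so
// subscriptions on 'thread-7' or on the main 'messages' channel stay idle.
publish('thread-42', { event: 'insert', doc: { text: 'hi' } });
publish('thread-7', { event: 'insert', doc: { text: 'other thread' } });

console.log(received.length); // 1
```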
Implement oplog in minimongo? Publish only a subset of oplog entries? I don't need to manage the cache?
Man, maybe you have a good idea, but I really could not understand any of it; write a spec in a Google doc.
Well, just by knowing how oplog works, it's clear that it's not scalable and that at some point it will explode, so the question "how to scale Meteor publications" becomes the question "how to remove oplog".
Hope I made myself a bit more clear. Cheers!
Thanks for the clarification. Just a couple more questions.
Btw, don't get me wrong: I'm very excited to see people working on improving oplog performance.
But I'm just trying to learn what it'd mean for an application like mine and where I can possibly help on this.
If I look at my application I barely have batch inserts (except for the occasional database schema migration). All of my collections have at least one reactive publication and all of them have publications by simple mongo id and more complex ones. In this scenario (which I believe is pretty typical for Meteor applications), would there still be a benefit?
Maybe if I split my application out into more microservices I might benefit more from your proposed approach.
Let's say you have a subscription to a simple publication purely by ID, and a more complex one that also matches the same record as the simple one. The total amount of messages that Meteor needs to process is now larger if you update a record in that collection?
Does the fact that you have multiple channels mean the updates can now arrive out of sync?
I mean, if you have 2 update operations going in, they might get sent to other clients out of order?
There's no mergebox anymore, right? So clients might get updates to the same record twice?
You can already have publications that avoid the mergebox like the folks at rocket.chat do. Did you experiment with that and, if so, why did that not suffice?
The namespace is conceptually a simple query before you do a more complex one, right? This I might actually benefit from.
However, maybe this could be made even more flexible using normal oplog tailing if you could have multiple queries on the same publication, where each one gets more complex. This way you might funnel messages to the right subscriptions more efficiently than it is done today…
E.g.:
Collection.find([{threadId: x}, {users: {$elemMatch: {…}}}], options); Then again, maybe a simple namespace suffices in most of the use cases.
Yeah, I think it will be hard to minimize the amount of oplog messages that need to be processed. The approach described here might work well for specific use cases, but won't do much in the general case (complex queries over most/all your collections). Well, maybe if you split your app into microservices.
I don't have any performance metrics, but intuitively I'd think that if you can't reduce the number of messages, you need to reduce the amount of time it takes to process each message.
I described an approach I was just thinking of at point 5 here: Meteor Scaling - Redis Oplog [Status: Prod ready] The idea basically boils down to this: if you have a complex query, iteratively filter out more and more oplog messages with increasingly complex queries.
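A sketch of that idea: order the checks from cheapest to most expensive and short-circuit, so most oplog messages are rejected by the coarse pre-filter before the costly full match ever runs. The helper below is hypothetical, just to make the funnel concrete:

```javascript
// Hypothetical: build one matcher out of several, ordered cheap -> expensive.
// Array.prototype.every short-circuits, so a message rejected by the coarse
// pre-filter never reaches the expensive check.
function makeIterativeMatcher(matchers) {
  return doc => matchers.every(match => match(doc));
}

const matcher = makeIterativeMatcher([
  doc => doc.threadId === 'x', // cheap pre-filter, rejects most messages
  doc => Array.isArray(doc.users) &&
         doc.users.some(u => u.role === 'admin'), // costly check, runs rarely
]);

console.log(matcher({ threadId: 'x', users: [{ role: 'admin' }] })); // true
console.log(matcher({ threadId: 'y', users: [{ role: 'admin' }] })); // false
```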
If I look at my application I barely have batch inserts (except for the occasional database schema migration). All of my collections have at least one reactive publication and all of them have publications by simple mongo id and more complex ones. In this scenario (which I believe is pretty typical for Meteor applications), would there still be a benefit?
This is where it actually shines: when you publish by id or $in: ids, it will push to different channels, and it will only be aware of modifications for those ids. This is already implemented; it's called the "direct-channels" or "dedicated" approach. This is the fastest way to listen to changes.
Let's say you have a subscription to a simple publication purely by ID, and a more complex one that also matches the same record as the simple one. The total amount of messages that Meteor needs to process is now larger if you update a record in that collection?
Meteor will process it twice: very fast for the dedicated one, but for the more complex one it will still need to see if it matches the complex query, especially in cases with limit, sort, skip.
Does the fact that you have multiple channels mean the updates can now arrive out of sync?
I mean, if you have 2 update operations going in, they might get sent to other clients out of order?
Yes, it might. I need to study Redis and see what it does, because it will really depend on how its pub/sub system works. However, no matter how they arrive, the end result will be correct.
There's no mergebox anymore, right? So clients might get updates to the same record twice?
You can already have publications that avoid the mergebox like the folks at rocket.chat do. Did you experiment with that and, if so, why did that not suffice?
Yes, there is a mergebox that stores the client's image on the server; without it some things are impossible. However, there is a plan to have the same observer for the same publications (with the same filters and options). Regarding data stored on the server: come on… RAM is cheap. I always prefer something that consumes more RAM rather than CPU :D, and to be able to consume 1 GB of RAM you'll need a LOT of data.
Update: actually you just gave me a superb idea, I can store only the current ids in the server's image. That might just work.
The namespace is conceptually a simple query before you do a more complex one, right? This I might actually benefit from.
It's not a query, it's a namespace, meaning that when you do inserts/updates/removes you can specify in which namespace to publish. Doing this, it will no longer pollute the main collection namespace; therefore changes like that could not be "heard" in a specific subscription. But we can make it so you can do that as well: send it both to the main collection namespace and your dedicated one.
Thanks for replying so fast.
I know, but I was just thinking out loud for my own use case. I say conceptually it's the same, because the idea is to reduce the time spent processing database updates by doing a quick & coarse-grained pre-filtering before the actual filter (query). If you do this with a query, you'd still listen to each message, but you might improve performance enough that this doesn't matter, while still maintaining some properties that are guaranteed today (order and single-updates specifically). Sure, there's an upper limit. But we're not all facebooks and googles.
In case you're wondering, I use Meteor to control display devices that have a device agent that subscribes to state updates, where DDP messages are translated into non-idempotent device API calls. For this, I need updates to arrive in sync and only once. So I look at it from this specific angle, but this might not be necessary for your average web UI app.
If you do this with a query, you'd still listen to each message, but you might improve performance enough
Not true, that's the thing: you only listen to updates that are done on that given namespace; other updates don't even reach the publish function.
Check Usage:
Yeah, but I was talking about my idea, with multiple queries that are processed iteratively.
With the Redis namespace approach, the message would never reach you. With the multiple-query approach you'd still be receiving each message, but now you might be able to process them fast enough that it doesn't matter.
In case you're wondering, I use Meteor to control display devices that have a device agent that subscribes to state updates, where DDP messages are translated into non-idempotent device API calls. For this, I need updates to arrive in sync and only once. So I look at it from this specific angle, but this might not be necessary for your average web UI app.
What do you mean, only once? If I make 2 updates to something of your concern, why is it a problem if it arrives twice? Does the current oplog behave differently?