@diaconutheodor, CPU is spiking with redis-oplog in place.
We have two microservices.
The 1st microservice is the master/core service, and
the 2nd microservice is a background job processor (which does highly CPU-intensive computation and bulk inserts/updates).
Without redis-oplog, the CPU used to spike only on the 2nd service, which does bulk inserts and updates on a single collection.
Since we want our system to be reactive, we enabled redis-oplog for both services. And now comes the problem.
When a job (bulk updates) is being processed in the 2nd microservice, the CPU also spikes in the 1st microservice.
When I disable redis-oplog in the 1st microservice everything works smoothly, except that we lose reactivity for the operations done in service 2.
I enabled debug for redis-oplog in service 1 and found that it is continuously writing the logs below:
[RedisSubscriptionManager] Received event: "i" to collectionName
[RedisSubscriptionManager] Received event: "u" to collectionName
[RedisSubscriptionManager] Received event: "u" to collectionName
… etc
@koticomake you run into the same problem as with the mongodb oplog: your instance gets flooded with tons of information. Without being fine-tuned, RedisOplog is ultra fast with publications by _id and only a tiny bit faster than the standard mongodb oplog (maybe slower in some cases). Where it shines is in its ability to control the reactivity.
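For reference, a "publication by _id" here means a publication whose selector is the _id, which lets redis-oplog listen on a per-document channel instead of the whole collection channel. A minimal sketch (the Items collection and publication name are made-up examples):

```js
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const Items = new Mongo.Collection('items'); // placeholder collection

Meteor.publish('items.single', function (itemId) {
    // Selecting by _id lets redis-oplog listen on the per-document channel
    // (e.g. items::<itemId>) instead of every message for the collection.
    return Items.find({ _id: itemId });
});
```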
Questions:
1. Your CPU does not spike if you're tailing the oplog? Are you sure you are tailing the oplog and not relying on polling? Did you test this in prod? If yes, do you have MONGO_OPLOG_URL set?
2. Can your publications be fine-tuned? Maybe namespaced by a client or something?
3. Do you need reactivity at every step inside the job service? Is the same document updated multiple times?
4. I can add a hack for you to do something like "trigger reload for certain collections" on the redis-oplog cluster. That would be an interesting idea: to say something like, "hey, I finished my heavy batch processing, now I want everyone to reload."
@diaconutheodor
1A. We are not using MONGO_OPLOG_URL, and moreover we have included the disable-oplog package in our Meteor project as well. Do we still need to set MONGO_OPLOG_URL to make redis-oplog work?
2A. I will try to fine-tune our publications with namespaces.
3A. Reactivity is not required at every step inside our job service. All inserts can be directly reactive, but the updates only need to become reactive once the batch operation is done. And yes, the same document might get updated multiple times as well.
I am really excited about the hack you promised in point 4.
One more doubt; I might be dumb for asking this.
How come redis-oplog (RedisSubscriptionManager) events affect the main server's CPU? Doesn't this deal directly with MongoDB operations?
That's what I thought. You didn't tail the oplog; you were previously relying on polling. If you had had the mongodb oplog enabled, the CPU spikes would have been an even bigger issue.
Perfect, that would really boost performance.
Perfect, you have the {pushToRedis: false} option: only do the update once at the end and push it to redis.
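A minimal sketch of that pattern, assuming the {pushToRedis} mutation option described in the redis-oplog docs (collection, field names and ids are placeholders):

```js
// Items is a Mongo.Collection; itemId is the _id of the document being processed.

// Intermediate writes inside the heavy job: skip redis, no reactivity triggered.
Items.update(itemId, { $set: { progress: 40 } }, { pushToRedis: false });
Items.update(itemId, { $set: { progress: 80 } }, { pushToRedis: false });

// One final write when the batch is done: this one is pushed to redis
// (pushToRedis defaults to true) and reaches the subscribed clients.
Items.update(itemId, { $set: { progress: 100, status: 'done' } });
```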
It affected the "Main Server" because you had a publication listening to messages on that collection. My guess is that you have something like:
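Something along these lines, sketched with a placeholder collection and selector:

```js
Meteor.publish('items.active', function () {
    // No _id in the selector, so this publication listens on the
    // whole `items` collection channel in redis.
    return Items.find({ status: 'active' });
});
```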
^ That subscription alone, regardless of filters (unless they are by _id), will listen to ALL incoming redis messages derived from operations performed on the Items collection (unless you namespaced it).
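If the data can be partitioned, a namespacing sketch might look like this; it assumes the namespace option from redis-oplog's fine-tuning docs, and Items / job::<jobId> are placeholders:

```js
// Publication: listens only on the namespaced channel, not on the
// global `items` channel that the bulk job floods.
Meteor.publish('items.forJob', function (jobId) {
    return Items.find({ jobId }, { namespace: `job::${jobId}` });
});

// Mutations made by the job must target the same namespace for this
// publication to receive them.
Items.update(itemId, { $set: { progress: 50 } }, { namespace: `job::${jobId}` });
```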
The true value of RedisOplog lies in fine-tuning: publications by _id and the ability to perform writes to db and bypass reactivity. That's the true value.
@diaconutheodor,
I have one doubt here.
3. Perfect, you have the {pushToRedis: false} option: only do the update once at the end and push it to redis.
When I update the collection in my second service, I kept pushToRedis: false for all intermediate updates. For all inserts I kept pushToRedis: true.
Now how shall I push all the changes that happened on that collection to redis?
Do I need to use the code below, and if yes, do I need to push each and every record by _id? Is there any way to push all the changes to redis at once, once I am done with my job?
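For context, the snippet being referred to is presumably redis-oplog's manual-publish helper from its fine-tuning docs; the sketch below assumes those exports (getRedisPusher, RedisPipe, Events), so verify them against your installed version, and treat the collection/field names as placeholders:

```js
import { EJSON } from 'meteor/ejson';
import { getRedisPusher, RedisPipe, Events } from 'meteor/cultofcoders:redis-oplog';

// Manually notify redis-oplog subscribers that one document changed.
// There is no batch form, so for many documents this runs in a loop,
// one message per _id.
function pushUpdateToRedis(itemId, fields) {
    getRedisPusher().publish('items', EJSON.stringify({
        [RedisPipe.EVENT]: Events.UPDATE,
        [RedisPipe.DOC]: { _id: itemId },
        [RedisPipe.FIELDS]: fields,
    }));
}
```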
One more thing,
The true value of RedisOplog lies in fine-tuning: publications by _id and the ability to perform writes to db and bypass reactivity. That's the true value.
I didn't get your point when you say "ability to perform writes to db and bypass reactivity".
What exactly do you mean when you say bypass the reactivity?
ability to perform writes to db and bypass reactivity => { pushToRedis: false }
And regarding your idea: if you do the updates with {pushToRedis: true}, you don't have to manually send the events; if you don't, then you have to. And unfortunately the redis npm driver provides no way to publish multiple messages in one go, so you'll have to do it in a loop.
I'm trying to see if anything can be done about this. I have the exact same issue: a redis-oplog Vent emission I need to send to 1000 Vent-subscribed clients, and redis-oplog is looping to do this, specifically in the this.on block inside of a Vent.publish block. It seems like there must be a way to send one update to Redis and then have Redis fan it out to the subscribers? You're thinking it's a limitation of the Redis npm driver?
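For anyone following along, the Vent pattern under discussion looks roughly like this (channel and argument names are made up; the API shape is taken from redis-oplog's Vent docs and should be double-checked there):

```js
import { Vent } from 'meteor/cultofcoders:redis-oplog';

// Server: the this.on handler below is what redis-oplog ends up running
// once per subscribed client when a message arrives.
Vent.publish({
    jobUpdates(jobId) {
        this.on(`job::${jobId}::updates`, (payload) => payload);
    },
});

// Server, elsewhere: a single emit that has to be fanned out to every
// one of the (e.g. 1000) subscribed clients.
Vent.emit('job::job-1::updates', { status: 'done' });

// Client:
const handle = Vent.subscribe('jobUpdates', 'job-1');
handle.listen((payload) => console.log('job update', payload));
```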
When I update the values directly in the DB, they are not getting reflected in my subscription.
I was thinking that redis-oplog's cache is not getting invalidated and updated.
How do I deal with this kind of approach?
This package works by placing hooks on Meteor's Mongo functions. Therefore, for this to work, you must call those Mongo functions. Editing the db directly obviously won't call the necessary hooks.
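A hedged illustration of the difference (Items and the field are placeholders):

```js
async function markDone(itemId) {
    // Goes through Meteor's collection API, so redis-oplog's hooks fire
    // and subscribed clients receive the change.
    Items.update({ _id: itemId }, { $set: { status: 'done' } });

    // Goes straight to MongoDB through the raw node driver, exactly like an
    // external script or manual edit would: no hooks fire, so redis-oplog
    // never publishes anything and subscriptions do not update.
    await Items.rawCollection().updateOne(
        { _id: itemId },
        { $set: { status: 'done' } }
    );
}
```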
There's also another package that's been developed to deal with this too… it might be mentioned in this thread. I've heard @diaconutheodor talk about it somewhere.
I just dropped redis-oplog into my app because I was having bad performance with the normal oplog. I didn't do any of the optimizations, thinking that it would give similar or slightly better performance until I was ready to put them in. It worked fine running locally, but it turns out it massively spiked my CPU usage in production and some of my instances were pegged and not responding!
I've rolled back, but I'm not sure of the best way to debug this. I have a lot of publications (~22) per user, a few of which use peerlibrary:reactive-publish. Is it possible the reactive publications are not playing well with redis-oplog?
@imagio, we had issues with reactive-publish too, which surfaced with redis-oplog. It turned out there was room for optimization. Do all pubs have to be reactive? Can you set some queries to {reactive: false}? (Remember that all pub queries automatically become reactive when using reactive-publish.)
Also, 22 subs per user is a lot! Can you optimize, merge, etc.?
We have been using redis-oplog in prod for a year now and it's been fantastic (we are even mentioned in the MeteorUp presentation by @diaconutheodor as a showcase).
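On merging: a single Meteor publication can return an array of cursors, so several per-user subs can sometimes be collapsed into one. A sketch with made-up collections:

```js
// One publication replacing several single-cursor publications.
Meteor.publish('project.bundle', function (projectId) {
    return [
        Projects.find({ _id: projectId }),
        Tasks.find({ projectId }),
        Comments.find({ projectId }),
    ];
});
```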
I'm certain that we are doing some sub-optimal things in our publications right now. By merge, do you mean that it is more efficient to have one publication that returns a couple of cursors (of potentially different data types) than to have separate publications for each cursor?
Unfortunately our app is pretty complex and we have been iterating quickly without regard to optimization (premature optimization is the root of all evil, yada yada). We have a lot of Meteor methods that make tons of database calls, so I think I could optimize a lot by making most of those calls non-reactive and only updating clients at the end of the method. Unfortunately I don't quite understand how this works with {pushToRedis: false}. If I make a bunch of updates in a method with pushToRedis: false, how do I update my clients at the end of that method? Would a single field update with pushToRedis: true on all affected ids send all of the changed data to the clients, or just the single fields updated?
Another perhaps naive question. The docs point out that subscriptions by _id are much more efficient. Would that mean it would be more efficient to use reactive-publish to find all the ids I'm interested in and then return a query for each of my publications that only looks for those ids?
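The pattern being asked about would look something like this sketch, assuming peerlibrary:reactive-publish (this.autorun inside a publication) and made-up collection names:

```js
Meteor.publish('items.mine', function () {
    this.autorun(() => {
        // Step 1: a query that yields the ids of interest
        // (re-runs when the memberships change).
        const ids = Memberships.find({ userId: this.userId })
            .fetch()
            .map((m) => m.itemId);

        // Step 2: the published cursor selects by _id, the idea being that
        // redis-oplog can then listen on per-id channels (Items::<id>)
        // instead of the whole-collection channel.
        return Items.find({ _id: { $in: ids } });
    });
});
```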
The Meteor docs mention that it's faster to have subs by _id, as the oplog always shows the _id (otherwise the oplog handler has to traverse the list of fields).
If you are using redis-oplog, you would focus more on the speed of the queries and on having proper indexing. In your case, you are simply splitting a query into two, so I would say it is slower (you can check by adding getTimer() before and after your queries). A better solution is to index properly.
Thanks. I've already got proper indexing. I was thinking that doing it the way in my second example would be faster because redis-oplog sends per-id channel notifications by default. My thinking is that even though it does an extra query to fetch those ids, the live updates would be cheaper because they can go through the SomeCollection::id channel instead of parsing through all messages on the SomeCollection channel to find matching docs. I'm not sure if that thinking is correct, however…
No, because the ID of the channel is already pre-determined, so updates are sent to the channel. The observers used in redis-oplog would have marginally worse performance as you add more fields vs. a single one. Remember, there is no indexing within Meteor code; it's all in MongoDB. The reason the (original) oplog is faster with ids is because of the format of the logs (the id always showing).
The docs say that this is more efficient because it will listen on many per-id channels instead of just the main collection channel. Perhaps I am misunderstanding?
OK, I stand corrected. I had a previous conversation with the author and had assumed that the hit of NOT monitoring by ID is much smaller now (almost negligible). I'll ask him: @diaconutheodor?