Meteor Scaling - Redis Oplog [Status: Prod ready]

I think pub/sub is the most intuitive and common pattern for managing real-time data flow; it’ll always be a thing.

3 Likes

Nice, interesting. Can’t wait to see what’s next for Redis-Oplog.

For the sake of conversation:

  • Wouldn’t it be subject to a hard scaling limit, since the volume of oplog entries can overwhelm the server?
  • What if you implemented Change Streams instead of the oplog? I looked into the new documentation, and it looks like the whole thing about the “1000 change streams limit” was misunderstood.
  • What if it were implemented in C++ instead of Go? I believe C++ can be run in Node.js environments, which would make it much easier to configure and deploy.
  • Wouldn’t using redis-oplog be much quicker? It seems like there would be less latency.

It seems to me that adding Change Streams support on top of Redis-Oplog might be the ultimate solution: quick pub/sub updates inside Meteor, plus support for observing changes made outside of Meteor.

3 Likes

Thanks for the feedback!

I’ve just published the current work-in-progress state of oplogtoredis, the program that will handle tailing the oplog and automatically pushing to redis-oplog: https://github.com/tulip/oplogtoredis – hopefully you can see a bit more about the direction it’s heading from that code and the open issues on the repo.

To address some of your concerns directly:

  • re: scaling/bottleneck: I think oplogtoredis is unlikely to become a bottleneck, because it only needs to handle as much write volume as Mongo – so as long as we can process messages as fast as Mongo performs writes, the bottleneck will be Mongo, not oplogtoredis. We have two huge advantages over Mongo here – 1) what we’re doing is much, much simpler than the actual process of running a mutation, and 2) our work is entirely in-memory, but MongoDB has to write to disk to confirm a write. I think we can definitely handle writes faster than a non-sharded Mongo database, and we can run multiple oplogtoredis instances for a sharded MongoDB (see: https://github.com/tulip/oplogtoredis/issues/1).

  • re: C++ vs Go. There are definitely lots of reasons to think about using C++, but I don’t think that embeddability inside a Node environment is one of them – one of the explicit goals of the project is to decouple the oplog tailing from the Meteor process so you can scale them independently. One of the scaling issues with Meteor is that tailing the oplog incurs substantial work on the Mongo server – so as you scale out your Meteor app by adding more server processes, you end up increasing load on the Mongo server because it has to handle more and more getmores for the oplog collection. With redis-oplog and oplogtoredis, you can continue to run 1 or 2 oplogtoredis instances as you scale your Meteor servers horizontally.

  • re: latency, that’s a really good point. There’s definitely increased latency in going meteor -> mongo -> oplogtoredis -> redis -> meteor, rather than meteor -> redis -> meteor. However, compared to the default Meteor topology of meteor -> mongo -> meteor, I think the additional in-memory hops to oplogtoredis and to redis are unlikely to introduce appreciable latency, and will be dominated in most cases by the latency between a user and the Meteor server. That said, it’s definitely something we should keep an eye on, and a tradeoff that users will need to make when deciding whether to use vanilla redis-oplog or redis-oplog+oplogtoredis – vanilla redis-oplog will definitely give you lower latency.

  • re: Change Streams, I think they’re a pretty exciting development, but I’m a bit hesitant about trying to use them to replace oplog tailing. In particular, they’re not quite designed to handle the huge number of change streams we’d need to replace oplog tailing (see: https://jira.mongodb.org/browse/SERVER-32946, particularly the note about needing a separate connection per change stream, and https://www.percona.com/blog/2017/11/22/mongodb-3-6-change-streams-nest-temperature-fan-control-use-case/). Fundamentally, I think we should be focusing on ways to offload processing from Mongo, because it’s the hardest-to-scale bottleneck, so giving Mongo the additional responsibility of routing change notifications to subscribers seems like it’ll be harder to scale horizontally than an approach that offloads processing to a combination of the app servers + redis for routing.

Hope that helps give some more context on the design decisions!

9 Likes

Just one comment for @benweissmann and @diaconutheodor: if you pull this one off, it will be another very significant adrenaline shot for Meteor and a way forward for many Mongo/LiveQuery projects out there having performance problems. And great to see devs with big “colhões”! (PC notwithstanding…)

3 Likes

Update: Released 1.2.7

Changelog

  • Fixed a bug with Login Service Configuration
  • Fixed a bug with nested children and their specified fields
  • Optimistic UI improvements
  • Unset fields in the cache are now properly cleared
  • Support for MongoDB ObjectIds
  • Ability to have an external redis publisher & optimisations for avoiding duplicate dispatches

Special thanks to @nathan_muir, who went into the trenches of Optimistic UI so we can use its native way of working, brought us support for ObjectId documents, and also identified some very nice issues with $unset.

Cheers!

3 Likes

@hluz If you want a new adrenaline shot, we now have Meteor Live Queries inside Apollo/GraphQL. Meteor Reactivity Alongside GraphQL Apollo — Implemented

Tagging @macrozone, it may interest you as well.

2 Likes

@diaconutheodor, CPU is spiking with redis-oplog in place.
We have two microservices:
the 1st microservice is the master/core service, and
the 2nd microservice is a background job processor (which does highly CPU-intensive computation and bulk inserts/updates).

Without redis-oplog, the CPU used to spike only at the 2nd service, which does bulk inserts and updates on a single collection.
Since we want our system to be reactive, we enabled redis-oplog for both services. And now comes the problem.
When a job (bulk updates) is processing in the 2nd microservice, the CPU also spikes at the 1st microservice.
When I disable redis-oplog in the 1st microservice, everything works smoothly, except we lose reactivity for the operations done at service 2.
I enabled debug for redis-oplog at service 1 and found that it is continuously writing the logs below:
[RedisSubscriptionManager] Received event: “i” to collectionName
[RedisSubscriptionManager] Received event: “u” to collectionName
[RedisSubscriptionManager] Received event: “u” to collectionName
… etc

Thanks,
Koti

@koticomake you ran into the same problems as with the mongodb oplog: your instance gets flooded with tons of information. Without fine-tuning, RedisOplog is ultra fast for publications by _id, but otherwise only a tiny bit faster than the standard mongodb oplog (maybe slower in some cases). Where it shines is in its ability to control the reactivity.

Questions:

  1. Your CPU does not spike if you’re tailing the oplog? Are you sure you are tailing the oplog and not relying on polling? Did you test this in prod? If yes, do you have MONGO_OPLOG_URL set?
  2. Can your publications be fine-tuned? Maybe namespaced by client or something? (See the sketch after this list.)
  3. Do you need reactivity at every step inside the job service? Is the same document updated multiple times?
  4. I can add a hack for you to do something like “trigger reload for certain collections” on the redis-oplog cluster. That would be an interesting idea: to say something like, “Hey, I finished my heavy batch processing, now I want everyone to reload.”
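
For illustration, here is a rough sketch of what I mean by namespacing in point 2 (the collection and field names are made up, adapt them to your schema):

// Reactivity is scoped to the 'client::<id>' channel instead of the
// collection-wide channel, so writes for other clients never reach this pub
Meteor.publish('jobs.forClient', function (clientId) {
    return Jobs.find({ clientId }, { namespace: 'client::' + clientId });
});

// Writes must use the same namespace so the message lands on that channel
Jobs.update(jobId, { $set: { status: 'done' } }, { namespace: 'client::' + clientId });
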
1 Like

@diaconutheodor
1A. We are not using MONGO_OPLOG_URL; moreover, we have included the disable-oplog package in our Meteor project. Do we still need to set MONGO_OPLOG_URL to make redis-oplog work?
2A. I will try to fine-tune our publications with namespaces.
3A. Reactivity is not required at every step inside our job service. All inserts can be directly reactive, but updates only need to become reactive once the batch operation is done. And yes, the same document might get updated multiple times as well.

I am really excited to get the hack that you promised in point 4.

One more doubt, and I might be dumb asking this:
how come redis-oplog (RedisSubscriptionManager) events affect the main server’s CPU? Doesn’t this deal directly with MongoDB operations?

Thanks,
Koti

  1. That’s what I thought. You weren’t tailing the oplog; you were previously relying on polling. If you had had the mongodb oplog enabled, CPU spikes would have been an even bigger issue.
  2. Perfect, that would really boost performance
  3. Perfect. You have the {pushToRedis: false} option: only do the update once at the end and push that to redis.
  4. It affected the “Main Server” because you had a publication listening to messages on that collection. My guess is that you have something like:
Meteor.publish('items', function () {
    return Items.find(someFilters);
});

^ Any subscription to that publication, regardless of filters (unless they are by _id), will listen to ALL incoming redis messages derived from operations performed on the Items collection (unless you namespaced it).

The true value of RedisOplog lies in fine-tuning: publications by _id, and the ability to perform writes to the db that bypass reactivity.
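
To make that concrete, here is a rough sketch (collection and field names are made up):

// Publication by _id: redis-oplog can listen on a dedicated per-document
// channel instead of the collection-wide one
Meteor.publish('task.single', function (taskId) {
    return Tasks.find({ _id: taskId });
});

// A write that bypasses reactivity entirely: nothing is pushed to redis,
// so no publication re-runs because of it
Tasks.update(taskId, { $set: { progress: 42 } }, { pushToRedis: false });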

2 Likes

Congrats on surpassing 1000 downloads :slight_smile:

4 Likes

@diaconutheodor,
I have one doubt here
3. Perfect. You have the {pushToRedis: false} option: only do the update once at the end and push that to redis.

When I update the collection in my second service, I kept pushToRedis: false for all intermediate updates. For all inserts I kept pushToRedis: true.
Now how shall I push all the changes that happened on that collection to redis?
Do I need to use the code below, and if yes, do I need to push each and every record by _id? Is there any way to push all changes at once to redis once I am done with my job?

getRedisPusher.publish('tasks', EJSON.stringify({
    [RedisPipe.DOC]: {_id: taskId},
    [RedisPipe.EVENT]: Events.UPDATE,
    [RedisPipe.FIELDS]: ['status']
}));

One more thing.
The true value of RedisOplog lies in fine-tuning: publications by _id, and the ability to perform writes to the db that bypass reactivity.

I didn’t get your point when you say “ability to perform writes to db and bypass reactivity”.
What exactly do you mean when you say “bypass the reactivity”?

Regards,
Koti

ability to perform writes to db and bypass reactivity => { pushToRedis: false }

And regarding your idea: if you do the updates with {pushToRedis: true}, you don’t have to manually send the events; if you don’t, then you have to. And unfortunately the redis npm driver provides no way to publish multiple messages in one go, so you’ll have to do it in a loop.
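
For example, something along these lines at the end of the job, reusing the names from your snippet above (touchedTaskIds is a made-up variable that I assume you collected during the batch):

// notify redis once per touched document, since messages can't be batched
touchedTaskIds.forEach(function (taskId) {
    getRedisPusher.publish('tasks', EJSON.stringify({
        [RedisPipe.DOC]: {_id: taskId},
        [RedisPipe.EVENT]: Events.UPDATE,
        [RedisPipe.FIELDS]: ['status']
    }));
});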

I’m trying to see if anything can be done about this. I have the exact same redis-oplog Vent emission I need to send to 1000 Vent-subscribed clients. And redis-oplog is looping to do this, specifically the this.on block inside of a Vent.publish block. It seems like there must be a way to send one update to Redis and then have Redis loop out to the subscribers? You’re thinking it’s a limitation of the Redis NPM driver?

When I update the values directly in the DB, they are not reflected in my subscription.
I was thinking that redis-oplog’s cache is not getting invalidated and updated.
How do I deal with this kind of approach?

This package works by placing hooks on Meteor’s Mongo functions. Therefore, for this to work, you must call those Mongo functions. Editing the db directly obviously won’t call the necessary hooks.
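
A quick illustration of the difference (assuming Items is a Meteor Mongo.Collection; the names are made up):

// Goes through Meteor's collection wrapper, so redis-oplog's hook fires
// and subscribers receive the change
Items.update(itemId, { $set: { status: 'done' } });

// Bypasses the wrapper (raw node driver), so no hook fires and nothing
// reaches redis; the same applies to edits made from outside the app
Items.rawCollection().updateOne({ _id: itemId }, { $set: { status: 'done' } });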

1 Like

https://github.com/cult-of-coders/redis-oplog/blob/master/docs/outside_mutations.md

There’s also another package that’s been developed to deal with this; it might be mentioned in this thread. I’ve heard @diaconutheodor talk about it somewhere.

1 Like

I just dropped redis-oplog into my app because I was having bad performance with the normal oplog. I didn’t do any of the optimizations, thinking that it would give similar or slightly better performance until I was ready to put them in. It worked fine running locally, but it turns out it has massively spiked my CPU usage in production, and some of my instances are pegged and not responding!

I’ve rolled back, but I’m not sure of the best way to debug this. I have a lot of publications (~22) per user, a few of which use peerlibrary:reactive-publish. Is it possible the reactive publications are not playing well with redis-oplog?

@imagio, we had issues with reactive-publish too, which surfaced with redis-oplog. It turned out there was room for optimization. Do all pubs have to be reactive? Can you set some queries to {reactive: false}? (Remember that all pub queries become automatically reactive when using reactive-publish.)
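
Roughly something like this (a sketch only, assuming {reactive: false} is honored inside a reactive-publish autorun; the names are made up):

Meteor.publish('dashboard', function () {
    this.autorun(() => {
        // rarely-changing settings: marked non-reactive so writes to Settings
        // don't re-run this autorun
        const settings = Settings.findOne(
            { userId: this.userId },
            { reactive: false }
        );

        // only the main cursor stays reactive
        return Items.find({ userId: this.userId, mode: settings && settings.mode });
    });
});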

Also, 22 subs per user is a lot! Can you optimize, merge etc.?

We have been using redis-oplog in prod for a year now and it’s been fantastic (we are even mentioned in the MeteorUp presentation by @diaconutheodor as a showcase).

1 Like

I’m certain that we are doing some sub-optimal things in our publications right now. By merge, do you mean that it is more efficient to have one publication that returns a couple of cursors (of potentially different data types) than to have separate publications for each cursor?

Unfortunately our app is pretty complex and we have been iterating quickly without regard to optimization (premature optimization is the root of all evil, yada yada). We have a lot of Meteor methods that make tons of database calls, so I think I could optimize a lot by making most of those calls non-reactive and only updating clients at the end of the method. Unfortunately I don’t quite understand how this works with {pushToRedis: false}. If I make a bunch of updates in a method with pushToRedis: false, how do I update my clients at the end of that method? Would a single field update with pushToRedis: true on all affected ids send all of the changed data to the clients, or just the single fields updated?