Meteor Scaling - Redis Oplog [Status: Prod ready]

We’ve now moved our app over to Redis Oplog. For the moment everything seems to work perfectly :slight_smile: In Kadira we can see that observer creation went down to 3 per minute (before we had >800). CPU usage and method response times are also fine.

// Edit: Just noticed that CPU usage has increased by 10-15% (before we always had a max of 25%; now we are at 35-40%, as shown in Kadira).

Just a pondering, nothing too serious, but… how much work would it be to move this from Redis to OrbitDB, which uses IPFS? Just thinking that it would be pretty amazing to have the same functionality but be completely independent of a central server.

@diaconutheodor We’re getting some errors in Kadira within this function:

    foreignSub = Meteor.subscribe("chatMessages", chatId, function () {
        var Chat = Chats.findOne(chatId);

        if (Chat) {
            if (Chat.user2 == Meteor.userId()) {
                self.foreignUser(Meteor.users.findOne(Chat.user1)._id);
            } else {
                self.foreignUser(Meteor.users.findOne(Chat.user2)._id);
            }
        }
    });

For some users this throws:

    Uncaught TypeError: Cannot read property '_id' of undefined

We’ve been getting this error since upgrading. I guess the subscription is marked as ready when it isn’t actually ready?

@copleykj The amount of work is minimal, as long as it has a pub/sub system that allows sending messages to channels and listening on channels.

@XTA
The processor increase is normal and was expected due to the optimistic UI; however, I also see an increase in the number of sessions.

My pull request https://github.com/cult-of-coders/redis-oplog/pull/136 is focused on performance. I also had to modify publish-composite a bit (you can see everything in the README), and I’m going to write a small app that tests the differences.

Regarding “subscription ready”: most likely you have a publish composite. When Meteor sends out the “ready” event, it doesn’t mean all data has been pushed to the client, and there is no way in Meteor to know whether all data has been received on the client. If you did not have this error before, it was just a matter of chance :slight_smile: or the “ready” event was being sent much more slowly.
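In the meantime, a defensive version of the callback quoted above avoids the crash. This is just a sketch, using the same collections as the original snippet, that simply bails out until the documents have actually arrived on the client:

    foreignSub = Meteor.subscribe("chatMessages", chatId, function () {
        var chat = Chats.findOne(chatId);
        if (!chat) return;

        // Pick the other participant of the chat.
        var foreignId = chat.user2 === Meteor.userId() ? chat.user1 : chat.user2;

        // Guard: the user document may not have been merged client-side yet.
        var foreignUser = Meteor.users.findOne(foreignId);
        if (foreignUser) {
            self.foreignUser(foreignUser._id);
        }
    });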

Oh okay, is this only the case within publishComposite? If I read the Meteor docs, this should not happen with the normal publish method (subscription.ready()):

Call inside the publish function. Informs the subscriber that an initial, complete snapshot of the record set has been sent.
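For reference, a minimal sketch of the plain publish pattern that doc is describing (the ChatMessages collection name here is hypothetical):

    Meteor.publish("chatMessages", function (chatId) {
        // Returning a cursor sends the initial snapshot first and then
        // marks the subscription ready automatically; that is the
        // guarantee the docs describe for subscription.ready().
        return ChatMessages.find({ chatId: chatId });
    });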

@XTA: Fun fact: I’m having a similar problem! :confused:
I couldn’t find a nice solution yet.

Update: Redis-Oplog 1.2.0 has been released

It contains a lot of improvements and stability fixes.

This is prod-ready :slight_smile:. But always QA it like crazy before deploying live, so you won’t blame me if something bad happens :stuck_out_tongue:

However, it does not yet have the level of perfection and elegance I want, so a lot of work still needs to be done. I have solid confirmation that this is indeed the right direction and that it enables scalable reactivity.

This morning I discussed with @mitar some things regarding his awesome package reactive-publish, and some other things regarding Meteor’s internals, and I realized that with redis-oplog I had reinvented some wheels already built by Meteor and did not hook into the proper places, which led me to write custom code to support other packages and added extra computation.

By hooking into those places, we should expect an even bigger performance increase.

However, I kept my promise and created a small, naive benchmarking tool. It is not a real-life scenario tester and it does not use any of the fine-tuning provided by RedisOplog, but it offers us some insight.

I only tested it locally, and the benchmark results are consistent:

  • MongoDB oplog (on the local machine) is around 10%-20% faster in response times.

However, in a prod environment with a remote DB, I expect better results. Whoever has the curiosity and the time can check it out; I will be very happy to assist. Currently I want to switch my focus to what is truly important.

And this makes a lot of sense: not having deep knowledge of how everything works behind the scenes, I had to reinvent some work done by MDG, and my version is less optimal.

Why it’s a bit slower:

  • The additional overhead of sending the correct data to Redis.
  • Keeping an additional store for the data and performing changes on it.
  • Some checks, like deciding which changes to send out, are performed twice.

I realized that a smaller throttle (faster subsequent writes to the DB) brings RedisOplog closer to MongoDB oplog, which is, again, expected.

All of these things will be improved, I promise you that. I’m starting to shift focus to performance, so the changes I’m making will have a great impact on speed.

Again, to stress this: the benchmark does not test the fine-tuned reactivity, which is the crown jewel of this package. The DB was on a local server, and I did not even have 50 active connections :smiley:

Cheers.

11 Likes

I’m currently having lots of scaling problems in my production app. I will definitely start testing redis-oplog in the near future. Hoping MDG works with you on continuing this. Thank you so much for your hard work.

Thanks @evolross, I hope your problems will be fixed. Take time to read how it works and how to fine-tune it; you should find approaches that apply to your problems. I’ve already talked with three people who moved this into prod, and they experienced lots of improvements.

There is still work to do on this, but we’re getting there.

6 Likes

Ping! What is your experience with redis-oplog so far?

For us, it dramatically improved performance. Because we were using external databases and tailing the oplog was very costly, having Redis on the internal network sped things up a lot.

We did not notice any major bugs and it works nicely.

10 Likes

In our case everything is working fine, too. Our instances stopped crashing when running the daily cronjob, which deletes about 100k docs/day.

“We did not notice any major bugs” does not sound too reassuring :stuck_out_tongue: Are there some kinds of small bugs?

1 Like

Haha, true. There were indeed some bugs after we went to PROD, but we solved them, and they were not MAJOR, just some small edge cases. Currently it’s stable.

1 Like

Ok, so it sounds like there are no known bugs with the new version. Nice.

I have been using redis-oplog in production for months with hundreds of concurrent users. We had one small issue which was fixed immediately by @diaconutheodor. All in all I’m very happy and can recommend redis-oplog!

10 Likes

I’ve been thinking… a few things could be very synergistic with this package, and perhaps practical since we are already connecting our apps to Redis:

  • Creating/exposing Redis APIs for Meteor developers to use, working over regular pub/sub (a sketch follows below). Redis is blazing fast and can persist, so we might as well embrace it all the way.
  • Creating some kind of load-balancing solution around Redis, in the spirit of meteorhacks:cluster.
  • Creating a “patching” API for redis-oplog, for cases where MongoDB updates happen outside the Meteor application. Maybe some kind of protected REST API.

Combined, they would take this package from a solution that scales the oplog to a solution that scales Meteor, all built around the magic of Redis.
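For the first idea, here is roughly what raw Redis pub/sub already looks like from Meteor server code, using the node redis client (the channel name and payload are hypothetical):

    import redis from 'redis';

    // Separate clients: a client in subscriber mode cannot publish.
    const publisher = redis.createClient({ host: '127.0.0.1', port: 6379 });
    const subscriber = redis.createClient({ host: '127.0.0.1', port: 6379 });

    // Listen on a hypothetical application channel.
    subscriber.on('message', (channel, message) => {
        console.log(`[${channel}]`, JSON.parse(message));
    });
    subscriber.subscribe('app::notifications');

    // Any server instance can broadcast to all the others.
    publisher.publish('app::notifications', JSON.stringify({ type: 'ping' }));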

9 Likes

And one more: the idea of Redis going unavailable and updates stopping is a bit unpleasant. In the beginning, it would be nice if there were some kind of mechanism for developers to manage these scenarios, or to force it to re-run all the queries.
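Until such a mechanism exists, developers could at least detect outages with a dedicated node redis client and its connection events. A sketch, with the reaction on reconnect left as a hypothetical hook:

    import redis from 'redis';

    // A client used only for health monitoring.
    const monitor = redis.createClient({ host: '127.0.0.1', port: 6379 });

    monitor.on('end', () => {
        // Redis went away: reactivity is paused until it returns.
        console.warn('Redis connection lost');
    });

    monitor.on('ready', () => {
        // Redis is back: a hypothetical place to force clients to
        // resubscribe or re-run their queries.
        console.info('Redis connection (re)established');
    });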

2 Likes

I’m very likely going to start integrating this into my production app in a few weeks. One question came to mind (sorry if it’s already been answered): you logically remove oplog from your app, but what happens to Meteor’s backup “poll and diff” mechanism? Does it still run with redis-oplog? If “poll and diff” caused performance problems before adding oplog, will it still when using redis-oplog? Is there a way to disable it?

Hello @evolross

A new release is coming soon, making it even more stable (1.2.1; I plan on finishing it this month).

Regarding your question: there is no fallback with redis-oplog. If Redis dies, bye-bye reactivity; if Redis comes back alive again (we retry the connection indefinitely, every 30s), reactivity will be back up. When Redis resumes, we have to requery all observableCollections and apply the changes.

However, you can still rely on poll-and-diff if you want: https://github.com/cult-of-coders/redis-oplog/blob/master/docs/finetuning.md#fallback-to-polling
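As a sketch of that fallback (following the fine-tuning doc linked above; the publication and collection names are hypothetical), a cursor can opt out of Redis-driven reactivity and back into poll-and-diff via standard cursor options:

    // Hypothetical publication that uses Meteor's poll-and-diff
    // for this cursor instead of Redis-driven reactivity.
    Meteor.publish('recentOrders', function () {
        return Orders.find(
            { status: 'open' },
            {
                disableOplog: true,      // skip redis-oplog for this cursor
                pollingIntervalMs: 10000 // re-run the query every 10s
            }
        );
    });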

You can even have MongoDB oplog, redis-oplog and poll-and-diff at the same time :slight_smile: by not overriding the publish function, but I wouldn’t recommend that.

2 Likes

@thea (and @diaconutheodor)

redis-oplog is becoming a critical piece for the community, especially for production apps. I recall there was a plan (months ago!) for a post about it on the main Meteor blog (@diaconutheodor and I talked about it privately too, and I offered to help review it).

Any updates on that?

Case in point: we now have thousands of concurrent users internationally, with servers in the US, Europe, and the Middle East (planned); redis-oplog is planned to be the glue for all of that.

5 Likes