Meteor Scaling - Redis Oplog [Status: Prod ready]

jamesgibson14 · November 22, 2016, 8:31pm

No specific stats yet. By “initial tests” I mean that it is working so far with my current publish functions. I did notice a drop in CPU usage, but I haven’t measured it yet.

jamesgibson14 · November 22, 2016, 8:58pm

I guess I just need to make sure Redis is highly available.

mz103 · November 22, 2016, 9:03pm

Well, if you’re using it in a production server it probably should be.

Let’s say you need 20 instances to run your Meteor server without Redis, and 5 instances to run it with Redis (purely theoretical).

If Redis goes down, then all of a sudden your Meteor servers start polling the database, their CPU usage goes up, processes crash, user data is potentially lost, and you have no servers able to deal with requests, because the ones that do start up can’t handle the load.

You’d have to log in and manually scale up to 20 servers, which could take a bit of time depending on which PaaS or hosting service you’re using.
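To put rough numbers on that, here is a minimal sketch of the arithmetic, using the purely theoretical 20 and 5 instance figures from above. The helper name `capacityShortfall` and its fields are made up for illustration.

```javascript
// Hypothetical helper: how far short of capacity are we if Redis disappears
// and the servers fall back to polling the database?
function capacityShortfall(instancesWithoutRedis, instancesWithRedis) {
  const missing = instancesWithoutRedis - instancesWithRedis;
  return {
    // Instances that would have to be added by hand
    missing,
    // Fraction of the required capacity that is actually running
    availableFraction: instancesWithRedis / instancesWithoutRedis,
  };
}

// The theoretical example: 20 instances needed without Redis, 5 running with it
const shortfall = capacityShortfall(20, 5);
console.log(shortfall.missing, shortfall.availableFraction);
```

In this example you would be running a quarter of the capacity you need until the other 15 instances come up.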

So in general, if you’re ever dealing with any kind of database in a production server (Redis, Mongo, SQL, whatever)… yeah, it should probably be highly available.

I’m betting that you’re going to make your MongoDB highly available. Redis and other providers should be no different, so long as the users using your applications depend on them.

jamesgibson14 · November 22, 2016, 9:27pm

Great point. Mostly I am reacting to my lack of experience with Redis. I did, however, find http://redislabs.com, which seems to be fairly priced and has a memory-only option… once redis-oplog gets to the “Production Ready” point.

mz103 · November 22, 2016, 9:36pm

There’s also Compose, which is hands down one of the best DaaS providers I’ve ever used.

They’ll do the setup for you and they’ll scale it for you and you won’t ever have to worry about it. It’s great.

babnik63 · November 23, 2016, 3:14am

How much Redis memory is enough?
I know that it may depend on many factors, but we are not going to store all our data there. Is there any guidance?
Is there any API for managing memory size in redis-oplog? It may be needed for scaling.
I want to use https://redislabs.com, as it is available in the same Amazon zones as Galaxy.
Thank you

ramez · November 23, 2016, 3:30am

Update
With 40 clients (vs. 20 before) and the latest version (1.0.10) of Redis-Oplog

Mongo Oplog
Meteor (master x1) 31%
Meteor (clients x2) 9%
Mongo 7%

Redis Oplog
Meteor (master x1) 20%
Meteor (clients x2) 7%
Mongo 5%
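For anyone who wants the relative change rather than the raw figures, here is a small sketch of the arithmetic. It assumes the numbers above are CPU percentages; the object and function names are made up for illustration.

```javascript
// Figures copied from the update above (assumed to be CPU percentages)
const mongoOplog = { master: 31, clients: 9, mongo: 7 };
const redisOplog = { master: 20, clients: 7, mongo: 5 };

// Relative reduction in percent, rounded to one decimal place
function relativeReduction(before, after) {
  return Math.round(((before - after) / before) * 1000) / 10;
}

for (const key of Object.keys(mongoOplog)) {
  console.log(key, relativeReduction(mongoOplog[key], redisOplog[key]) + '%');
}
```

With these figures the master process drops by roughly a third, and the clients and Mongo by a bit less.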

Also, with Redis Oplog, we noticed faster download of data.

Conversation with Theodor (@diaconutheodor)

I had a great conversation for about an hour with Theodor to go through this. Here are my notes:

Redis acts as a pub/sub store. MongoDB doesn’t really have one, hence the oplog tailing in the original design of Meteor
The name Redis-Oplog is a misnomer: there is no longer any tailing of the Mongo oplog
Mongo + Redis = RethinkDB (i.e. we get both a DB and reactive cursors) – the one difference is that RethinkDB seems to optimize cursors (i.e. it shares them when the query is the same; see the last point), while Redis does not
When your selector is _id-based, channels make little to no difference; channels are much more performant than Mongo oplog tailing when we have complex selector queries
Redis can handle 300,000 messages/s – if we need to scale, we can use a Redis cluster
The next level of optimization is to have a decentralized mechanism for cursor sharing (admittedly, we don’t need it ourselves, but some applications would benefit from it)
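To make the pub/sub idea in these notes concrete, here is a minimal in-memory sketch. It does not use Redis or the redis-oplog API; `TinyBroker`, the channel names, and the message shape are all made up for illustration. The only point is that a write announces itself on a channel and only the subscribers of that channel do any work.

```javascript
// In-memory stand-in for a pub/sub store (NOT Redis, NOT the redis-oplog API)
class TinyBroker {
  constructor() {
    this.subscribers = new Map(); // channel name -> Set of callbacks
  }

  subscribe(channel, callback) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, new Set());
    this.subscribers.get(channel).add(callback);
    // Return a function that removes this subscription again
    return () => this.subscribers.get(channel).delete(callback);
  }

  publish(channel, message) {
    const callbacks = this.subscribers.get(channel) || new Set();
    for (const callback of callbacks) callback(message);
    return callbacks.size; // number of subscribers that were notified
  }
}

const broker = new TinyBroker();
const received = [];
broker.subscribe('tasks', (message) => received.push(message));

// A write to "tasks" is announced on its channel and reaches the subscriber
const notifiedTasks = broker.publish('tasks', { event: 'insert', doc: { _id: 'a1' } });
// Nobody listens on "logs", so this write causes no work for anyone
const notifiedLogs = broker.publish('logs', { event: 'insert', doc: { _id: 'b1' } });

console.log(notifiedTasks, notifiedLogs, received.length);
```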

My Conclusions

I think this is really great. It gives us the best of both worlds. Redis did not need much CPU or memory in our tests, so it can coexist with Meteor (i.e. as a private instance)
Our tests are not that aggressive (we tried running 100 clients, which put a strain on the 4-core VM on Digital Ocean we were using)
Results would be even better as we scale up with more users and more cores
Running a Redis instance when you already have Mongo could be seen as a hurdle. But if you truly want to scale up, I believe this is the way to go.
@diaconutheodor is a really nice guy (in addition to being very solid technically) – look forward to working with him on commercial projects.

Looking forward to feedback from the community

efrancis · November 23, 2016, 4:23am

Are these numbers with a drop-in replacement, or did you use some of the available optimizations like namespacing or custom channels?

diaconutheodor · November 23, 2016, 4:45am

@efrancis it’s just a drop-in replacement, without any fine-tuning.

There is another test, which Ramez showed me on Skype, with 100 users, where things get interesting: the CPU load is 48-50% lower with redis-oplog. I don’t know the exact metrics, but the load is also much lower for the Mongo database. This results in 2x speed, and it tells us that the more users there are, the bigger the performance increase we should see.

These are still inconclusive tests, and no fine-tuning has been done.

Update:
What I mentioned above is not true. It’s not 2x speed for 100 users. Sorry for misleading you; I was under the wrong impression.

robfallows · November 23, 2016, 10:27am

I’m really impressed with this. It has to be one of the sweetest “plug-ins” for Meteor that I’ve seen in a long time. A huge well-done and kudos to @diaconutheodor for putting this together in such a short time.

ramez · November 23, 2016, 12:12pm

This test was with 100 users. Unfortunately, I could not repeat it to validate it, as the PhantomJS clients kept crashing. It makes sense, though, that oplog tailing degrades much faster than Redis-Oplog as traffic increases.

diaconutheodor · November 23, 2016, 12:54pm

@ramez @robfallows thank you for the appreciation.

@ramez sorry for misleading the guys here. Indeed, we could not repeat it, but it makes a lot of sense.

ramez · November 23, 2016, 1:10pm

Thanks for everything @diaconutheodor. I will spin up a 16-core machine to test 100 users and report back, so that you can have a good benchmark.

ramez · November 23, 2016, 3:00pm

Update
100 users on a 16-core VM

Mongo Oplog
Meteor (master x1) 27 - 39% (lots of spikes 50 - 63%) – 250MB
Meteor (clients x2) 9 - 11%
Mongo 7-9% (spikes of 13%)

Redis Oplog
Meteor (master x1) 24% – 200MB
Meteor (clients x2) 11%
Mongo 5-7%

EDIT

Test Scenario

Here is our test scenario, and it’s related to our app:

We have a master (the teacher) and 100 clients (the students)
Our app sends data from teacher to student and back. The most stressful case is when the teacher has many students. The teacher continuously monitors all students, their screens, what they are doing, live in the classroom.
CPU-wise and network-wise this is a serious stress for any system as it’s continuous feedback.

Important

Following my conversation with @diaconutheodor, I thought I would mention this: all our publications are _id-based, which is the most efficient kind of cursor even with oplog tailing. However, you will still get performance degradation as you scale up.
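As a rough illustration of why _id-based publications are the cheap case: a change to one document can be announced on a channel dedicated to that document, so an _id-based subscription never has to look at unrelated changes. The sketch below assumes a `collection::id` naming scheme and only handles plain `_id` and `_id: {$in: [...]}` selectors. `channelsForSelector` is a hypothetical helper, not part of redis-oplog.

```javascript
// Hypothetical helper: which channels does a subscription need to listen to?
// Assumed naming scheme: "<collection>::<id>" for a single document,
// "<collection>" for every change in the collection.
function channelsForSelector(collectionName, selector) {
  const id = selector && selector._id;
  if (typeof id === 'string') {
    return [`${collectionName}::${id}`];
  }
  if (id && Array.isArray(id.$in)) {
    return id.$in.map((oneId) => `${collectionName}::${oneId}`);
  }
  // Any other selector has to listen to every change in the collection
  return [collectionName];
}

console.log(channelsForSelector('students', { _id: 's1' }));
console.log(channelsForSelector('students', { _id: { $in: ['s1', 's2'] } }));
console.log(channelsForSelector('students', { classroomId: 'c1' }));
```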

My Observations

We are running the tests on a much more powerful machine than before (16 vs 4 cores). So the Meteor clients are responding faster to requests, which are batched and spread out to reduce server load. I think this is why we are seeing less of a difference than in prior scenarios. However, this makes sense, as we only have 101 users.

The spikes in this test case are more evident and more pronounced. If we had many classrooms running, these spikes would become the average and would bring our application to a halt. This is with 100 students only; imagine if we had 500 or a thousand. So for us, this package is a must, as we are dealing with live data where oplog tailing will simply grow exponentially in CPU consumption. Also, the MongoDB spike is concerning, even if you outsource your DB (as you usually purchase DB CPUs), as it will delay your data exchange.

Improvements in Our App

During testing, we noticed that much of the data we are sending does not really need to go to the DB. If we properly implemented channels that do not have to go through Mongo, I would expect even further improvements.
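A minimal sketch of that idea, with the caveat that it is an in-memory toy and not how redis-oplog is configured: messages flagged as transient go to live listeners only, and everything else is also written to a stand-in for the database. All names here (`createRouter`, `persist`, `screen-updates`, `answers`) are hypothetical.

```javascript
// Toy router: every message reaches live listeners, but only messages with
// persist = true are also written to the stand-in "database".
function createRouter() {
  const stored = [];           // stand-in for MongoDB writes
  const listeners = new Map(); // channel name -> array of callbacks
  return {
    stored,
    listen(channel, callback) {
      if (!listeners.has(channel)) listeners.set(channel, []);
      listeners.get(channel).push(callback);
    },
    send(channel, message, { persist = true } = {}) {
      if (persist) stored.push({ channel, message });
      for (const callback of listeners.get(channel) || []) callback(message);
    },
  };
}

const router = createRouter();
const seenByTeacher = [];
router.listen('screen-updates', (message) => seenByTeacher.push(message));
router.listen('answers', (message) => seenByTeacher.push(message));

// Live screen data: delivered to the teacher, never stored
router.send('screen-updates', { student: 's1', frame: 1 }, { persist: false });
// An answer: delivered to the teacher and stored
router.send('answers', { student: 's1', answer: 'B' });

console.log(seenByTeacher.length, router.stored.length);
```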

My Conclusion

For any real-time app with many users and live data, I can’t see a way around Redis-Oplog, even if you implement GraphQL, as the problem is structural (MongoDB does not have pub/sub). Mongo oplog tailing will result in degradation on all Meteor clients as new instances come up and more users log in.
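A toy model of the structural point, under stated assumptions: with oplog tailing, every Meteor instance has to look at every write, while with channels a write is only delivered to the instances subscribed to its channel. The function names and the numbers below are invented for illustration and are not measurements.

```javascript
// Assumption: each instance tails the whole oplog, so the total number of
// oplog entries processed per second is writes x instances.
function oplogTailingWork(writesPerSecond, instances) {
  return writesPerSecond * instances;
}

// Assumption: with channels, each write is only delivered to the instances
// that subscribed to its channel.
function channelWork(writesPerSecond, subscribedInstancesPerWrite) {
  return writesPerSecond * subscribedInstancesPerWrite;
}

// 1000 writes/s, each of interest to a single instance
for (const instances of [1, 5, 20]) {
  console.log(instances, oplogTailingWork(1000, instances), channelWork(1000, 1));
}
```

In this model the tailing cost grows with every instance you add, while the channel cost only depends on how many instances actually care about a given write.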

Thanks @diaconutheodor!

mz103 · November 23, 2016, 3:31pm

@ramez those are awesome benchmarks! And I appreciate your personal opinions and analysis with them as well.

@shawnyoung I remember you saying you were interested in benchmarks.

ramez · November 23, 2016, 4:49pm

Hey @diaconutheodor

With the recent update to the GitHub repo readme:

Do we need to install https://github.com/maxnowack/meteor-allowdeny-redisoplog for client-side mutations? I just tested without it, and allow / deny works well.

diaconutheodor · November 23, 2016, 4:52pm

That package was made by @maxnowack to enable client-side inserts/updates/removes that are not within a method. We don’t encourage this, so it cannot belong in this package, but some guys do need it, so he created another package for that.

maxnowack · November 23, 2016, 5:55pm

As @diaconutheodor mentioned, it’s only needed when using client-side mutations without custom methods. Please note that the package is in a very early state. I’ll do some more tests and write documentation soon.

ramez · November 23, 2016, 6:02pm

Thanks @maxnowack for creating this package and for your reply.

I tried client-side mutations without it and it worked (including allow / deny). I need help understanding what functionality is missing.

maxnowack · November 23, 2016, 6:09pm

That’s strange. I’ve debugged the whole thing and came to the conclusion that it cannot work, because the default mutation methods use the mutation functions on the _collection object of a collection, while the redis-oplog package only extends the functions of the collection itself. I was trying to extend the _collection functions, but that only works with the limitation that options cannot be passed: https://github.com/cult-of-coders/redis-oplog/pull/41
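The mechanism described here can be sketched with plain JavaScript stand-ins. This is not Meteor’s or redis-oplog’s real code: `FakeCollection`, `InnerCollection`, and the `published` array are hypothetical, and the sketch only shows why wrapping the outer function does not catch calls that go straight to the inner object.

```javascript
// Stand-in for the inner _collection object that actually performs writes
class InnerCollection {
  constructor() {
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    return doc._id;
  }
}

// Stand-in for the public collection; its insert delegates to _collection
class FakeCollection {
  constructor() {
    this._collection = new InnerCollection();
  }
  insert(doc) {
    return this._collection.insert(doc);
  }
}

const published = [];
const tasks = new FakeCollection();

// An extension that only wraps the outer insert
const originalInsert = tasks.insert.bind(tasks);
tasks.insert = (doc) => {
  const id = originalInsert(doc);
  published.push(id); // pretend this notifies the pub/sub store
  return id;
};

// Path 1: code calling the public function goes through the wrapper
tasks.insert({ _id: 'viaCollection' });
// Path 2: code calling _collection directly skips the wrapper entirely
tasks._collection.insert({ _id: 'viaInner' });

console.log(tasks._collection.docs.length, published);
```

Both documents end up stored, but only the first write is ever announced, which is the gap a wrapper on `_collection` would have to close.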