Meteor Scaling - Redis Oplog [Status: Prod ready]

Reasons to be excited:

Simon from Classcraft:

First, thank you very much for giving this a chance. The biggest Meteor app is the best first case study, and I am going to be on top of it to make sure it's running smoothly for them.

Must admit, I got a bit excited; an almost 100% improvement is not to be ignored. But honestly, I would've been much happier with a 300% improvement. Anyway, 100% doesn't necessarily mean that you can run on half of your infrastructure, because the more you grow, the better the performance will get. And the important part is: you can continue growing.

The fight isn’t over yet
What if I told you there is room for even more performance improvement? (https://github.com/cult-of-coders/redis-oplog/issues/199) But this one is relatively hard… it'll take me a while to accomplish and do it right (at least 16 hours of work), as I need to rethink the architecture and refactor things so it doesn't become an unmaintainable mess.

Just a little bit of history, since we're celebrating a nice achievement:

  • First we did the fetching (at the insert level) when we mutated and pushed it to redis
  • Then we realized it's prone to race conditions, so we moved fetching to processors that interrogate the db individually
  • Then we realized that with a lot of observers, a single event could lead to 30-40 db requests
  • Then we optimized and aggregated what is needed for processing into a single db request per event (unless it's a removal event); a rough sketch of this idea follows the list
  • And then comes the holy grail: issue #199
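
To make the last two bullet points more concrete, here is a rough, illustrative sketch of the idea (not the actual redis-oplog internals; all names are made up): one db fetch per incoming redis event, shared by every observer.

// Illustrative sketch only -- not redis-oplog's real message format or code.
// When a mutation event arrives from Redis, fetch the affected document
// from MongoDB once and hand the same snapshot to every observer,
// instead of doing one db request per observer.
const observersByCollection = {}; // hypothetical registry: name -> [observer]
const rawCollections = {};        // hypothetical registry: name -> Mongo.Collection

function onRedisEvent({ collectionName, docId, event }) {
  const observers = observersByCollection[collectionName] || [];
  if (observers.length === 0) {
    return;
  }

  // A single db request per event; removal events don't need a fetch.
  const doc = event === 'remove'
    ? null
    : rawCollections[collectionName].findOne(docId);

  observers.forEach((observer) => {
    observer.process(event, docId, doc);
  });
}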

I won't even start on how many hairs I pulled out because of Optimistic UI! But we nailed it in the end.

To stress a few things:
The road to making any DB reactive has been opened.
The road to scaling reactivity infinitely has been opened.
The road of heavy investments into big Meteor apps has been opened.

We made this happen together, it was a shared effort.

23 Likes

That’s some crazy news :smiley:

I have an interview blog post with Shawn Young, the CEO of Classcraft, on the way - hopefully it will be ready next week. They are running one of the largest Meteor apps - and I can tell you, they are not going light on pub/sub - so this is really big news.

The fact that they cut down their container use by half is a great reason to use RedisOplog even if you do not “need” it. I'm feeling bad for MDG's revenue numbers, though. Either way, I think this package is a net gain for them, and I hope they start to recognize it.

2 Likes

Short-term loss for long-term gain; success stories at scale will attract more enterprises for sure :slight_smile:

1 Like

I'm not feeling bad at all… on the contrary: many people move away from Galaxy because of the cost, but now, if you can handle many users with just 3 containers, then it's already a plus - no more devops, integrated APM, and hopefully soon, integrated Redis!

3 Likes

I don't know what MDG's plans are, but frankly this work needs to be compensated and integrated into Meteor with official backing. And they should collaborate on combining this with any upcoming changes to Meteor for Mongo 3.6 or the Apollo integration - the fact that we still don't have clear answers on whether or how MDG plans to combine Meteor and Apollo is a bit baffling.

And someone needs to offer up a 'Meteor stack' on the AWS marketplace - with built-in redis-oplog, Kadira, and scripts for deployment and autoscaling. It would be a more powerful solution than Galaxy, not to mention cheaper.

3 Likes

@diaconutheodor I want to thank you for this from the bottom of my heart… This level of scalability has been on my wishlist since the 1.0 release. I have a lot of newfound enthusiasm for Meteor and its future now, and that speaks volumes.

6 Likes

Hey guys,

I'm the developer working on the redis-oplog integration at Classcraft. As @diaconutheodor mentioned, we released the package integration last night and we've seen a significant improvement in CPU usage.

After a full day of collecting data, it looks like the improvement might be less than the 100% I mentioned to @diaconutheodor this morning, but it's still significant. And I'm still working on namespaces/channels optimization, so it'll only get better. I'll keep you informed.

We're super enthusiastic about this and I'm sure we'll be able to contribute to the project. The future of Meteor is definitely in the hands of the community.

Kudos to @diaconutheodor and everybody who made this possible.

15 Likes

At first I was feeling a bit bad about such low performance improvements (it was bittersweet), but then I realized that at that scale you will run into the same problems classic oplog had if you don't use namespaces/channels/vent: too much stuff to process. And RedisOplog requires more CPU per event than classic oplog, because the data that comes in needs to be fetched from the DB to avoid race conditions.

But this is where RedisOplog truly shines: in the fine-tuning, including the newest addition, Vent (redis-oplog/vent.md at master · cult-of-coders/redis-oplog · GitHub). So all in all, just by dropping it in, gaining almost 100%, or even 50%, is not bad at all.
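
To give an idea of what that fine-tuning looks like, here is a rough sketch using the channel option (collection, publication and channel names are placeholders): by scoping a publication and its mutations to a dedicated channel, an event only wakes up the observers that actually care about it.

// Sketch only -- Messages, 'chat.messages' and the channel name are placeholders.
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';
import { Messages } from '/imports/api/messages';

// Publication listening on a dedicated redis channel per chat.
Meteor.publish('chat.messages', function (chatId) {
  check(chatId, String);
  return Messages.find({ chatId }, { channel: `chat::${chatId}` });
});

// Server-side mutations push to the same channel, e.g.:
// Messages.insert({ chatId, text }, { channel: `chat::${chatId}` });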

@copleykj glad you have your enthusiasm back. I am very happy with how Meteor is evolving, honestly. I still see nothing better. But I see some stuff we can steal from NextJS.

We already backed (Feature Request: Deploying a Meteor App with AWS Elastic Beanstalk · Issue #768 · zodern/meteor-up · GitHub) adding support for Elastic Beanstalk AWS deployments, and setting up Kadira and Redis yourself doesn't seem like that big of a deal, honestly.

Apollo and Meteor seem to me like two different beasts. Apollo is npm; you can already start using it in Meteor. What level of integration are you looking for? (Just curious.) This seems more than enough to me: https://www.apollographql.com/docs/react/recipes/meteor.html

If you're looking for GraphQL-like queries, take a look at Grapher; it is my vision for the evolution of the data-fetching layer inside Meteor, and I hope it will one day be recognized as the default way to fetch data from MongoDB inside Meteor. I see it as a bigger achievement than RedisOplog (but some of you may contradict me). @simonbelanger you should take a look at Grapher too; it may be the next stepping stone for Classcraft.
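
For anyone who hasn't seen it, here is roughly what a Grapher query looks like (a sketch that assumes cultofcoders:grapher is installed and that links between the hypothetical Posts and Comments collections are already defined):

// Rough sketch -- Posts/Comments and their links are assumed to exist.
import { createQuery } from 'meteor/cultofcoders:grapher';

const postsWithComments = createQuery({
  posts: {
    title: 1,
    comments: {
      text: 1,
    },
  },
});

// On the server this assembles the whole result in one go;
// on the client the query has to be exposed (or defined as a named query) first.
const data = postsWithComments.fetch();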

2 Likes

Regarding the Classcraft example

I've been in lurk mode with Meteor for the last two years. We won't consider using it until some of the performance issues have clear-cut solutions. So this is important to people on the fence.

It would be great to see numbers. Percentages of improvement are promising, but how do they break down?

If Meteor were six times less efficient than other stacks, then doubling its performance would be great but would still leave it three times less efficient. So for those of us not fully aware of the numbers regarding Classcraft's concurrent users etc., seeing some of the numbers in this thread would be great.

Second reason: I suspect that with a large user base, Classcraft already had many optimizations (maybe even custom ones), so even a 50% increase might be more impressive, as they might have already been squeezing the orange more than the average user would (and hence the average user might see far more improvement). Again, the numbers might help there. I've been lurking in this thread for a while, and I'm at the point where having a clear impression of how much this makes Meteor viable is critical, not just for using this package but for using Meteor at all.

@manthony To give you an idea, 2 years ago we needed 60 containers for around 3,000 concurrent connections and had major performance issues. We've put a lot of effort into optimization, and BEFORE the redis-oplog integration we were running around 60 containers for nearly 10,000 concurrent connections with decent performance.

I'm looking at today's metrics (after the redis-oplog integration) and we topped out at around 5,500 concurrent connections on 17 containers, with an average pub/sub response time of ~400ms and a methods response time of ~100ms.

10 Likes

Thanks Simon… much appreciated.

I have looked at Grapher before and really wanted to use it but never had the time. RedisOplog is almost trivial to use because it's a drop-in replacement, one of its great pros, so we can start using it and fine-tune later. With Grapher it's a bigger change, and a chance to rethink the data model as well. Many more things to consider.

That sounds nice. ~320 connections per instance (5,500 across 17 containers) is very good, especially coming from ~166 (10,000 across 60).

I withdraw my words; I'm no longer bittersweet, this is the maximum I could have hoped for. 300 is what I predicted to be the sweet spot for a Meteor instance to handle.

This is exactly what I mean. Everyone here knows this; they know that if this works out, it will be a big win for Meteor. That is why some are angry that MDG doesn't recognize it, but they will. I have already begun working on the article, and I have a lot of support from many people here.

And they know that this is for people like you, who need convincing; you are one of many. Many good people love Meteor but are scared of scaling. No more.

Thanks to Classcraft, my hypothesis has been proven: it scales horizontally. And even if they reach 50k concurrent users, a Meteor instance will still handle ~320 connections.

There are still some improvements to be made to redis-oplog; the gain won't be that huge, but it will complete the vision.

I don't want to hijack this thread, but I promise I will make a video showing why Grapher is incrementally adoptable and works with your current db models. Grapher is going to get my attention very soon; I got drawn into this because I wanted to finish what we started a year ago.

2 Likes

Anyone else using client-side operations? My custom channels don't seem to be making it to the server. I'm using Collection2 and collection-hooks, but I have moved redis-oplog to load before either of them in my packages file.

Custom channels / namespaces don't work with the isomorphic mutations. It would be quite dangerous to allow the client to specify which redis channel to push data to. I would strongly suggest moving these mutations into a method, as that is also the recommended way of doing them.
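
In other words, the channel choice stays on the server. A minimal sketch of what that could look like (Messages and the 'threads' channel are just examples):

// Sketch only -- the redis channel is decided server-side,
// so clients can't push into arbitrary channels.
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';
import { Messages } from '/imports/api/messages';

Meteor.methods({
  'messages.insert'(message) {
    check(message, String);
    return Messages.insert({ message }, { channel: 'threads' });
  },
});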

1 Like

Forgive me if I'm missing something, but it seems that this could work while still being secure. I'm not suggesting the client get to just broadcast random events and wreak havoc on other users' UIs, but I think that if the operation passes allow/deny and there is a publication already listening on that channel for the mutated collection, then allowing the channel that is passed from the client should be OK.

Long story short, all of the packages in the socialize set are designed around client-side operations while still being secure, despite the current allow/deny mantra. Currently I've got redis-oplog integrated into 6 of the packages, so I have a heavily vested interest in channels working with isomorphic mutations. Enough that I'm willing to put a bounty on this feature if it is possible to accomplish in a secure manner.

@copleykj it is possible to do; I'm not doubting the “possibility” here. But here's my concern:

Messages.insert({message: 'xxx'}, {channel: 'threads'});

However, if you use namespaces, it should be fine, because namespaces prefix the collection: namespace: 'thread::threadId' pushes to the channel 'thread::threadId::messages'.
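
A small sketch of that namespace pattern (Messages and threadId are placeholders); the publication and the mutation use the same namespace, which redis-oplog combines with the collection name to form the channel:

// Sketch only -- Messages / threadId are placeholders.
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';
import { Messages } from '/imports/api/messages';

// Publication listening on 'thread::<threadId>::messages':
Meteor.publish('thread.messages', function (threadId) {
  check(threadId, String);
  return Messages.find({ threadId }, { namespace: `thread::${threadId}` });
});

// A mutation using the same namespace pushes to that channel, e.g.:
// Messages.insert({ threadId, message: 'xxx' }, { namespace: `thread::${threadId}` });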

To implement this (client-specified channels), look here:

We may need to override what runs _validatedInsert on the collection, most likely the method that is created when specifying .allow().

1 Like

I would need to override _validatedInsert/_validatedUpdate/_validatedRemove to implement this, correct?

Try sending custom options on insert and see what arguments actually get into _validatedInsert.

The arguments are (userId, document, generatedId).
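
For anyone trying this, here is a quick, exploratory sketch of that experiment (purely diagnostic; Messages is a placeholder):

// Exploratory sketch only: wrap _validatedInsert on the server to log what
// actually arrives when the client passes custom options to insert.
import { Messages } from '/imports/api/messages';

const originalValidatedInsert = Messages._validatedInsert;

Messages._validatedInsert = function (userId, doc, generatedId) {
  // As noted above, only (userId, doc, generatedId) arrive here.
  console.log('_validatedInsert args:', userId, doc, generatedId);
  return originalValidatedInsert.call(this, userId, doc, generatedId);
};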