Proposal for Scaling Meteor without Oplog Tailing

jacobin · November 10, 2015, 7:17pm

As long as your requests are totally stateless (REST) you can scale node like crazy. Any particular node doesn’t know that any other nodes even exist. The problem with oplog tailing is that every node is de-facto observing the total state. You end up saturating cpu really quick when there is a write.

Unfortunately in the video he doesn’t mention the limit they reached when switching from socket.io to socks.js and Redis pub/sub. The final solution in the video was a high performance db proxy written in c. I’m not sure what Hansoft’s db proxy is written in, but he did harp about how most of them are “C++ Gods”

Zimmy was however indicating that the Hansoft changes were working fine at up to 10k concurrent connections per meteor node. While that’s nowhere close to running out of ports or file descriptors I’d be still happy with that!

https://vimeo.com/113703576

Here’s a talk on how StackOverflow uses Redis (mostly live coding!). He compares SQL with Redis at one point with SQL handling 9k ops while Redis churned out 19k ops.

Redis is fast. All in memory and literally hand tuned to take into account things like chip and instruction cache and whatnot. The eventual limits of Postgres or RethinkDB’s pub/sub I have no information on, but it won’t be close to Redis.

The guy in the video sure answered your question. The answer was… err no! At least not with socket.io.

skini26 · November 10, 2015, 8:12pm

But it will be at least better than tailing the oplog or subscribing to Postgres or Rethink right ?
We’re not talking about 500M but just something better than the actual way we do things.
I’m interested by this way because it would be easy to do it and it uses Redis as it should be used.

jacobin · November 10, 2015, 9:59pm

Certainly anything is better than the dark O(n^2) hole that is Meteor oplog tailing. Even sans some sort of DB proxy, using the pub/sub mechanisms built into Redis/Rethink/Postgres would be a vast improvement.

However, using socket.io to replicate state across nodes I wouldn’t exactly put into the ‘anything’ category as it causes the exact same problem. Look what happens to your diagram when you add a 3rd or 4th node. Now extrapolate that to say 10 nodes…

Even Roto Rooter couldn’t unclog node’s event loop at that point. It will look exactly like the diagram in the Hansoft video.

ccorcos · November 12, 2015, 5:17am

So here’s my question:

How does a giant service like iMessage work?

A solution like oplog tailing doesnt work because there are so many messages being sent every second that your servers couldnt possibly keep up with the oplog.

After talking with a few people, I believe the solution for a chat service like this is to have a database to look up what server each user is connected to. That way, when you want to send a message to someone, you can send it to the specific server that serves that user rather than spam the entire cluster like the oplog does.

So thats interesting. But it gets more complicated when you take a more challenging scenario like this:

How does the like count on a Facebook post get updated reactively?

Along the same lines as the last solution, perhaps theres a database that keeps track of every post that every user is looking at on Facebook. Then when you like or unlike a post, you can see what servers are hosting the users that are currently looking at that post and send the message off. This is interesting, but I couldnt image Facebook is keeping track of every post that every user is looking at. That just seems insane! Not to mention, there would likely be a lot of race conditions that could cause the like count to get out of sync. So how does this scenario work?

I’m a huge fan of the idea that the database would just take care of all this, but I’m not sure thats realistic in the near-term.

SkinnyGeek1010 · November 12, 2015, 2:55pm

How does a giant service like iMessage work?

iMessage doesn’t disclose much but WhatsApp does. From what I remember, if you break it down they’re only sending the data once when the user connects, and after that they’re just sending messages and the app will store the new ones.

They can hold open the connections easier because that part is on erlang which is basically made to hold open lots of connections (built for phone switches)… though Phoenix has proven that you can also run around 2 million websocket connects on a single box before hitting OS limits (it’s also using the erlang vm in a coffeescript sort of way).

I think this video explains more:
https://www.youtube.com/watch?v=c12cYAUTXXs

bitomule · November 12, 2015, 3:26pm

Thanks! In our case we can’t separate data to a different db but I’m planning on moving all the chat to a different db.

bitomule · November 12, 2015, 3:30pm

We’ll try to move what we can to methods vs subscriptions but not sure there’s anything we can move without killing user experience. Hope we can find something better or MDG publishes a fix if there’s any.

SkinnyGeek1010 · November 12, 2015, 6:13pm

We’ll try to move what we can to methods vs subscriptions but not sure there’s anything we can move without killing user experience. Hope we can find something better or MDG publishes a fix if there’s any.

I’m working on an experiment that should be ready this weekend (with screencasts of load testing on blitz.io). The basic gist is that you can opt some publications to use this without having all of them use it.

Meteor just passes data through (from a secondary app on the same server) so Meteor’s limit becomes the number of websockets it can hold open per instance. You’ll also be able to use the publication/subscription the same as now (though i’m thinking ditching this could offer large gains) I’m aiming at getting a chatroom example app to have several thousand concurrent users on a small $10 digital ocean box.

It should be very easy with a small helper package to glue the wiring together… and it will be open-sourced of course!

Does anyone know of a rough writes per second limit on the oplog getting clogged?

Thanks! In our case we can’t separate data to a different db but I’m planning on moving all the chat to a different db

@bitomule do you have a certain DB in mind? in my tests i’ll be using Mongo for one and RethinkDB for the other (the latter simplifies things greatly, and the new 2.2 release adds atomic changefeeds and should be released in a few hours).

bitomule · November 12, 2015, 7:03pm

I’m not sure if we’ll move the chat to another meteor + mongo, not on planning yet. The issue we have is that our app is already built using publish composite so we need something that fits in that and doesn’t require changing all the app.

SkinnyGeek1010 · November 12, 2015, 7:17pm

Does the chat data need publish composite? or was the other app data? So if my experiment is performant enough you could still use Mongo and publishing multiple collections. We’ll see how that goes though!

edit, ah yea I think I understand now… if you moved chat to a different DB the rest of the app couldn’t do a composite join with the chat data?

bitomule · November 12, 2015, 7:54pm

Chat could be separated in a different db or use what you’re working on when you release it Not sure how it will work with our app. If we don’t use a different db how would it fix the oplog tailing issue?

SkinnyGeek1010 · November 12, 2015, 8:10pm

By not using it

There will be lots of ways to skin it, from least amount of work to most performance. I’ll be able to give you a better idea once I can try out different combos.

The easiest solution should just work with a simple adapter in the publication.

ccorcos · November 12, 2015, 9:19pm

Yeah, thats pretty intense stuff. Idk half the stuff he’s talking about though! I’d really like a one-size-fits-all scaling solution though

Specifically, I want to be able to scale a service like facebook or instagram – a follower network.

jacobin · November 12, 2015, 10:26pm

Interesting video. Note that they mentioned in the video there is very little traffic between nodes. The secret sauce appears to really be Mnesia, which looks very much like a built-in Redis but with a fixed schema instead of a key-value store.

When trying to scale horizontally something that’s key to remember is Amdahl’s Law.

Think of it like a restaurant with a single cook. No matter how many waitresses you add the system will only be as fast as the cook.

With stateless rest each waitress essentially has her own cook. When you need to share state across nodes, whether your backplane to share state is configured as a database or Redis memcache that will always be your limiting factor no matter how many nodes you add.

The good thing is you can scale Redis sky high. If you can shard your data you can scale it even higher! The best practise always is splitting your application into functional groups/sets and scaling them individually. That’s why microservices work so well in Rest land.

In your chat example that might be sharding by particular rooms. In Hansoft’s solution, that would be grouping writes by tenant.

In both examples the hard work is moved out of the event loop onto a separate application. Rule #1 in node is to not block the event loop. Meteor is not only violating that rule but calling it a feature. In MDG’s restaurant the waitress is also the cook. Good luck getting that burger during lunch rush!

You want to look into a graph database like Neo4j.

skini26 · November 16, 2015, 6:23pm

Would it be possible to create some package (or maybe in the core) that will make a subscription works in 2 different modes:

1 were the server parses the oplog and sends the data.
Another one were the client polls every X minutes.

The second one would be used for complex queries with lot of parameters were the oplog parsing isn’t efficient.

This would be very easy to do without a package too :

Template.X.onCreated(function () {
      this.pollingIntervalId = Meteor.setInterval(function() { 
               Meteor.call('getCustomData', function (err, data) {
                      if (!err) LocalCollection.upsert({_id: data._id}, data);    
               } 
      }, 5000);
});

Template.X.onDestroyed(function () {
     Meteor.clearInterval(this.pollingIntervalId);
});

SkinnyGeek1010 · November 17, 2015, 2:53pm

Does anyone know if there’s a way to cache the HTML page going down to the client? (basically the view source version). This is becoming a bottleneck in my setup. With load testing a 2GB box can only handle ~400 concurrent users before it starts spewing errors. This is without executing JS (need a better load tester). I can get to ~800 users but at that point it’s throwing hundreds of errors.

I’m considering moving the initial HTML to a CDN (Cordova style connection), perhaps that would be the easiest at scale.

[quote="skini26, post:44, topic:4017"] Would it be possible to create some package (or maybe in the core) that will make a subscription works in 2 different modes: [/quote]

Yep, i’ve done something similar to #2 with pretty good success. The only issue is this is essentially long polling with ajax. Gets the job done though!

fabiodr · November 18, 2015, 5:42am

Yes, you can have a query with oplog disabled and return it to the subscription, like described here:

Meteor.publish("comments", function (postId) {
  return Comments.find({post: postId}, {_disableOplog: true, fields: {secret: 0}});
});

SkinnyGeek1010 · November 18, 2015, 1:49pm

Just wanted to recap some info here, You can actually serve the meteor page over a CDN with some config, I think joshowens is writing a blog post about this! Also he suggested using mupx with page caching turned on as an alternative.

efrancis · November 18, 2015, 5:22pm

I can’t find any docs on mup/mupx “page caching” feature, what’s it do?

SkinnyGeek1010 · November 18, 2015, 5:28pm

I think it’s specific to nginx:

https://www.nginx.com/resources/wiki/start/topics/examples/reverseproxycachingexample/