Proposal for Scaling Meteor without Oplog Tailing

If you feel this is a good idea, just give it a try as a package. Livequery is just a package; Minimongo is just a package.

I am not sure this solution is practical in the long term; I think there can be downsides to assuming a write happened. Your solution would also quickly turn complex once you start to layer in things like Allow/Deny. What happens if a Meteor method is doing the writing to the DB instead? Or are you planning to hook in even lower, into the raw Mongo connector, to pick up writes?

@arunoda did some similar work early on with Redis, using it as the communication point between servers for real-time messaging.

I think the long term answer will probably be something like Postgres or RethinkDB, where you can get triggers or native db pub/sub (respectively).

2 Likes

@joshowens If you read the Livequery project page closely and read between the lines, I think the MDG team agrees with you :wink:

1 Like

Yes, that’s what Galaxy is supposed to do… from what I’ve read, it sounds like they are planning to build Livequery for all kinds of databases (Postgres, MySQL, etc.)… but until Galaxy arrives, and until it gains that support, this can be an equally effective way to prevent the bottleneck. It’s also good for the ecosystem, because it would keep us from being locked into Galaxy to scale Meteor. Although, I am sure Galaxy will be so awesome that we will want to stay there.

In terms of Allow/Deny… those rules just generate methods that perform the actions when your application gets built.

The approach I’m suggesting would have to hook into the raw Mongo connector. Whenever a write (insert/update/remove) happens on the server, the Mongo package would check whether that action is being observed. If it’s being observed, it would push that change to the other connected clients, and they would treat it like an oplog update.

The end result: the effect of oplog tailing, the ability to select which collections you want to observe, lower latency since you don’t have to go through the database server, and the ability to scale further.
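To make the idea concrete, here is a minimal sketch of what that hook could look like. It assumes some transport between app servers: `broadcastChange`, `onBroadcastChange`, and the collection names are all hypothetical, not Meteor APIs.

```js
// Rough sketch: wrap collection writes on the server and rebroadcast them,
// instead of having every server tail the oplog.
// broadcastChange / onBroadcastChange are hypothetical helpers (DDP between
// servers, Redis, a message queue, ...), not real Meteor APIs.
import { Mongo } from 'meteor/mongo';

const observedCollections = new Set(['messages', 'notifications']); // made up

['insert', 'update', 'remove'].forEach((method) => {
  const original = Mongo.Collection.prototype[method];

  Mongo.Collection.prototype[method] = function (...args) {
    const result = original.apply(this, args);

    // Only relay writes for collections someone is actually observing.
    if (observedCollections.has(this._name)) {
      broadcastChange({ collection: this._name, method, args });
    }

    return result;
  };
});

// On each app server, apply incoming broadcasts to the local observers,
// much like an oplog entry would be applied.
onBroadcastChange(({ collection, method, args }) => {
  // e.g. feed this into the server's observe machinery
});
```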

Change detection should be re-engineered to be horizontally scalable. At the moment you can only go so far with Meteor in terms of scale.

In Meteor’s current state, after a certain volume of write operations (that you actually need reactivity for) you basically have to switch frameworks. You can make it cope with more operations using msavin’s proposal, but there will still be a cap. Basically, you cannot build a reactive Meteor app with millions of users generating write operations that others need to see reactively, no matter how many servers you have. We are getting very close to this cap in our production project now. We have done numerous optimisations and hacks to loosen the oplog stress, and they only raise the cap. It is getting very scary for us.

There needs to be some kind of change-detection hub/gateway that sits between the database and the apps and takes this load. It should know exactly which changes each app cares about, and possibly aggregate changes in case of massive volatility in the same document properties. And of course it should be horizontally scalable.
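Something along these lines; every name here is hypothetical and this is only meant to illustrate the shape of such a gateway, not a real implementation:

```js
// Sketch of a change-detection gateway between the DB and the app servers.
// registerInterest / onDatabaseChange / sendToApp are all made-up names.

const interests = new Map();      // appId -> Set of collection names
const pendingByDoc = new Map();   // "collection:docId" -> latest change

function registerInterest(appId, collections) {
  interests.set(appId, new Set(collections));
}

// Called for every raw change coming from the database (oplog, trigger, ...).
function onDatabaseChange(change) {
  // Coalesce bursts of writes to the same document: only the latest
  // version needs to reach the observers.
  pendingByDoc.set(`${change.collection}:${change.docId}`, change);
}

// Flush aggregated changes to interested app servers a few times per second.
setInterval(() => {
  for (const change of pendingByDoc.values()) {
    for (const [appId, collections] of interests) {
      if (collections.has(change.collection)) {
        sendToApp(appId, change); // hypothetical transport (DDP, Redis, ...)
      }
    }
  }
  pendingByDoc.clear();
}, 200);
```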

2 Likes

I’m surprised people don’t talk about this more. @khamoud made a telling comment a few months back, saying that after a while horizontal scaling doesn’t work for apps with a heavy write load and real-time observers – you just end up adding more servers to crash.

One interesting real-world solution is Kadira’s, where the massive volume of writes goes to a separate database and data is aggregated from there before being fed to the real-time observers (via a different Mongo instance? – @arunoda has described the setup in a blog post, but it’s been a while since I read it). But Kadira’s stack definitely represents a workaround for the scaling limitations of the basic Meteor stack, while still making use of the parts of Meteor we love.

I wonder if the Galaxy team has a plan to tackle the cap that @hypno is talking about. As long as servers have to maintain state, this seems like a hell of a challenge to scaling.

5 Likes

In a recent Meteor video, Hansoft discussed how they encountered (and solved) O(n^2) issues in Meteor core that limited them to 100 users before things erupted into flames. I wish they had spent the entire talk on that topic alone.

Being new to Meteor, it’s a frightening thought that these issues not only exist but are largely unknown, since they are rarely discussed. That makes it somewhat difficult to make application design decisions, let alone infrastructure ones.

5 Likes

They claim the server was crashing at around 100 users, and their change of organizing subscriptions into “channels” made it scale to over 1,000 per server. That’s a pretty significant increase; I’d think this would be talked about more. They said they were talking with Meteor about getting it into core, but that was posted two months ago and I haven’t heard anything about it since. It’d be nice to hear an update, since horizontal scaling is at this point one of my biggest gripes about Meteor.

1 Like

Granted, Hansoft developers have contributed back to Meteor core, solving issues they spotted: https://github.com/meteor/meteor/pull/4694

1 Like

It looks like the commit bears directly on the discussion in the video. What confuses me, however, is that the talk is from August 27th, yet the link indicates their fix was merged on July 7th.

Perhaps someone in the know can summarize the outstanding performance issues with Meteor affecting vertical scalability. Reading the Livequery documentation, it somewhat hints that it’s a ‘work in progress’ and there is still work to be done.

Indeed. My worry is the ‘work to be done’ will be proprietary :frowning:

1 Like

A little birdie told me Hansoft’s ultimate solution did involve a custom Redis backend. I’m more interested in Meteor and vertical scaling sans any jury-rigging.

Interesting. It seems like once RethinkDB is supported we won’t have to resort to hacks like oplog tailing and Redis to scale Meteor? I’m hoping that will alleviate some of the pressure on Meteor.

As a community, it also might be worth looking into how Phoenix multiplexes channels to see if we can gain some insight for Meteor. Granted, Phoenix is running on the Erlang VM and Node will never scale to millions of connections per server, but perhaps we can grok something?

http://www.phoenixframework.org/docs/channels

Well, somewhere in the stack some component will have to process updates to the database anyway. Putting it in the DB itself will probably be more efficient than Meteor’s oplog tailing, but there’s no such thing as infinite scaling.

Not millions for sure, but thousands per node should not be out of the question. However, http://vertx.io/, more than Phoenix, likely has ideas you can apply. It’s more conceptually congruent, being a polyglot Node-style implementation on the JVM.

There is no infinite scaling true, but I would be excited to see Meteor scale up to the point where SkinnyGeek1010 laments with an angry fist the day he decided to like the idea of immutable state stores.

2 Likes

I agree that letting the DB handle this would be much better, like RethinkDB for example.
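For what it’s worth, RethinkDB’s changefeeds make this pretty direct. A minimal sketch with the official JavaScript driver (the database, table, and connection details are made up):

```js
const r = require('rethinkdb');

r.connect({ host: 'localhost', port: 28015 })
  .then((conn) =>
    // Every insert/update/delete on the table is pushed to us by the DB,
    // no oplog tailing required.
    r.db('app').table('messages').changes().run(conn)
  )
  .then((cursor) => {
    cursor.each((err, change) => {
      if (err) throw err;
      // change.old_val / change.new_val describe the modification.
      console.log('change:', change.new_val);
    });
  });
```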

1 Like

Any update on this? We’re worried our production app could reach these limits soon, and moving to Galaxy is not an option (no EU AWS support yet).

How are big Meteor apps managing this limit?

Separate databases (i.e. only hold ready, consumable data in the MongoDB that your Meteor app connects to). The other thing suggested by @debergalis is to only observe certain collections and put some kind of oplog filter between your Meteor app and the database. That could make for a good project.
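For the separate-databases part, here is a rough sketch of how that can look today. `MongoInternals.RemoteCollectionDriver` is an undocumented Meteor server internal, and the URLs and collection names are made up:

```js
// Rough sketch: keep the write-heavy raw data in a second MongoDB that the
// app does not observe, and only publish from the "consumable" database.
import { Mongo } from 'meteor/mongo';

// Default connection (MONGO_URL / MONGO_OPLOG_URL): reactive, observed data.
export const Dashboards = new Mongo.Collection('dashboards');

// Second connection: raw event firehose, with no oplog tailing configured.
const rawDriver = new MongoInternals.RemoteCollectionDriver(
  'mongodb://raw-db.example.com:27017/events'
);
export const RawEvents = new Mongo.Collection('rawEvents', { _driver: rawDriver });

// A background job (cron, worker, etc.) aggregates RawEvents into Dashboards,
// so only small, ready-to-consume documents hit the reactive pipeline.
```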

This is a concern I share; however, assuming all writes succeed is not the answer, in my opinion. Personally, RethinkDB seems like the best solution on the horizon.

One possible solution is to swap out a subscription for a Meteor method if the data doesn’t have to be instant real time. Polling can be cheaper in some cases. This worked out fairly well for me, but it depends on the project of course. Typically the Meteor method would do the same sort/query that the template helper would do in Minimongo, so when there was new data you would just use the entire response as is.
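A minimal sketch of that pattern; the collection, method name, and import path are made up for illustration:

```js
// Trade a live subscription for a periodically polled Meteor method.
import { Meteor } from 'meteor/meteor';
import { ReactiveVar } from 'meteor/reactive-var';
import { Messages } from '/imports/collections'; // hypothetical

if (Meteor.isServer) {
  Meteor.methods({
    'messages.latest'(roomId) {
      // Same sort/query the template helper would otherwise run in Minimongo.
      return Messages.find(
        { roomId },
        { sort: { createdAt: -1 }, limit: 50 }
      ).fetch();
    },
  });
}

if (Meteor.isClient) {
  const latestMessages = new ReactiveVar([]);
  const roomId = 'lobby'; // whatever the current view needs

  // Poll every few seconds instead of holding a live observer on the server.
  Meteor.setInterval(() => {
    Meteor.call('messages.latest', roomId, (err, messages) => {
      if (!err) latestMessages.set(messages); // use the response as-is
    });
  }, 5000);

  // A template helper can then simply return latestMessages.get().
}
```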

Postgres, RethinkDB and Redis all have some sort of pub/sub built in.

Stack Overflow uses a mere 9 web servers to support 260k+ concurrent users with tons of reactive data, all being fed from Redis pub/sub. Their biggest issue is running out of sockets at the OS level.
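For reference, the Redis side of that is tiny. A sketch using the node_redis client (v3-style API; the channel name and payload are made up):

```js
// Minimal Redis pub/sub sketch with the node_redis (v3-style) client.
const redis = require('redis');

const publisher = redis.createClient();
const subscriber = redis.createClient();

// Every interested web server subscribes once...
subscriber.on('message', (channel, message) => {
  const change = JSON.parse(message);
  // ...and pushes the change out to its connected websocket clients.
  console.log('got change on', channel, change);
});
subscriber.subscribe('feed:changes');

// Whichever process performs the write publishes the change once.
publisher.publish('feed:changes', JSON.stringify({ doc: 'abc', score: 42 }));
```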

It makes the silly Blaze thread quite mind boggling. What exactly is the point of fretting over which front-end to stick onto a back-end that doesn’t really work once it is actually asked to work?

@zimme says that Hansoft really want to give their solution back to the community. Hopefully we’ll see it one day. Much as @msavin suggested, it involved inserting an intermediary between Meteor and the database.

For sure, many things wouldn’t be noticed if they weren’t sub-millisecond. It seems, though, that with Meteor everything is assumed by default to be critically reactive. I see people in IRC and the forums using MongoDB and pub/sub for chat messages.

Perhaps that’s a function of the docs and the way they present things.

3 Likes