Proposal for Scaling Meteor without Oplog Tailing

After listening to the Galaxy talk between Justin and Arunoda, my excitement for Meteor’s vision is greater than ever. I think it can be the first platform to successfully abstract away the scaling challenges a start-up may face. I think it’s going to be revolutionary, but until that day comes, I have some pressing concerns about scaling Meteor. Primarily, they relate to Livequery - Meteor’s way of making MongoDB real-time by watching every operation in the database log.

For those unfamiliar with this, each Meteor server has to process every operation that happens in the database. This consumes a lot of resources and creates a performance ceiling that cannot be overcome by adding more servers. Since there are many Node/MongoDB hosting options, I was hoping Galaxy would tackle this first, but it doesn’t look like that will happen in the first version.

I’ve been thinking of how this can be overcome at a low cost. Instead of watching the MongoDB operations log, what if Meteor simply assumed that every write was successful, and pushed that action to all the app servers?

The only tradeoff I see with this approach is if MongoDB starts failing writes, in which case the connected clients will get the data but it won’t persist. Depending on how you look at it, that can be a good or bad thing. On the plus side, there may be less latency with this approach, and it will probably get you further.

By embedding this into the Meteor app instead of plugging into MongoDB’s oplog, we can also have an easy on/off switch for which database collections are being watched and how.

Posts.observe({
  // return true to have inserts on this collection pushed to the other app servers
  insert: function () {
    return true;
  },
  // same switch for updates
  update: function () {
    return true;
  },
  // and for removes
  remove: function () {
    return true;
  }
});

In the long term, I would love to see Meteor unbundle Livequery into a separate service and have it work the way it does now. It would turn Meteor into a great integration point for other services. Maybe Galaxy could have an on/off switch for watching MongoDB’s oplog versus assuming success, and this API would play right into it.

In the short term, I think this can be a good, non-premium solution that can help people take Meteor much further than where it is today. It could also lead to a more balanced approach to designing applications with Meteor, where not every bit of the application needs to be real-time.

11 Likes

+1
I would love to do this - it would also let you offer the user an ‘undo’ option for a few seconds

@cottz word - you can put in an X-second delay before pushing a change to ensure the user intended it. That way, you can let users unlike that photo before things get awkward!
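
Something like this could work as a sketch - broadcastChange here is just a stand-in for however the change actually gets pushed to the other servers:

var pending = {};
var UNDO_WINDOW_MS = 5000; // how long the user gets to change their mind

function scheduleBroadcast(changeId, change) {
  // hold the change back briefly before anyone else sees it
  pending[changeId] = setTimeout(function () {
    broadcastChange(change); // hypothetical transport to the other app servers
    delete pending[changeId];
  }, UNDO_WINDOW_MS);
}

function undo(changeId) {
  // the user backed out in time - drop the change before it goes out
  clearTimeout(pending[changeId]);
  delete pending[changeId];
}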

I think if you feel this is a good idea, just give it a try as a package. Livequery is just a package, Minimongo is just a package.

I am not sure this solution is practical in the long term - I think there can be downsides to assuming a write happened. Your solution would quickly turn complex once you start to layer in things like Allow/Deny. What happens if a Meteor method is doing the writing to the DB instead? Or are you planning to hook in even lower, to the raw Mongo connector, to pick up writes?

@arunoda did some similar work early on with Redis, using it as the communication point between servers for real-time messaging.

I think the long term answer will probably be something like Postgres or RethinkDB, where you can get triggers or native db pub/sub (respectively).
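
For example, RethinkDB changefeeds let the database itself push changes to you. A rough sketch with the official rethinkdb Node driver:

var r = require('rethinkdb');

r.connect({ host: 'localhost', port: 28015 }, function (err, conn) {
  if (err) throw err;
  // the database streams changes natively - no app server has to tail an oplog
  r.table('posts').changes().run(conn, function (err, cursor) {
    if (err) throw err;
    cursor.each(function (err, change) {
      if (err) throw err;
      console.log(change.old_val, '->', change.new_val);
    });
  });
});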

2 Likes

@joshowens If you read the Livequery project page closely and read between the lines, I think the MDG team agrees with you :wink:

1 Like

Yes, that’s what Galaxy is supposed to do… from what I’ve read, it sounds like they are planning to build Livequery for all kinds of databases (Postgres, MySQL, etc.)… but until Galaxy comes, and until it gains support for them, this can be an equally effective solution to prevent the bottleneck. It’s also good for the ecosystem, because it would keep us from being locked into Galaxy to scale Meteor. Although, I am sure Galaxy will be so awesome that we will want to stay there.

In terms of Allow/Deny… those rules just generate methods that perform the actions when your application gets built.

The approach I’m suggesting would have to hook into the raw Mongo connector. Whenever a write/insert/update happens on the server, the Mongo package would check if that action is being observed. If it’s being observed, it would push that change to the other connected clients, and they would treat it like an oplog update.

The end result: the same effect as oplog tailing, the ability to select which collections you want to observe, less latency (since you don’t have to go to the database server), and it would scale further.
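
Here’s a rough sketch of the idea, wrapping the collection method for simplicity (a real version would hook the connector itself). publishToPeers is a hypothetical helper for whatever transport - Redis pub/sub, a DDP connection between servers - carries the change:

var originalInsert = Posts.insert.bind(Posts);

Posts.insert = function (doc, callback) {
  // assume the write will succeed: notify the other app servers right away,
  // instead of waiting for the change to surface in the oplog
  publishToPeers('posts.insert', doc); // hypothetical transport helper
  return originalInsert(doc, callback);
};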

Change detection should be re-engineered to be horizontally scalable. At the moment, you can only go so far with Meteor in terms of scale.

As Meteor currently stands, after a certain volume of write operations (that you actually need reactivity for), you basically have to switch frameworks. You can make it cope with more operations using msavin’s proposition, but there will still be a cap. Basically, you cannot build a reactive Meteor app with millions of users generating write operations that others need to see reactively, no matter how many servers you have. We are getting very close to this cap in our production project now. We have done numerous optimisations and hacks to ease the oplog stress, and they only raise the cap. It is getting very scary for us.

There needs to be some kind of change detection hub/gateway that sits between the database and the apps and takes this load. It should know exactly which changes each app cares about, and possibly aggregate the changes in case of massive volatility in the same document properties. And of course, it should be horizontally scalable.
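
Roughly what I picture, as a sketch - the node redis client here is only an example transport, and onOplogEntry would be fed by a single tailing process:

var redis = require('redis').createClient();
var interests = {}; // appId -> { collectionName: true }

function register(appId, collections) {
  interests[appId] = {};
  collections.forEach(function (name) {
    interests[appId][name] = true;
  });
}

function onOplogEntry(entry) {
  // an oplog entry’s ns looks like "db.collection"
  var collection = entry.ns.split('.')[1];
  Object.keys(interests).forEach(function (appId) {
    if (interests[appId][collection]) {
      // fan the change out only to the apps that registered interest in it
      redis.publish('changes:' + appId, JSON.stringify(entry));
    }
  });
}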

2 Likes

I’m surprised people don’t talk about this more. @khamoud made a telling comment a few months back, saying that after a while horizontal scaling doesn’t work for apps with a heavy write load and real-time observers – you just end up adding more servers to crash.

One interesting real-world solution is Kadira’s, where the massive volume of writes is made to a separate database, and the data is aggregated from there before being fed to the real-time observers (via a different Mongo instance? – @arunoda has described the set-up in a blog post, but it’s been a while since I read it). But Kadira’s stack definitely represents a work-around for the scaling limitations of the basic Meteor stack, while still making use of the parts of Meteor we love.

I wonder if the Galaxy team has a plan to tackle the cap that @hypno is talking about. As long as servers have to maintain state, this seems like a hell of a challenge to scaling.

5 Likes

In a recent Meteor video, Hansoft discussed how they encountered (and solved) O(n^2) issues in Meteor core that limited them to 100 users before things erupted into flames. I wish they had spent the entire talk on that topic alone.

Being new to Meteor, it’s a frightening thought that these issues not only exist, but are an unknown, as they are rarely discussed. It is somewhat difficult to make application design decisions, let alone infrastructure ones.

5 Likes

They claim the server was crashing around 100 users, and their change of organizing subscriptions into “channels” made it scale up to over 1,000 per server. That’s a pretty significant increase - I’d think this would be talked about more. They said they were talking with Meteor about getting it into core; this was posted two months ago and I haven’t heard anything about it since. It’d be nice to hear an update, since horizontal scaling is at this point one of my biggest gripes about Meteor.

1 Like

Granted, Hansoft developers have contributed back to Meteor core, solving issues they spotted: https://github.com/meteor/meteor/pull/4694

1 Like

It looks like the commit bears directly on the discussion in the video. What confuses me, however, is that the talk is from August 27th, yet the link indicates their fix was merged on July 7th.

Perhaps someone in the know can summarize the outstanding performance issues with Meteor affecting vertical scalability. The Livequery documentation somewhat hints that it’s a ‘work in progress’ and there is still work to be done.

Indeed. My worry is the ‘work to be done’ will be proprietary :frowning:

1 Like

A little birdie told me Hansoft’s ultimate solution did involve a custom Redis backend. I’m more interested in Meteor and vertical scaling sans any jury-rigging.

Interesting. It seems like once RethinkDB is supported, we won’t have to resort to hacks like oplog tailing and Redis to scale Meteor? I’m hoping that will alleviate some pressure on Meteor.

As a community, it also might be worth looking into how Phoenix multiplexes channels to see if we can gain some insight for Meteor. Granted, Phoenix runs on the Erlang VM and Node will never scale to millions of connections per server, but perhaps we can grok something?

http://www.phoenixframework.org/docs/channels

Well, somewhere in the stack some component will have to process updates to the database anyway. Putting it in the DB itself will probably be more efficient than Meteor’s oplog tailing, but there’s no such thing as infinite scaling.

Not millions for sure, but thousands per node should not be out of the question. However, http://vertx.io/ likely has more applicable ideas than Phoenix. It’s more conceptually congruent, being a polyglot, Node-like implementation on the JVM.

There is no infinite scaling, true, but I would be excited to see Meteor scale up to the point where SkinnyGeek1010 laments with an angry fist the day he decided to like the idea of immutable state stores.

2 Likes

I agree that letting the DB handle this would be much better - RethinkDB, for example.

1 Like

Any update on this? We’re worried our production app could reach these limits soon, and moving to Galaxy is not an option (no EU AWS support yet).

How are big Meteor apps managing this limit?

Separate databases (i.e. only hold ready, consumable data in the MongoDB that your Meteor app connects to). The other thing suggested by @debergalis is to only observe certain collections and put some kind of oplog filter between your Meteor app and the database. That could make for a good project.
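
A rough sketch of such a filter - the namespaces are made up for illustration, and forwardToApp is a hypothetical hand-off to the Meteor app. The filtering happens right in the tailing query, so the app never sees the rest of the oplog:

var MongoClient = require('mongodb').MongoClient;

var WATCHED = ['myapp.posts', 'myapp.comments']; // hypothetical namespaces

MongoClient.connect('mongodb://localhost:27017/local', function (err, db) {
  if (err) throw err;
  var cursor = db.collection('oplog.rs').find(
    { ns: { $in: WATCHED } },            // only whitelisted collections
    { tailable: true, awaitData: true }  // keep the cursor open, like tail -f
  );
  cursor.each(function (err, entry) {
    if (err) throw err;
    if (entry) forwardToApp(entry); // hypothetical hand-off to the app
  });
});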