What scale has your Meteor app achieved?

asad · April 7, 2017, 8:42pm

Hey all!

I’ve been very excited about all the new changes coming along in the Meteor community: the client bundle being released, the dynamic code loading coming in 1.5, the recent speed improvements for reloads, and now Kadira APM being open-sourced! These things are surely breathing new life into the community!

One concern that I’ve seen raised often, and that IMO still remains unanswered, is how Meteor scales. Architecture obviously plays a big role here, but putting that aside, how far have you been able to scale your Meteor apps in terms of concurrent users? I don’t mean simple TodoMVC apps, but at least moderately complex ones.

Also, if you’re scaling up a Meteor app to close to 20,000 concurrent users, what would you do to be able to get there?

fermuch · April 8, 2017, 1:59am

Hello asad,

I’m also excited about the new features coming to Meteor.

My team and I have a pretty big service built with Meteor (with a codebase over two years old!), which tracks phones, has a task completion system, and provides real-time stats and maps.
When we started, we used Meteor for everything, and our needs were mostly basic: a panel to see devices on a map in real time, and an app that regularly sends its position to the server.
The app was built with an older version of React Native, talking DDP directly, and the user base at that time was pretty small (around 10 users online at the same time!), so there weren’t any problems there.

But as we grew, we ran into problems scaling what we had. Since we were adding real-time statistics and graphs, recalculating everything each time a device moved (which happens pretty often) quickly became memory-intensive.
So we started caching everything in Redis, and it was great: we only needed to calculate everything once and store it in Redis. The only tricky part was knowing when to expire the cache.
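As a rough illustration of the calculate-once-then-expire idea, here is a minimal cache-aside sketch. It uses an in-memory dict with expiry timestamps as a stand-in for Redis (with redis-py the equivalent calls would be `set(key, value, ex=ttl)` and `delete(key)`); the key name `stats:device-42` and the `expensive_stats` function are hypothetical. Expiry is handled two ways: a TTL as a safety net, and explicit invalidation when the underlying data changes.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for Redis: cached values expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: skip the expensive recalculation
        value = compute()  # miss or expired: recalculate once and store it
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call this when the underlying data changes (e.g. a device moved)
        self._store.pop(key, None)


# Usage with a hypothetical expensive calculation
calls = []

def expensive_stats():
    calls.append(1)
    return {"distance_km": 12.5}

cache = TTLCache(ttl=60)
cache.get_or_compute("stats:device-42", expensive_stats)
cache.get_or_compute("stats:device-42", expensive_stats)  # served from cache
cache.invalidate("stats:device-42")
cache.get_or_compute("stats:device-42", expensive_stats)  # recomputed
print(len(calls))  # 2
```

The TTL bounds how stale a value can get even if an invalidation is missed, which is one way to soften the "when to expire" problem.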

Then another big problem came along: we needed to provide integrations with external tools.
An API was ideal, but building one inside Meteor would have added a lot more complexity to the codebase than we expected, and we also weren’t sure how to properly handle and scale an HTTP API inside Meteor, so we started playing with AWS Lambda.
We built the API in Python, since it was (in our opinion) easier to use for APIs than Node, and we used the great Zappa framework to run a Flask app on top of Lambda.
And things were great. We had a cache for every expensive calculation, and the API was running outside of Meteor, so we could invest all of our time in improving the Meteor frontend.

But then we hit another big problem: with the amount of data we were receiving, calculating graphs inside Meteor methods wasn’t fast enough.
We solved it quickly by moving everything to MongoDB aggregation queries, which made the calculations 100x faster.
This wasn’t a problem with our tools, but with our lack of knowledge about the database engine we were using. (As a side note, some queries also improved a lot after moving to WiredTiger.)
After that, we kept running the calculations from Meteor methods without any trouble.
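To show what moving a calculation into the database looks like, here is a hypothetical aggregation pipeline, written as the list of stages you would pass to pymongo's `collection.aggregate()`. The collection and field names (`positions`, `deviceId`, `createdAt`) are assumptions, not the poster's schema; the point is that filtering, grouping and counting happen inside MongoDB instead of in a Meteor method looping over documents.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def positions_per_device_per_hour(since: datetime) -> List[Dict[str, Any]]:
    """Pipeline counting position updates per device per hour (hypothetical schema)."""
    return [
        # Filter first so later stages touch fewer documents (and can use an index)
        {"$match": {"createdAt": {"$gte": since}}},
        # Group by device and hour bucket; the counting happens inside MongoDB
        {"$group": {
            "_id": {
                "device": "$deviceId",
                "hour": {"$dateToString": {"format": "%Y-%m-%dT%H:00", "date": "$createdAt"}},
            },
            "updates": {"$sum": 1},
        }},
        {"$sort": {"_id.device": 1, "_id.hour": 1}},
    ]


pipeline = positions_per_device_per_hour(datetime(2017, 4, 1, tzinfo=timezone.utc))
# With pymongo (not imported here): results = db.positions.aggregate(pipeline)
```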

And, as you might expect by now, we faced another problem:
Oplog updates came in too fast for Meteor to process, so publications took much longer to become ready and consumed huge amounts of CPU each time MergeBox detected it needed to update a publication.
For this one, we didn’t find a silver bullet. We just went through every publication, made anything that didn’t need to be reactive non-reactive, and cut the published data down to the bare minimum.
We also modified the API to write to Mongo in batches (insertMany instead of insert), and things improved a lot.
In the future, we’re planning to use redis-oplog to take the work of processing every update off the Meteor process, but right now we’re not suffering; these optimizations were enough. We just want redis-oplog to get rid of this problem entirely so it doesn’t come back. The only real obstacle is re-implementing redis-oplog’s logic in our API, which writes to Mongo directly, outside of Meteor.
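The batching change can be sketched as a small write buffer. This is an illustration under assumptions rather than the poster's API code: `flush_fn` stands in for pymongo's `collection.insert_many`, and the batch size of 500 is arbitrary. Batching cuts the number of round trips from the API to MongoDB.

```python
from typing import Any, Callable, Dict, List


class BatchWriter:
    """Buffer documents and write them with one bulk call instead of one insert each."""

    def __init__(self, flush_fn: Callable[[List[Dict[str, Any]]], None], batch_size: int = 500):
        self.flush_fn = flush_fn  # e.g. collection.insert_many with pymongo (assumption)
        self.batch_size = batch_size
        self._buffer: List[Dict[str, Any]] = []

    def add(self, doc: Dict[str, Any]) -> None:
        self._buffer.append(doc)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer out before writing so new docs go into a fresh list
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.flush_fn(batch)


sizes: List[int] = []
writer = BatchWriter(lambda batch: sizes.append(len(batch)), batch_size=500)
for i in range(1200):
    writer.add({"deviceId": "device-42", "seq": i})
writer.flush()  # don't forget the final partial batch
print(sizes)  # [500, 500, 200]
```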

We of course ran into a lot of problems in other areas, but those weren’t Meteor-related. For example, we rewrote the phone app in React Native again, but using the API instead of DDP, because the app needed to be offline-first, and building an offline-first app is easier in a non-reactive, non-pub/sub environment.

If I had to start all over again, I would definitely go with Apollo. Since it’s more of a frontend for APIs than a full-blown database-to-client synchronization solution, improving performance and scaling out would be a lot easier. Also, cache all the things! Caching becomes trivial with GraphQL.
But I would definitely go with Meteor again. We built the first mock-up in just a few weeks, and today making a big change is trivial, not only because we’ve learned a lot about Meteor’s internals, but because Meteor lets you run instead of walk.

Understanding how Tracker works, and when to use Tracker.afterFlush, Tracker.nonreactive and similar methods, was crucial for me. All the magic Meteor provides made sense to me after spending some time with Tracker’s source code.
These days it’s less useful than it used to be, since Meteor is moving to Apollo, which comes with its own “reactivity”, but understanding the tools you use is crucial for knowing how to scale and where bugs are really hiding.

I hope you find this answer useful. IMO scaling isn’t really about the tools being “scalable-ready”, but about understanding when (and when not) to use a given tool for the job, and fully understanding its pros and cons.

asad · April 8, 2017, 6:43am

Thanks for the response, and congrats on the success of your app!

A question about your API, did you use Flask with MongoDB there? How did you handle authentication?

Also, could you ballpark the number of concurrent users you’re handling through these optimizations?

Also, what’s your take on something like nimble:restivus? Is there any reason you went with Flask instead?

fermuch · April 8, 2017, 3:41pm

Thanks.

Flask with Mongo, yes. We’re using pymongo.
For authentication, we’ve split it three ways:

One is the Meteor accounts system, for the web UI
Another is an OAuth server running on top of Flask
The last one is a simple API key/API secret pair

Since devices aren’t users of the web UI, we decoupled them from Meteor and use OAuth with custom scopes to control what the app is allowed to do.
Another reason was that there are already plenty of libraries and tools for using OAuth on iOS/Android.
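A minimal sketch of checking OAuth scopes per endpoint, assuming a hypothetical in-memory table that maps access tokens to granted scopes (a real OAuth server would derive these from the validated token). The scope names `positions:write` and `tasks:read`, and the `report_position` endpoint, are invented for illustration. In a Flask app the token would come from the `Authorization` header; it is passed as an argument here to keep the example self-contained.

```python
from functools import wraps
from typing import Callable, Dict, Set


class Forbidden(Exception):
    pass


# Hypothetical token store: access token -> scopes granted to that device
TOKEN_SCOPES: Dict[str, Set[str]] = {
    "device-token-123": {"positions:write", "tasks:read"},
}


def require_scopes(*needed: str) -> Callable:
    """Reject the call unless the token was granted every scope in `needed`."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(token: str, *args, **kwargs):
            granted = TOKEN_SCOPES.get(token, set())
            missing = set(needed) - granted
            if missing:
                raise Forbidden(f"missing scopes: {sorted(missing)}")
            return fn(token, *args, **kwargs)
        return wrapper
    return decorator


@require_scopes("positions:write")
def report_position(token: str, lat: float, lng: float) -> str:
    return f"stored {lat},{lng}"
```

Keeping privileges in scopes means a device token can report positions without ever being able to do what a web UI user can.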

Some automation/integration tasks needed a simpler way of authenticating, so we have a panel where you can create API key/secret pairs for resources and use them to automate things such as creating tasks or adding webhooks.

I’m not sure if I can share exactly how many users we have, but I can tell you we’re close to the number of concurrent users you’re interested in, and we could double it without much trouble.

When we started, I looked very closely at restivus. It’s a great tool, but some of our API tasks are CPU-heavy, and we wanted to decouple them from the Meteor process.
Using Lambda, we don’t really care whether we have 30 req/s or 30 million req/s. It just works. The only things we have to pay attention to are the number of connections made to MongoDB and the ratio of read to write operations on the DB (but, again, we offload most of that load to Redis so we can process it in batches).

There are also some MongoDB MapReduce tasks that run periodically, and Lambda is great for that kind of thing. It has a scheduler (“every 15 min”, for example, is a valid schedule) and it processes data in batches, without us even having to think about ops.
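Since the scheduled jobs run as Lambda functions, the handler can process pending work in fixed-size chunks and return a summary, which also makes it easy to see whether a run finished. This is a sketch: `fetch_pending` and `process_batch` are hypothetical callables (for example a MongoDB query and a bulk write), and the schedule itself would be configured outside the code, for example with a CloudWatch Events expression such as `rate(15 minutes)` in Zappa's `events` setting.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield lists of at most `size` items without loading everything at once."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def make_handler(fetch_pending: Callable[[], Iterable[Dict[str, Any]]],
                 process_batch: Callable[[List[Dict[str, Any]]], None],
                 batch_size: int = 1000) -> Callable[[Dict[str, Any], Any], Dict[str, int]]:
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
        batches = 0
        for batch in chunked(fetch_pending(), batch_size):
            process_batch(batch)
            batches += 1
        # Returning a summary makes it easy to tell from the logs that the run completed
        return {"batches": batches}
    return handler


# Module-level handler, as Lambda expects; the data source here is a stand-in
processed: List[List[Dict[str, Any]]] = []
handler = make_handler(lambda: ({"id": i} for i in range(2500)), processed.append, batch_size=1000)
print(handler({}, None))  # {'batches': 3}
```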

We could of course use something else. I think we started with AgendaJobs, but it was hard to tell whether a job had finished successfully, along with several other related problems. We tried Lambda, and it works for us.

asad · April 8, 2017, 7:12pm

You also said that you have mobile apps. Why did you decide to go with React Native instead of Meteor for that? And any reason you built a REST API for the mobile app instead of using something like the Asteroid library?

fermuch · April 9, 2017, 12:07am

React Native is native. There is a huge difference between RN and Cordova when you render a lot of items on screen. Since we were going to use React anyway, RN was a perfect fit for us.
We were aware of Asteroid and react-native-meteor, but Meteor is pub/sub (you subscribe to a collection, then you receive notifications when the collection updates). We needed the app to work completely offline for up to 7 days, since that’s how long some workers go without an internet connection on their phones, and reconciling the state between the app and the Meteor server after that much time isn’t easy. Ground:DB is a great package for offline apps, and I’ve used it in a few different projects, but sooner or later you’ll run into trouble handling conflicting updates.
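To make the conflict problem concrete, here is one simple reconciliation strategy, per-field last-write-wins, sketched in Python. It is not what the poster's app does (the post doesn't say); it assumes every field is stored together with the timestamp of its last edit, and it inherits the usual weakness of trusting device clocks, which can drift noticeably over a week offline.

```python
from typing import Any, Dict, Tuple

# Each field is stored as (value, updated_at) so offline edits can be merged later
Doc = Dict[str, Tuple[Any, float]]


def merge_lww(server: Doc, client: Doc) -> Doc:
    """Per-field last-write-wins merge; on equal timestamps the server value is kept."""
    merged: Doc = dict(server)
    for field, (value, ts) in client.items():
        current = merged.get(field)
        if current is None or ts > current[1]:
            merged[field] = (value, ts)
    return merged


server = {"status": ("open", 100.0), "notes": ("checked gate", 50.0)}
client = {"status": ("done", 200.0), "notes": ("old note", 40.0), "photo": ("p.jpg", 150.0)}
print(merge_lww(server, client))
```

Merging per field rather than per document means a worker's offline status change and an office edit to a different field can both survive; two edits to the same field still silently drop one of them, which is exactly where conflict handling gets hard.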