Scaling MeteorJS > 7,000 concurrent connections

Thanks @alawi and @captainn! We’ve started using pub-sub-lite (https://github.com/adtribute/pub-sub-lite) as a way of reducing our use of regular pub/sub, and we’re seeing some small wins there now. We’re already using redis-oplog, so it looks like the next step for us is to migrate as many pub/subs as possible to pub-sub-lite.

Still looking for other ways to manage performance.

There are more drastic things you can do too, like switching over to Methods for data, then using something like simple:rest to convert your DDP requests to REST requests. Once you do that, you can use CloudFront or similar to cache those requests and reduce the pressure on your Node.js server.

Edit: maestroqadev:pub-sub-lite is a great find! I’m going to start using that immediately for one of my hobby projects!

That’s interesting. I always thought redis-oplog would be the secret weapon to solve all Meteor scaling problems?

BTW: Cool concept. I just enrolled in the NSW startup class, maybe I can learn something :slight_smile:

I think redis-oplog still consumes memory through the mergebox.

There is no bulletproof solution for large scale: memory and CPU will be used at some point, unless the entire infrastructure is outsourced.

With that said, converting unnecessary pub/subs to Methods with pub-sub-lite looks like a good start.

@pasharayan it’s hard to give any specific advice without knowing how your app is designed & the choices you’ve made for the processing work your app needs to do. I can give general advice for things that would help everyone’s app. You’ve probably done many of these already, but I’m just listing them for everyone’s benefit and the chance that you might find some helpful.

Here is a list of some things that put load on your server cluster:

  • JS bundle sending on new client connections & client refreshes (improve by adding a ServiceWorker.js file to cache your JS assets, 1 hr of effort)

  • Heavy data over pub/sub (DDP) (make sure you are examining the data moving over pub/sub with Meteor APM, try to root cause the biggest resource drains to optimize your app, a typical Meteor app without issues can handle many more concurrent connections)

  • Multi-user update loops, data change by 1 user causes 7000+ other users to get the change (alter your internal app design if you have this type of thing going on, just be aware of how data & updates propagate through your app, map this out in a visual tool to really make sure you know what is going on)

  • Using your Server to do CDN type workloads, like hosting images & video files (make sure these bandwidth heavy tasks like image hosting, video hosting, etc. are moved to a CDN of choice)

  • Over-using the server when you could put some user specific processing on the client alone (the client’s browser can handle a lot of processing that you might consider doing on the server, make sure you balance the workload)

  • MongoDB queries where too much processing is done on the Server (remember MongoDB has many advanced query types where processing loads can be handled by your MongoDB cluster, make good design choices & optimize where you can)

  • Not using enough Async/Await code on the Server (make sure you don’t have functions waiting for other functions that are holding up your server processing, optimize this in your app)

  • Wasteful processing from too many timers driven by client actions (don’t use timing delay functions in your code if you have many clients, this is usually also fixed with proper use of async/await in your app)
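A minimal sketch of the async/await point above, in plain Node.js (no Meteor APIs). `fetchProfile` and `fetchNotifications` are hypothetical helpers that stand in for independent I/O such as database calls. Awaiting them one after another adds their latencies; starting both and awaiting `Promise.all` lets them overlap.

```javascript
// Hypothetical I/O helpers: each resolves after a delay, standing in for a DB or HTTP call.
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
const fetchProfile = userId => delay(50, { userId, name: "Ada" });
const fetchNotifications = userId => delay(50, [`welcome ${userId}`]);

// Sequential: the second call only starts after the first one finishes (~100 ms total).
async function loadSequential(userId) {
  const profile = await fetchProfile(userId);
  const notifications = await fetchNotifications(userId);
  return { profile, notifications };
}

// Concurrent: both calls start immediately and are awaited together (~50 ms total).
async function loadConcurrent(userId) {
  const [profile, notifications] = await Promise.all([
    fetchProfile(userId),
    fetchNotifications(userId),
  ]);
  return { profile, notifications };
}
```

Both functions return the same result; only the concurrent version avoids holding the request open for the sum of both waits. This only helps when the calls are genuinely independent.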

These are just some things that come to mind. Little things that would not be noticeable with small numbers of users add up to cause issues. Make sure you find the root cause of each issue and work on them one at a time to get your performance gains up.

Meteor with DDP pub/sub is very scalable if it is used thoughtfully.

In my app, I only use DDP pub/sub where I need its features, and I use async/await Meteor Methods, which also use DDP but only run when specific data is needed. I think this is an ideal approach.

I’m happy to elaborate if you have any specific questions that come to mind. Could you tell us a bit more about what you are using in your Meteor Stack? Blaze, React, Vue, etc.? What your app does when concurrent users are connected? etc.

“JS bundle sending on new client connections & client refreshes (improve by adding a ServiceWorker.js file to cache your JS assets, 1 hr of effort)”: or just deliver the bundle from a CDN.

In your Meteor server startup code:

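import { Meteor } from 'meteor/meteor';
import { WebAppInternals } from 'meteor/webapp';

// In production, point the bundled JS/CSS URLs at the CDN instead of the app containers.
if (Meteor.isProduction) {
  WebAppInternals.setBundledJsCssUrlRewriteHook(url => {
    // Appends with "&", assuming the bundled URL already carries a query string.
    return `https://your_cloudfront_cdn.com${url}&app_v_=${process.env.npm_package_version}`;
  });
}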

If you go this way, just mention my name so I can give you the Cloudfront configuration.

About your concurrent connections, are you disconnecting them from the server after being idle for X amount of time?
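A minimal sketch of idle-disconnect bookkeeping, written independently of any Meteor API. The `IdleTracker` class, the threshold, and the `onIdle`/`onActive` callbacks are hypothetical; on a Meteor client the callbacks are where you would call `Meteor.disconnect()` and `Meteor.reconnect()`. Packages such as mixmax:smart-disconnect automate a similar idea.

```javascript
// Hypothetical idle tracker: decides when a client has been inactive long enough to disconnect.
class IdleTracker {
  constructor({ idleMs, now = () => Date.now(), onIdle, onActive }) {
    this.idleMs = idleMs;      // inactivity threshold in milliseconds
    this.now = now;            // injectable clock, handy for testing
    this.onIdle = onIdle;      // e.g. where a client would call Meteor.disconnect()
    this.onActive = onActive;  // e.g. where a client would call Meteor.reconnect()
    this.lastActivity = now();
    this.disconnected = false;
  }

  // Call on user input (mouse, keyboard, tab visibility change).
  recordActivity() {
    this.lastActivity = this.now();
    if (this.disconnected) {
      this.disconnected = false;
      this.onActive();
    }
  }

  // Call periodically (e.g. from a timer) to check whether the client went idle.
  check() {
    if (!this.disconnected && this.now() - this.lastActivity >= this.idleMs) {
      this.disconnected = true;
      this.onIdle();
    }
  }
}
```

Disconnected clients stop holding server-side subscriptions, which is what frees memory and observers for the users who are actually active.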

Hi @pasharayan, I’m the author of pub-sub-lite. It’s a new package so I’m very happy to see early adopters! Thanks for trying out the package and feel free to let me know if you encounter any issue.

Regarding your performance problem:

  • I would like to echo what has been mentioned by others here about reducing the use of pub/sub in favor of Methods. It seems that you’ve already started down this path with pub-sub-lite, which is great.

  • Are you currently sending a large amount of data to each client? If so, is it possible to reduce the size of that data (e.g. filtering only the necessary document fields, doing pagination to reduce data on initial load)?

  • You mentioned that your app has been “becoming unusable”. Does the app feel slow and laggy on the client side? Although most Meteor performance problems occur on the server, you may face them on the client as well. For example, if the client has to process too much data, the UI may become unresponsive. Another thing to look out for: if you store a large amount of data in Minimongo, performance may suffer because Minimongo doesn’t maintain any indexes. It can be more performant to store your documents in a plain array and use native JavaScript Array methods (find, filter, …) to access them (obviously you’ll lose the benefit of reactive rendering, so this should only be seen as a workaround for edge cases where the amount of data is too large).

  • Did you notice anything unusual in your Meteor APM? It would be helpful if you can share your APM screenshots with us.
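A minimal sketch of the plain-array workaround described above, for a large, read-mostly dataset (for example, documents loaded once through a Method). The `orders` documents are hypothetical. `find`/`filter` still scan linearly, like an unindexed query, but skip Minimongo’s query machinery. The `Map` keyed by `_id` is an extra step beyond what the post describes, giving constant-time lookups by id.

```javascript
// Hypothetical documents, e.g. returned once by a Method instead of a publication.
const orders = [
  { _id: "a1", customerId: "c1", status: "open", total: 120 },
  { _id: "a2", customerId: "c2", status: "closed", total: 80 },
  { _id: "a3", customerId: "c1", status: "closed", total: 45 },
];

// Plain Array methods: linear scans, no reactivity.
const firstOpen = orders.find(o => o.status === "open");
const closedForC1 = orders.filter(o => o.customerId === "c1" && o.status === "closed");

// Optional extra step: an index by _id for O(1) lookups on large arrays.
const ordersById = new Map(orders.map(o => [o._id, o]));
const a2 = ordersById.get("a2");
```

If the data changes, you have to refresh the array (and rebuild the `Map`) yourself, which is the reactivity trade-off the post mentions.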

Hi, on Galaxy we don’t provide code-level support, but beyond that we try to help as much as possible, even providing insights at the code level, and this has also been the case in your recent tickets.

About your issues, are you comparing your connection metric with Google Analytics or another tool? Maybe you are keeping many live queries for idle clients; a package like mixmax:smart-disconnect could help.

As you are already using redis-oplog, you could also use the redis-oplog fine-tuning options (https://github.com/cult-of-coders/redis-oplog/blob/master/docs/finetuning.md), but as others have said, it’s hard to provide specific feedback without knowing your code.

Not knowing a whole lot about your application these would be my personal recommendations.

  1. Implement cultofcoders:grapher making use of non-reactive queries where possible to benefit from the performance of their Hypernova engine.

  2. Remove all publishing of reactive counters and replace them with denormalized counts; grapher can help with this as well.

  3. As stated by @filipenevola, fine tune redis-oplog by implementing custom channels.
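A minimal in-memory sketch of point 2: instead of publishing a live count (which keeps an observer over every matching document), keep a counter field on the parent document and update it in the same code path as the write. The `posts`/`comments` maps and the `addComment`/`removeComment` helpers are hypothetical stand-ins for MongoDB collections; with MongoDB the counter update would typically be a `$inc` on the parent document.

```javascript
// Hypothetical in-memory stand-ins for two MongoDB collections.
const posts = new Map([["p1", { _id: "p1", title: "Hello", commentCount: 0 }]]);
const comments = new Map();
let nextId = 1;

// Write path: insert the comment and bump the denormalized counter together.
function addComment(postId, text) {
  const _id = `c${nextId++}`;
  comments.set(_id, { _id, postId, text });
  posts.get(postId).commentCount += 1; // MongoDB: update the post with { $inc: { commentCount: 1 } }
  return _id;
}

function removeComment(commentId) {
  const comment = comments.get(commentId);
  if (!comment) return false;
  comments.delete(commentId);
  posts.get(comment.postId).commentCount -= 1; // MongoDB: { $inc: { commentCount: -1 } }
  return true;
}

// Read path: the count is a plain field on one small document,
// so clients never need a reactive query over all comments.
const getCommentCount = postId => posts.get(postId).commentCount;
```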

I love the concept of your pub-sub-lite package!

Very good advice from @filipenevola and @copleykj.
I would also add:

  1. Analyze your DB queries: create appropriate indexes and use projections, ideally ones that allow covered queries (read this article: https://docs.mongodb.com/manual/core/query-optimization/#covered-query)
  2. Use load testing tools with APM to understand why it’s unresponsive
    APM tools (on local):
    https://github.com/Meteor-Community-Packages/meteor-elastic-apm
    monti APM: https://montiapm.com/
    Load testing:
    https://github.com/kschingiz/artillery-engine-meteor
    There was also one load testing tool, but I cannot find it
  3. I have also seen cases where the frontend was doing lots of re-subscribes/method calls on each data change; use Meteor DevTools (https://chrome.google.com/webstore/detail/meteor-devtools/ippapidnnboiophakmmhkdlchoccbgje?hl=en)
    to see why and where you are refetching data
  4. There are also cases where oplog cannot be used for a publication, so Meteor falls back to the polling observe driver (PollingObserveDriver), which is very slow; Monti APM, which is based on Kadira, will show those pub/subs

Good luck with optimization, I believe Meteor can handle even more connections than 7000+.

I’ve had performance problems on Galaxy which Galaxy Support never adequately addressed. Switched to NodeChef and problems were solved. BTW, lots of good performance optimization suggestions in this thread (many of which I had tried to no avail). NodeChef isn’t problem-free though either as I’ve experienced outages on my NodeChef hosted apps. However, when they run, they run well.

Our load tests showed about 300 concurrent connections per server before load times skyrocket. We usually use minimum-size containers, sized so that no more than 50% of RAM is used in the ‘idle’ state (in our case 512MB RAM containers, but normally 256MB containers are enough for a simple application). This is without Redis Oplog (which we want to try soon). Interestingly, this roughly matches your 7000 connections across 20 servers (7000 / 20 = 350 connections per server). Now I wonder whether this is the maximum physical cap here, or whether larger servers would help?
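The arithmetic above, written as two small helpers. The per-container figures are the observed numbers from this thread, not limits of Meteor, and `containersNeeded` (with its optional headroom factor) is a hypothetical planning aid.

```javascript
// Observed connections per container, e.g. 7000 connections over 20 containers.
const connectionsPerContainer = (totalConnections, containers) => totalConnections / containers;

// Hypothetical capacity estimate: containers needed for a target load.
// headroom < 1 plans to run each container below its observed capacity (0.5 = at half).
function containersNeeded(targetConnections, perContainer, headroom = 1) {
  return Math.ceil(targetConnections / (perContainer * headroom));
}
```

For example, at the roughly 300 connections per container reported above, `containersNeeded(7000, 300)` gives 24 containers; at 350 per container running at half capacity, `containersNeeded(7000, 350, 0.5)` gives 40.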

Hi @cormip how are you doing? I believe you are talking about past events (before Tiny acquisition), right? I would be happy to review the issues that you had with Galaxy and offer the trial for you to check Galaxy again.

We have thousands of Meteor apps running on Galaxy, handling thousands of connections without issues.

Please reach out to me at filipe@meteor.com or support@meteor.com so I can understand your issues. If they are still happening, that is even better, so we can improve Galaxy even further :smiley:

I have talked with @pasharayan by voice. He was using a different channel to communicate with the Galaxy team, so even simple requests, like increasing container limits, were not being received. He couldn’t remember exactly which channel it was, but I assume it was not one of the current valid ones. We did a test together sending requests through the Meteor website and it’s working as expected, so from now on I don’t believe he is going to have these issues anymore; it was a problem with the channel used to reach us and not with our support. And to be clear, the best channel is to send an email to support@meteor.com

At the same time, our support was replying to many messages from Julius (Insidesherpa CTO), but I understand that things were burning at their company, so maybe Pasha was not aware of that.

I just want to reinforce here that Galaxy is a very important piece of the Meteor ecosystem and we (Tiny) are doing our best to provide the best experience possible. Over the last 9 months we have received a lot of great feedback about our support and service.

I know we have things to improve, we always do, but I’m sure we are providing a very good service here. And if you are a Galaxy customer as well and you are not happy, please send me an email at filipe@meteor.com

Hi, we don’t have a maximum cap; we have clients running many more than 7000 connections. It is our fault that we don’t have a scaling guide or case studies yet; we are working on this.

If you can handle 300 connections with 512MB without using redis-oplog be sure that you will be able to handle much more with redis-oplog.

Redis-oplog is the best way to use Subscriptions and achieve linear horizontal scaling with Meteor, as it helps you spread the load of receiving/processing your real-time update messages equally, or at least better. With MongoDB oplog, every container has to read all the messages, and that is why @diaconutheodor and his team came up with this great package: to work around this issue and allow messages to be sent only to the containers that need them.
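A toy model of the difference described above, not redis-oplog’s actual implementation: with MongoDB oplog tailing, every container processes every write, while with channel-based delivery a write only reaches containers that have an observer on that channel. The container names, channel names, and helper functions are all hypothetical.

```javascript
// Which channels (e.g. one per chat room) each container currently has observers on.
const containers = {
  web1: new Set(["room1"]),
  web2: new Set(["room2"]),
  web3: new Set(["room1", "room3"]),
};

// Each write is published on one channel.
const writes = ["room1", "room2", "room1", "room4"];

// Oplog-style: every container reads every write, relevant or not.
function broadcastDeliveries(containers, writes) {
  return Object.keys(containers).length * writes.length;
}

// Channel-style: a write is only delivered to containers observing its channel.
function targetedDeliveries(containers, writes) {
  let total = 0;
  for (const channel of writes) {
    for (const channels of Object.values(containers)) {
      if (channels.has(channel)) total += 1;
    }
  }
  return total;
}
```

Here the broadcast model does 12 deliveries (3 containers × 4 writes) while the targeted model does 5. Broadcast cost grows with containers × writes, which is why adding containers stops paying off under heavy write load.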

You can improve this even further using Redis namespaces, which let you fine-tune which messages each container receives. With this setup you can scale Meteor subscriptions a lot; I don’t see limits here.

A few important concepts about scaling Meteor apps and Meteor apps in general.

BTW, I’ve been scaling Meteor apps for many years, since before I joined Tiny, so I’m writing the points below as a Meteor user who has seen these solutions working in real apps.

1 - At runtime, Meteor is just a group of packages running on Node.js. Meteor is not a runtime itself, so it has no scaling limits beyond those of Node.js.

2 - Subscriptions over DDP are a Meteor feature that delivers a real-time experience with almost no code, in a very productive way, but you can deliver data in many different ways (Methods, REST, GraphQL, etc.); this is your choice in the end. When you choose Subscriptions you get a lot for free, in a very optimized way. Think about the network layer: you only send diffs to the client and you didn’t write any code for this to work, but of course some process needs to calculate these diffs for you, keep the last state on the server, etc.

3 - Meteor subscriptions with MongoDB oplog will not scale horizontally (by adding more containers; BTW, Galaxy can do this automatically for you using triggers) at the same ratio of connections per container if you greatly increase the number of writes or containers, because every container needs to read the entire oplog from MongoDB, which also has a side effect on MongoDB performance. That is why redis-oplog exists.

Important: most apps will never need to migrate to redis-oplog, because MongoDB oplog will be enough for them :wink: The advice here is: only start using redis-oplog when you need it; the replacement process is a breeze (thanks @diaconutheodor). Also, if you only use Subscriptions for part of your app, this is probably never going to be an issue for you. Some apps do everything with subscriptions, so their chances of needing a solution like redis-oplog are higher.

4 - Redis-oplog can bring horizontal scaling to Meteor apps at the same ratio of connections per container, however many containers you use, because it only sends messages to the containers “watching” a specific query. You can fine-tune the messages that will be delivered to each container. Important: most apps, even if they need redis-oplog, are not going to need fine-tuning, but if you are using redis-oplog and want to scale even further, you should use namespaces.

5 - In many cases MongoDB, not Meteor, is the bottleneck: you will be reading from and writing to MongoDB, so if MongoDB starts to run slower, this will affect your app.
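A simplified sketch of the diffing mentioned in point 2: compare the last state sent to a client with the new document and emit only what changed, shaped like a DDP `changed` message (`fields` for new or updated values, `cleared` for removed field names). The real mergebox also tracks which subscriptions contributed each field, which this sketch ignores; `diffDocument` is a hypothetical name.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Compare two versions of a document and build a DDP-style "changed" message.
// JSON.stringify is used for value equality: adequate for plain JSON-like values,
// but sensitive to key order inside nested objects.
function diffDocument(collection, id, previous, next) {
  const fields = {};
  const cleared = [];
  for (const [key, value] of Object.entries(next)) {
    if (!has(previous, key) || JSON.stringify(previous[key]) !== JSON.stringify(value)) {
      fields[key] = value; // new or modified field
    }
  }
  for (const key of Object.keys(previous)) {
    if (!has(next, key)) cleared.push(key); // field removed from the document
  }
  if (Object.keys(fields).length === 0 && cleared.length === 0) return null; // nothing to send
  const msg = { msg: "changed", collection, id };
  if (Object.keys(fields).length > 0) msg.fields = fields;
  if (cleared.length > 0) msg.cleared = cleared;
  return msg;
}
```

Keeping `previous` around for every document of every client is exactly the per-connection server memory that the mergebox costs in exchange for small network messages.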

“Meteor does not scale” is a myth; it is the same as saying “Java does not scale” or “Node.js does not scale”. Any technology depends on our implementation more than anything else, and Meteor already provides by default a very good solution that will work for almost every app. Beyond that, we have amazing packages like redis-oplog and pub-sub-lite to help. Pub-sub-lite is a new package, but I’ve already worked on apps that wrote similar solutions; it’s great to see this solution available as a package.

We (as a community) and also I (as Meteor evangelist) need to do a better job of promoting Meteor and showing how well it can scale while keeping great features like subscriptions. I bet that if other subscription technologies did everything Meteor does, they would perform much worse than Meteor.

Disclaimer: nowadays you can use Meteor in many different ways, with many different data layers and view layers, and everything we are discussing here is about scaling Subscriptions with Meteor. If you don’t need or want real-time data sync between Mongo and Minimongo, this is not even something you need to worry about.

Thank you very much @filipenevola for the very detailed explanation. It is probably the best post on the topic which clearly reassures anyone in doubt. :slight_smile:

Great reply @filipenevola, it’s really great to see the official support! I really like Galaxy and think everyone should use it and share more of their performance findings.

You gave a good piece of info on how the oplog works. Based on what you said, many users should probably scale up their container size before they scale out the number of containers.

I had personally wondered about scenarios where it might be wise to do this. For anyone who doesn’t know Galaxy yet, you can just pick the size of container and see how it impacts your app’s performance.

Also, a couple of us are doing some mapping of Meteor’s architecture and I was doing some reading on Live Queries. There is some great info in the guide that shows how to optimize Live Queries.

How to Optimize Live Queries:
https://galaxy-guide.meteor.com/apm-optimizing-your-app-for-live-queries.html

Live Query support is one of the major competencies in Meteor. Normally, a new Live Query is created when you return a cursor from a publication. Then it’ll reactively watch the query and send changes to the client.

In order to detect these changes, Live Queries do some amazing work behind the scenes. To do this, they need to spend some CPU cycles. Therefore, Live Queries are a major factor affecting your app’s CPU usage.

However, the count of Live Queries itself does not cause many issues. These are the factors affecting the CPU usage:

  • Number of documents fetched by Live Queries
  • Number of live changes happening
  • Number of oplog notifications Meteor is receiving

I recommend that everyone in the community who is interested in scaling read the link above :+1: Sometimes there is more documented about Meteor than we all realize :grinning:

Please read this: https://thecodebarbarian.com/slow-trains-in-mongodb-and-nodejs
Then read this: https://medium.com/@kyle_martin/mongodb-in-production-how-connection-pool-size-can-bottleneck-application-scale-439c6e5a8424

Perhaps then try to understand the limits of your MongoDB provider (ex: https://docs.atlas.mongodb.com/reference/atlas-limits/)

If your problem is due to a bottleneck between Node and MongoDB, consider distributing your data over multiple DBs so that slow queries don’t jam your fast queries.
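A toy simulation of the “slow trains” effect those articles discuss: with a fixed-size connection pool, a few slow queries can occupy every connection so that fast queries queue behind them. It models the pool as `poolSize` connections serving queries in arrival order, all arriving at time 0; the durations and pool sizes are made-up inputs, and real drivers behave more subtly.

```javascript
// Simulate queries (all arriving at t=0, in order) on a pool of `poolSize` connections.
// Each query takes the connection that frees up earliest. Returns each query's finish time.
function simulatePool(durations, poolSize) {
  const freeAt = new Array(poolSize).fill(0); // time at which each connection becomes free
  return durations.map(duration => {
    // pick the connection that becomes available first
    let best = 0;
    for (let i = 1; i < poolSize; i++) {
      if (freeAt[i] < freeAt[best]) best = i;
    }
    freeAt[best] += duration; // the query starts when that connection frees up
    return freeAt[best];
  });
}
```

With two slow 1000 ms reports followed by two 5 ms lookups, a pool of 2 makes the fast lookups finish at 1005 ms, while a pool of 3 lets them finish at 5 ms and 10 ms. Moving slow queries to a separate pool or database has a similar effect to giving fast queries their own connections.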