Hi everyone - our startup (https://www.insidesherpa.com) is having scaling issues on Meteor at the moment. We’re on Meteor Galaxy with more than 7,000 concurrent users across 20 containers, and our app is becoming unusable.
The Galaxy team isn’t being super responsive or helpful with us (update: read why here) - so we’d love to get some help to fix this problem. Has anyone faced this issue before?
Thanks @alawi and @captainn ! We’ve started using pub-sub-lite (https://github.com/adtribute/pub-sub-lite) as a way of reducing our use of full pub/sub, and we’re seeing some small wins there now. We’re already using Redis Oplog, so it looks like the next step for us is to migrate as many publications as possible to pub-sub-lite.
Still looking for other ways to manage performance.
There are more drastic things you can do too, like switching over to methods for data, then using something like simple:rest to convert your DDP requests to REST requests. Once you do that, you can use cloudfront or similar to cache those requests, and reduce the pressure on your node.js server.
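To make the caching idea concrete, here is a minimal sketch of a response helper that marks a JSON payload as cacheable by a CDN. It does not use simple:rest's own API; `sendCacheableJson` and the 60-second lifetime are hypothetical, and `res` is assumed to behave like a Node.js `http.ServerResponse`.

```javascript
// Hypothetical helper: write a JSON response that a shared cache (e.g. a CDN) may store.
// `res` is assumed to expose statusCode, setHeader() and end(), like http.ServerResponse.
function sendCacheableJson(res, payload, maxAgeSeconds) {
  const body = JSON.stringify(payload);
  res.statusCode = 200;
  res.setHeader('Content-Type', 'application/json; charset=utf-8');
  // `public` allows shared caches to store the response; `max-age` bounds how stale it can get.
  res.setHeader('Cache-Control', `public, max-age=${maxAgeSeconds}`);
  res.end(body);
}
```

This only makes sense for data that is identical for every user; anything user-specific should not be sent with a `public` cache header.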
Edit: maestroqadev:pub-sub-lite is a great find! I’m going to start using that immediately for one of my hobby projects!
@pasharayan it’s hard to give any specific advice without knowing how your app is designed & the choices you’ve made for the processing work your app needs to do. I can give general advice for things that would help everyone’s app. You’ve probably done many of these already, but I’m just listing them for everyone’s benefit and the chance that you might find some helpful.
Here is a list of some things that put load on your server cluster:
- JS bundle sending on new client connections & client refreshes (improve by adding a ServiceWorker.js file to cache your JS assets, 1 hr of effort)
- Heavy data over pub/sub (DDP) (make sure you are examining the data moving over pub/sub with Meteor APM and try to root-cause the biggest resource drains; a typical Meteor app without issues can handle many more concurrent connections)
- Multi-user update loops, where a data change by 1 user causes 7000+ other users to get the change (alter your internal app design if you have this type of thing going on; be aware of how data & updates propagate through your app, and map this out in a visual tool to make sure you know what is going on)
- Using your server to do CDN-type workloads, like hosting images & video files (make sure bandwidth-heavy tasks like image hosting, video hosting, etc. are moved to a CDN of choice)
- Over-using the server when you could put some user-specific processing on the client alone (the client’s browser can handle a lot of processing that you might consider doing on the server; make sure you balance the workload)
- MongoDB queries where too much processing is done on the server (remember MongoDB has many advanced query types that let your MongoDB cluster handle the processing load; make good design choices & optimize where you can)
- Not using enough async/await code on the server (make sure you don’t have functions waiting on other functions and holding up your server processing)
- Wasteful processing from too many timers driven by client actions (don’t use timing-delay functions in your code if you have many clients; this is usually also fixed with proper use of async/await in your app)
These are just some things that come to mind. Little things that would not be noticeable with small numbers of users add up to cause issues. Make sure you find the root cause of each issue and work on them one at a time to get your performance gains up.
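On the async/await point in the list above, one common case is awaiting independent operations one after another when they could run concurrently. A minimal sketch, assuming two unrelated async data sources; `fetchProfile` and `fetchCourses` are hypothetical functions that each return a promise:

```javascript
// Sequential: the second call does not start until the first has finished,
// so the total time is roughly the sum of both waits.
async function loadSequential(fetchProfile, fetchCourses) {
  const profile = await fetchProfile();
  const courses = await fetchCourses();
  return { profile, courses };
}

// Concurrent: both calls start immediately and are awaited together,
// so the total time is roughly that of the slower call.
async function loadConcurrent(fetchProfile, fetchCourses) {
  const [profile, courses] = await Promise.all([fetchProfile(), fetchCourses()]);
  return { profile, courses };
}
```

This only applies when the two operations do not depend on each other's results.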
Meteor with DDP pub/sub is very scalable if it is used thoughtfully.
In my app, I only use DDP pub/sub where I need its features, and I use async/await Meteor Methods everywhere else. Methods also go over DDP, but they run only when specific data is needed. I think this is an ideal approach.
I’m happy to elaborate if you have any specific questions that come to mind. Could you tell us a bit more about what you are using in your Meteor Stack? Blaze, React, Vue, etc.? What your app does when concurrent users are connected? etc.
" JS bundle sending on new client connections & client refreshes (improve by adding a ServiceWorker.js file to cache your JS assets, 1 hr of effort)" - or just deliver from a CDN:
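In your Meteor startup/server code:

    import { Meteor } from 'meteor/meteor';
    import { WebAppInternals } from 'meteor/webapp';

    if (Meteor.isProduction) {
      // Rewrite bundled JS/CSS URLs to point at the CDN, tagged with the app version.
      WebAppInternals.setBundledJsCssUrlRewriteHook(url => {
        return `https://your_cloudfront_cdn.com${url}&app_v_=${process.env.npm_package_version}`
      })
    }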
If you go this way, just mention my name so I can give you the Cloudfront configuration.
Hi @pasharayan, I’m the author of pub-sub-lite. It’s a new package so I’m very happy to see early adopters! Thanks for trying out the package and feel free to let me know if you encounter any issues.
Regarding your performance problem:
I would like to echo what has been mentioned by others here about reducing the use of pub/sub in favor of Methods. It seems that you’ve already started down this path with pub-sub-lite, which is great.
Are you currently sending a large amount of data to each client? If so, is it possible to reduce the size of that data (e.g. filtering only the necessary document fields, doing pagination to reduce data on initial load)?
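As a rough illustration of both ideas, the sketch below builds the options object for a Mongo-style `find()` that returns only selected fields, one page at a time. `buildPageQuery`, the cap of 100 documents per page, and the `Courses` collection in the comment are all hypothetical.

```javascript
// Hypothetical helper: build find() options for one page of results.
function buildPageQuery({ page, pageSize, fields }) {
  const safePage = Math.max(1, Math.floor(page));
  // Clamp the page size so a client cannot request an unbounded amount of data.
  const safeSize = Math.min(100, Math.max(1, Math.floor(pageSize)));
  // Field specifier such as { title: 1, slug: 1 }: only these fields (plus _id) are returned.
  const projection = Object.fromEntries(fields.map((name) => [name, 1]));
  return {
    fields: projection,
    sort: { _id: 1 }, // stable order so consecutive pages do not overlap
    skip: (safePage - 1) * safeSize,
    limit: safeSize,
  };
}

// Inside a Meteor Method or publication this could be used roughly as:
//   Courses.find({}, buildPageQuery({ page: 2, pageSize: 20, fields: ['title', 'slug'] })).fetch()
// where `Courses` is a hypothetical Mongo.Collection.
```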
You mentioned that your app has been “becoming unusable”. Does the app feel slow and laggy on the client side? Although most Meteor performance problems occur on the server, you may face them on the client as well. For example, if the client has to process too much data, the UI may become unresponsive. One more thing to look out for: if you store a large amount of data in Minimongo, performance may suffer because Minimongo doesn’t maintain any index. It can be more performant to store your documents in a normal array and use native JavaScript Array methods (find, filter,…) to access them (obviously you’ll lose the benefit of reactive rendering, so this should only be seen as a workaround for edge cases when the amount of data is too large).
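A minimal sketch of that workaround, with made-up documents: the plain array is queried with native Array methods, and a `Map` keyed by `_id` (my addition, not something Minimongo or pub-sub-lite provides) gives constant-time lookups by id.

```javascript
// Build a lookup index once; later lookups by id avoid scanning the whole array.
function indexById(docs) {
  return new Map(docs.map((doc) => [doc._id, doc]));
}

// Made-up sample documents held in a plain array instead of Minimongo.
const docs = [
  { _id: 'a1', title: 'Intro', published: true },
  { _id: 'b2', title: 'Basics', published: false },
  { _id: 'c3', title: 'Advanced', published: true },
];

const byId = indexById(docs);
const basics = byId.get('b2');                           // direct lookup by id
const published = docs.filter((doc) => doc.published);   // native Array filtering
const firstDraft = docs.find((doc) => !doc.published);   // native Array search
```

Nothing here is reactive, so the UI has to be told explicitly when the array changes.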
Did you notice anything unusual in your Meteor APM? It would be helpful if you can share your APM screenshots with us.
Hi, on Galaxy we don’t provide code-level support, but beyond that we try to help as much as possible, even providing insights at the code level, and this has also been the case in your recent tickets.
About your issues, are you comparing your connection metric with Google Analytics or another tool? Maybe you are keeping many live queries open for idle clients; a package like mixmax:smart-disconnect could help.
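The general idea behind disconnecting idle clients can be sketched as follows. This is not the actual implementation of mixmax:smart-disconnect; `createIdleDisconnector` and its options are hypothetical, and the connection functions are injected so the logic stays independent of Meteor.

```javascript
// Hypothetical sketch: drop the connection after the tab has been hidden for `idleMs`,
// and restore it as soon as the tab becomes visible again.
function createIdleDisconnector({ disconnect, reconnect, idleMs, schedule = setTimeout, cancel = clearTimeout }) {
  let timer = null;
  let disconnected = false;
  return {
    onHidden() {
      if (timer !== null || disconnected) return; // already waiting or already disconnected
      timer = schedule(() => {
        timer = null;
        disconnected = true;
        disconnect();
      }, idleMs);
    },
    onVisible() {
      if (timer !== null) {
        cancel(timer); // user came back before the idle timeout fired
        timer = null;
      }
      if (disconnected) {
        disconnected = false;
        reconnect();
      }
    },
  };
}

// On a Meteor client this could be wired up roughly as:
//   const idle = createIdleDisconnector({
//     disconnect: () => Meteor.disconnect(),
//     reconnect: () => Meteor.reconnect(),
//     idleMs: 60 * 1000,
//   });
//   document.addEventListener('visibilitychange', () =>
//     document.hidden ? idle.onHidden() : idle.onVisible());
```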
There are also cases when oplog cannot be used in pub/sub, so Meteor falls back to the PollingDriver, which is very slow. MontiAPM, which is based on Kadira, will show that pub/subs
Good luck with optimization, I believe Meteor can handle even more connections than 7000+.
I’ve had performance problems on Galaxy which Galaxy Support never adequately addressed. Switched to NodeChef and problems were solved. BTW, lots of good performance optimization suggestions in this thread (many of which I had tried to no avail). NodeChef isn’t problem-free though either as I’ve experienced outages on my NodeChef hosted apps. However, when they run, they run well.
Our load tests showed roughly 300 concurrent connections per server before load times skyrocket. We usually use minimum-size containers, sized so that no more than 50% of RAM is used in the ‘idle’ state (in our case that means 512MB RAM containers, but normally 256MB containers are enough for a simple application). This is without Redis Oplog (which we want to try soon). Strangely, this matches your 7000 connections on 20 servers (7000 / 20 = 350 connections per server). Now I am interested in whether this is the maximum physical cap here, or whether larger servers may help.
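The arithmetic above as a small sketch; the helper names are hypothetical and the 300-per-container ceiling is only the figure from the load test described in this post, not a general Meteor limit.

```javascript
// Average connections each container carries.
function connectionsPerContainer(totalConnections, containers) {
  return totalConnections / containers;
}

// Containers needed to stay at or below a per-container ceiling.
function containersNeeded(totalConnections, perContainerLimit) {
  return Math.ceil(totalConnections / perContainerLimit);
}

console.log(connectionsPerContainer(7000, 20)); // 350
console.log(containersNeeded(7000, 300));       // 24
```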
Hi @cormip, how are you doing? I believe you are talking about past events (before the Tiny acquisition), right? I would be happy to review the issues that you had with Galaxy and offer you a trial so you can check Galaxy again.
We have thousands of Meteor apps running on Galaxy, handling thousands of connections without issues.
Please reach out to me at filipe@meteor.com or support@meteor.com so I can understand your issues. If they are still happening, that is even better, so we can improve Galaxy even further.
I talked with @pasharayan by voice. He had been using a different channel to communicate with the Galaxy team, so even simple requests, like increasing container limits, were not being received. He could not remember exactly which channel it was, but I assume it was not one of the currently valid ones. We did a test together, sending requests through the Meteor website, and it is working as expected. From now on I don’t believe he is going to have these issues anymore; it was a problem with the channel used to reach us and not with our support. To be clear, the best channel is an email to support@meteor.com
At the same time, our support was replying to many messages from Julius (Insidesherpa CTO). I understand that things were burning at their company, so maybe Pasha was not aware of that.
I just want to reinforce here that Galaxy is a very important piece of the Meteor ecosystem and we (Tiny) are doing our best to provide the best experience possible. Over the last 9 months we have received a lot of great feedback about our support and service.
I know we have things to improve, we always have, but I’m sure we are providing a very good service here. And if you are a Galaxy customer as well and you are not happy, please send me an email at filipe@meteor.com