🚀 Meteor Scaling/Performance Best Practices

Hi,

As most of you know, Meteor scaling has been a constant source of FUD inside and outside the community. Yet many of you here have managed to build and scale Meteor applications in production.

In the same spirit as the “What do you expect from your Meteor PWA” thread, I think it is worth documenting the best practices and insights this community has around scaling.

Thus, if you’ve managed to scale a Meteor app in production, I would really appreciate it if you could take a minute to answer these:

  1. Any deep insights you think are worth sharing with the community?
  2. What are the common pitfalls one should avoid?
  3. Any advice, tools, packages, best practices you would like to share with the community?
  4. Any articles, references or general performance/scaling tips you would like to share or learn about?

Your answers will help me start writing a manual/tutorial on this subject. Please keep the discussion constructive and positive.

7 Likes

I’ll start myself with tips I’ve learnt along the way:

  • If you don’t need real-time, just use Meteor methods or Apollo. This makes Meteor as scalable as any other socket-based Node app.
  • Do use out-of-the-box pub/sub to validate ideas quickly, but don’t expect it to scale without any tweaks.
  • If you have a lot of per-user pub/sub observers and you don’t want to worry about the technicalities of scaling, then I’d recommend Galaxy.
  • Use as many re-usable publications as you can; the cost of scaling those is minimal compared to user-specific publications (in my tests, user-specific publications are on average around 6 times more expensive in memory than generic re-usable ones).
  • If you have many user-specific publications, then I’d either switch to redis-oplog or really bump up the memory per VM.
  • If you expect big traffic spikes, you need to over-provision the VMs or cluster Meteor instances across the CPUs. Remember, Node is single-threaded, so by default it doesn’t leverage multiple CPUs; either fork multiple Meteor processes or scale the app horizontally with many small nodes and put a load balancer in front of them. This is true for all Node applications.
  • If you really want full control over your reactivity, try the awesome redis-oplog package by cult-of-coders.
4 Likes

Use as many re-usable publications as you can […]

Go one step further - split mixed ones into shared and non-shared. Let’s say your publication looks like this:

// Use a classic function (not an arrow) so this.userId is available for authorization.
Meteor.publish('notificationsForUser', function (userId) {
  // Authorization...
  return Notifications.find({
    $or: [{ type: 'global' }, { type: 'user', userId }]
  });
});

Then check whether the number of shared documents (here: type: 'global') is large. If it is, a split like this:

Meteor.publish('notificationsGlobal', () => {
  // Authorization...
  return Notifications.find({ type: 'global' });
});

Meteor.publish('notificationsForUser', function (userId) {
  // Authorization...
  return Notifications.find({ type: 'user', userId });
});

may drastically increase the number of reused observers and, in the end, greatly reduce both DB and server pressure. I once split such a publication into four (mixed ~> global, organisation, month, user) and cut costs by 30% (we switched to a smaller DB instance and reduced the number of containers).

Just remember - always measure it for yourself!

9 Likes

Thank you for such valuable information. I’ve been looking for a solution to this problem for a long time, and I finally found one. I really appreciate your replies!

3 Likes

It often makes a big difference to increase the memory limit on the Meteor server. You can do this by setting the NODE_OPTIONS environment variable to --max-old-space-size=&lt;size in MB&gt; in production for the server process.
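For example, assuming a 4 GB target (pick a value that fits your container or VM), a startup script could set the limit and then verify what the node process actually sees:

```shell
# Raise V8's old-generation heap limit; the value is in megabytes.
# 4096 is just an illustrative value - size it to your server/container.
export NODE_OPTIONS="--max-old-space-size=4096"

# Verify the limit the node process actually picked up (prints the heap limit in MB):
node -p "Math.floor(require('v8').getHeapStatistics().heap_size_limit / 1048576)"
```

Note that the reported heap_size_limit is usually a bit larger than the old-space value, since it also covers the young generation.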

4 Likes

Well, the best advice I can give is to have separate apps for the Frontend and the Backend. That allows us to scale them independently and thus reduce cost.

Right now we’re able to run 6 concurrent users on the Backend (which has very CPU-intensive tasks and lots of MongoDB ops) with 75% CPU and 70% memory utilization on the smallest AWS Fargate box.

At the same time, our Frontend was at less than 1% CPU and 30% memory. There’s still room to optimize the Frontend further by taking a really critical look at where we can use remote calls/methods instead of reactivity.

Also, ALWAYS use fields in your queries. I know it’s a PITA to then get errors when you miss just one, but the benefit is that it minimizes the amount of data transferred.
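In Meteor that means passing the fields option, e.g. Collection.find({}, { fields: { title: 1 } }). As a rough plain-JavaScript illustration of why it matters (the document shape and the project() helper below are made up for the example):

```javascript
// Mimic what a MongoDB field projection does: keep only the listed fields.
function project(doc, fields) {
  return Object.fromEntries(
    Object.entries(doc).filter(([key]) => fields.includes(key))
  );
}

// Hypothetical document: the list view only ever shows _id and title.
const doc = {
  _id: 'abc123',
  title: 'Q3 report',
  body: 'x'.repeat(50000), // big field the client never renders
};

const fullBytes = JSON.stringify(doc).length;
const slimBytes = JSON.stringify(project(doc, ['_id', 'title'])).length;
console.log(fullBytes > 50000, slimBytes < 200); // the projected payload is tiny
```

In a real app the projection happens inside MongoDB, so the unneeded fields never even leave the database.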

BTW, at the above load peak (so far) we had 10% Atlas CPU on the M10 resource. IOPS were at 150, so there’s still some headroom before the 1,000 IOPS limit they set.

For more info about our app head over to https://yourdna.family/

PS: In case you wonder why our landing page is so fast: it’s a static Vanilla JS page, and we only switch over and load Meteor at login :wink:

4 Likes

Here is a good article on the topic: https://galaxy-guide.meteor.com/apm-improve-cpu-and-network-usage.html. Even though it’s filed under the METEOR APM section, there is a wealth of information in those articles even if you are not using Galaxy.

2 Likes

I don’t use pub/sub to load lists of documents; I use methods to load them.
If I need a list updated automatically, I use another collection to store changes and subscribe to just one document in it. When that document changes, I call a method to fetch the list again.
Meteor works fine with load balancing. I’m currently running a set of many Meteor instances and MongoDB instances (a replica set) on Google Cloud. I can turn some servers off to do upgrades and then turn them back on without downtime. The only problem is redis-oplog: it doesn’t support Redis Cluster, so I’ll have a problem if that Redis server goes down.

2 Likes

I personally think that pub/sub remains the critical component in scaling Meteor. It’s very useful but often the hardest to scale, so if you can avoid it, use other techniques, like Meteor methods, instead.

If you have data that changes unusually frequently but is not needed in a pub/sub, it might pay off to put it into a separate MongoDB cluster or even a different database. Meteor tails the oplog collection of your MongoDB cluster, and the more changes happen in that cluster, the more data your system has to process. See: https://stackoverflow.com/questions/20535755/using-multiple-mongodb-databases-with-meteor-js

The package https://github.com/cult-of-coders/redis-oplog only communicates the changes happening from within Meteor. If you also need to follow changes that other systems make to the database, without having to send additional messages to Redis (as described in https://github.com/cult-of-coders/redis-oplog/blob/master/docs/outside_mutations.md), you can combine redis-oplog with the Go application https://github.com/tulip/oplogtoredis. By default, every Meteor instance tails the oplog collection of your cluster; with redis-oplog alongside oplogtoredis, only the Go application reads the oplog. This should ease the load on the individual Meteor instances and on the database cluster.

3 Likes

I recently ran into a situation where my processing-intensive instance was doing nothing 98% of the time, and when it did need to do something, it took a long time (not good for the users, since they are waiting for the result). This runs on Galaxy on a small instance to keep costs down.

Some analysis showed CPU was the bottleneck during execution, so I ported this to AWS Lambda. Now this functionality is 10-20 times faster, and for the 2% of the time it’s actually doing something, I get charged a tiny amount just for that. The rest of the time, when it’s idle, there is no cost. It also scales infinitely and automatically.

I was able to massage my current Meteor app code to use a subset of the JS and the off-the-shelf Mongo driver. I also adopted the batch methods of the MongoDB driver (insertMany, bulkWrite), which sped up the app by a lot, and I batched the find to fetch a set of records instead of one at a time.

I wish Meteor code would run out of the box on Lambda (without the pub/sub) so I could link Lambda functions to it.

I would love to see out of the box support in meteor for AWS Lambda.

Imagine this:
Lambda - hosts anything processing-intensive that you want to parallelize and scale infinitely on demand.
Regular Docker Meteor (e.g. Galaxy) - provides reactive pub/sub and quick API calls.

3 Likes

Thanks for sharing, this is helpful.

I am curious, can you share a bit about the nature of the task being performed in those 2%?

The app is in beta (small group of users) and schedules tasks into a person’s calendar. It only needs to do the scheduling when something changes (tasks or the calendar), but when it does, a lot of processing occurs for a short period of time.

You can check it out here if you are interested https://yomez.com

1 Like

Yes, got it. I was curious about the CPU-intensive use case; I will definitely check out the app.

Have you thought about using worker threads for the CPU-intensive task?
I did something similar with an image-processing function. Images were uploaded to storage directly from the client (to prevent eating the server’s RAM), and the processing (compression and manipulation) was done using Google Cloud Functions to protect the server’s CPU.

I usually try to offload any CPU- or RAM-intensive tasks off the server machine (especially Node.js servers) and restrict the server to serving results.

Thanks for sharing again. I think it’s common; I’ve had my share of those.

1 Like

No, I did not think of using worker threads. I’m assuming you mentioned that for parallel execution? (The CPU was already pegged on Galaxy, so it would not have helped.)

I was using a separate instance from the user-facing one to prevent impact (which I think is what you mentioned), but it was still too slow, and scaling it would have been a challenge plus additional cost.

Lambda was a win for this use case: much better on all fronts (much faster, lower cost, automatic scaling/parallel execution).

1 Like

Yes, you’re right.

I personally think this is the ideal case for functions-as-a-service, and it’s a good thing we have those cloud functions nowadays.

I have a lot to say. Consider this post to be part 1.

Scaling/performance tips that DO NOT require changes to your code

  1. Use a dedicated server instead of a VPS. I refer to my previous post An Enemy of Scalability - Hypervisor (Virtualization) Overhead.

  2. If you are doing lots of file I/O or if you are making heavy use of the database, make sure your dedicated server has an SSD, preferably an NVMe SSD. Hetzner and OVH offer dedicated servers with NVMe SSDs. There are a few other providers as well.

  3. Use Nginx as a reverse-proxy in front of your Meteor app. Nginx will do what it is most efficient at - terminating your HTTPS connection and serving static assets. You want to avoid the node process having its time needlessly wasted.

    Our Meteor deployment script also compresses static assets on disk using the brotli compressor. Nginx’s brotli module has an option brotli_static that enables Nginx to automatically serve the pre-compressed version of assets from disk by looking for files with the .br extension, instead of having to waste CPU time compressing the asset on the fly.

  4. If you can, make all subsystems running on the same server communicate with each other using UNIX sockets instead of TCP sockets. Communication over UNIX sockets incurs significantly less overhead.

    In our deployments, Nginx passes requests onto Meteor via its UNIX socket file (e.g. /var/run/meteor/meteor.sock) and Meteor passes queries onto MySQL via its UNIX socket (e.g. /var/lib/mysql/mysql.sock).

    To configure Meteor to listen on a UNIX socket, specify the UNIX_SOCKET_PATH environment variable.

    Other common services like MongoDB, PostgreSQL, Redis and Memcached can be configured to listen on UNIX sockets as well.

  5. Cloudflare supports proxying WebSockets so it works well with Meteor apps. It is useful for:

  • Preventing the IP address of your origin server from being exposed.

  • Providing some protection against DDoS attacks.

  • Serving static assets from Cloudflare’s edge locations closest to the user, reducing your origin server’s data usage.

  • If you have a server outage, you can use Cloudflare’s API to quickly divert traffic to a hot-standby Meteor app server without any DNS propagation delay.

  • Cloudflare also offers a load balancer if you have a cluster of Meteor app servers and need to distribute traffic between them.

    If you are using Cloudflare with Meteor, you should modify your Meteor startup script to set the environment variable HTTP_FORWARDED_COUNT=1

    If you are using Cloudflare with Nginx and Meteor, you should modify it to set HTTP_FORWARDED_COUNT=2
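Tying tips 3 and 4 together, a minimal Nginx fragment might look like the sketch below. All paths and names are illustrative, and brotli_static requires the third-party ngx_brotli module:

```nginx
# Illustrative sketch only - adapt paths, domains and SSL settings to your deployment.
upstream meteor_app {
    # Tip 4: talk to Meteor over a UNIX socket (set UNIX_SOCKET_PATH on the Meteor side).
    server unix:/var/run/meteor/meteor.sock;
}

server {
    listen 443 ssl;
    server_name example.com;
    # ... ssl_certificate / ssl_certificate_key directives ...

    # Tip 3: serve pre-compressed static assets from disk (ngx_brotli module).
    brotli_static on;

    location / {
        proxy_pass http://meteor_app;
        proxy_http_version 1.1;
        # Required for Meteor's DDP WebSocket connections:
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```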

Scaling/performance tips that DO require changes to your code

  1. Do everything you can to avoid running CPU intensive code in the Node.js event loop. Instead, such code should run asynchronously in the thread pool.

    Up until recently, this often had to be done by writing a Node Addon in C++ using Native Abstractions for Node (NaN) or N-API. This approach is commonly used by number-crunching code that has to run at top speed, e.g. high performance crypto packages like bcrypt and shacrypt.

    However, thanks to the introduction of the node.js worker_threads module combined with the SharedArrayBuffer data type, it has become more practicable to write thread pool code in JavaScript and get acceptable performance.


    There is also Node’s built-in cluster module, which allows you to spawn multiple Node.js (Meteor) worker processes. It has its uses, and I have commented on it in the past. Today I would advise people to first try to solve their problems using the thread pool and only reach for the cluster module as a last resort.

22 Likes

A few days ago I discovered this NPM package threads.js that claims to “make web workers & worker threads as simple as a [JavaScript] function call”.

I haven’t had to use it yet, but I will definitely give it a try with Meteor at some point.

6 Likes

Great tips. I want to also remind all of us (including myself, because even after 26+ years of doing this I still forget): profile your code before deciding what to optimize (i.e. find the root cause of why things are slow).

For me, the root cause of many slow things was the code itself (fetching or updating one record at a time in the DB). CPU was also a limit, and no amount of threading would help with that, so I went to AWS Lambda to run some of the code for the specific situation I had.

Root Cause :slight_smile:

2 Likes

I hadn’t realized Cloudflare could work with Meteor. This is great info!

That’s been the case for quite a while. If you’re looking for a guide, here’s an older discussion: Simple guide for optimising your Meteor app with Cloudflare (Cache, TTFB, Firewall, etc)

3 Likes