Is Meteor still unable to process at least 1000 concurrent users?

noperapon · October 28, 2020, 2:32pm

Oh VPS. It is absolutely not advisable to host a Meteor app,

Yes. It has some sense. But unfortunately the apps with about the same level of complexity written in Go and on this same VPS (beside the meteor app) has no this weakness. Besides when I’ve changed the meteor’s pub/sub subsystem with my self-made subsystem in javascript code on client and om server but moved the socket server to a Go implemented server (centrifugo), exactly this meteor app become behave itself very good. And 100% of CPU usage fall down to 3-5% and maximum to 20% with 400-500 users online. Earlier 50-60 users crashed the server.

noperapon · October 28, 2020, 2:36pm

In fact, this is a really good point, and it leads us to the next question:
…
Do we have such a test application?

Unfortunately I had to test this with the real users. I’ve found no tool for such a testing. And it painful

peterfkruger · October 28, 2020, 2:38pm

Yes, but as I said, it may just as well be that the Node.js websocket implementation was poorly written back then and that it was fixed since. We are 5 years and 10 websocket package releases later now.

You have a perfectly nice new stack with which you seem to be very happy, and I think we all share the same sentiment. There are no current bug reports regarding Node.js websocket performance, so on that end I don’t see a reason for concerns either.

So you’re happy, we’re happy, what’s the problem?

noperapon · October 28, 2020, 2:39pm

It’s not surprising to see a javascript implementation in a javascript environment; it is usually enough to limit the native code to the part that needs to be super efficient. 1% or less wouldn’t surprise me at all.

There was an attempt to bind the superfast µWebSockets (written in c++) to node as module but I don’t know its status. https://github.com/uNetworking/uWebSockets

I think It could solve this issues

peterfkruger · October 28, 2020, 2:42pm

If there is currently really no tool for such tests, I agree with you that we need one.

alawi · October 28, 2020, 2:43pm

Sorry, I wanted to check the numbers again before posting, this was a few years back when we did the load test.

I did 1.4k (not 1.8k as I mentioned above) in a small DO droplet (correction no the old one, the 1GB RAM instance) and this was Meteor 1.5 if I’m not mistaken, note that there are no pub/subs here, only accounts and roles. I think this specific test had multiple hosts (the screenshot below), but the RAM consumption was around 0.6 MB/user session so you should hit 1k session with a 1GB of RAM VM (accounts pub/sub only).

If you start adding specialized pub/sub it gets expensive quickly and you’ll have to scale horizontally.

So yeah it is possible to hit 1k sessions on 1 GB RAM VM if you don’t have user specific pub/subs. Otherwise, it depends on your pub/subs and how much RAM they would consume on the server.

I hope that helps!

noperapon · October 28, 2020, 2:45pm

So you’re happy, we’re happy, what’s the problem?

I’m not happy because I have working apps that I would like to evolve but not rewrite. It’s too heavy to rewrite. An so I’m watching meteor’s progress.

There are no current bug reports regarding Node.js websocket performance, so on that end I don’t see a reason for concerns either.

Maybe because nobody just use it in high load apps? Although while we were talking, @alawi, I guess, put here info that they use it in highload, but for some reason deleted it a bit later.

alawi · October 28, 2020, 2:50pm

Yes, sorry I wanted to double-check those numbers since I did those load tests in 2018. I actually forgot about the numbers since we been running production with no issue to speak of, but I didn’t want to mislead anyone.

For the test, I’ve used many VMs running puppeteer that hits the VM with a scenario where the user login and stay idle. In this test scenario, the users were added gradually, so no burst traffic and that keeps the CPU low, so basically, this is testing memory.

I hope that helps.

peterfkruger · October 28, 2020, 2:52pm

I think it would be very useful to have reliable measurements with real numbers / charts that document the performance curve in various subscription scenarios, all performed in a controlled environment. In fact, this is so important that it should ideally become part of the Meteor documentation.

All we know now is that when subscriptions come to play, Meteor is becoming resource-hungry, but that’s just a rule of thumb.

The actual increase of consumed resources (CPU, memory) will still largely depend on a number of other variables, such as whether or not redis-oplog is used, the number and size of subscribed documents etc. We would need to test specific scenarios to show how Meteor performs in those.

noperapon · October 28, 2020, 2:52pm

Thank you. It looks nice. After all you confirm my opinion that pub/sub in meteor is bottleneck. I can see very few observers and CPU usage is good. But meteor was a star among others frameworks because of pub/sub subsystem…

alawi · October 28, 2020, 2:56pm

Well yeah, you’ve to be careful with Pub/Sub on heavy load, because they’re heavy on RAM usage due to the merge box on the server, there are ways to disable the mergebox but that is not the default behavior. If you’ve many pub/subs you will need to scale horizontally quickly. Also, it differ if you’ve unique pub/sub per session or general re-usable observers.

That is why I asked if you’re using something like pub/sub with Go, otherwise it is not a fair comparison.

But to answer your question, you can hit 1k with 1 vm if you don’t have many user specific pub/subs.

noperapon · October 28, 2020, 3:03pm

That is why I asked if you’re using something like pub/sub with Go, otherwise it is not a fair comparison.

yes, that’s right. After all it happens that reactivity has a few scenarios of usage. In most cases there can be use a simplier implementations like SSE, i would say - just a part of real reactivity. But the real reactivity might be back if using the proper serverside implementations. Take my example - I moved socket server to a separate process and it was written in Go, then I’ve rewritten the core functions (very basic, but enough) of pub/sub principles of meteor in app and it went well. So currently my meteor app has reactivity (limited) and a low server resources.

znewsham · October 28, 2020, 3:06pm

Are these documented? I’ve patched the packages a couple of times to change the behaviour of mergebox, but it would be awesome if there was now a configuration option to do this?

alawi · October 28, 2020, 3:07pm

Actually, I think that was the original vision for Meteor’s pub/sub to have it run on a separate process eventually, but the current approach works well for most cases I guess. It will be nice if we have several implementation strategies of pub/sub, for example, redis-oplog reduces the CPU but it seems to cause load on the DB read/write, but perhaps we can also have a cloud based scaling strategy, like the service you written in Go. Again, the current approach is good for many cases, but as I mentioned, you’ll have to scale horizontally quickly if you’ve multiple specific observers.

There is a PR on disabling mergebox here

github.com/meteor/meteor

Initial support for publication strategies in livedata server

meteor:devel ← meteor:feat/livedata-merge-strategies

opened 07:04PM - 27 Aug 20 UTC

sebakerckhof

+123 -55

Rationale: Meteor optimizes bandwidth over memory usage in the livedata server …by only sending deltas on collection changes over multiple publications. While I believe it is a sane default behavior, it has been a long outstanding request ( https://github.com/meteor/meteor-feature-requests/issues/79 ) to be able to customize this behavior. This would have 2 clear benefits / use-case 1. Reduce memory footprint of the server, especially if you publish lots of data but without overlapping publications 2. It would allow for new kinds of collections/publications. E.g. one can imagine a volatile message queue (send & forget) backed by rabbitmq or redis. In this case a session collection view has no purpose. Up till now you needed to make your own protocol over the websocket with a package like `meteor-custom-protocol`, `streamy` and the likes. By doing so you lose a lot of useful features of the livedata server like rate limitation, message batching etc. The contents of this PR lays the foundation for the above, but is not yet sufficient for use-case 1 (client-side mergebox). I already submit it as-is since it already allows for use case 2 (e.g. volatile message queues). If this gets accepted I could continue working towards use case 1, which, in order to maintain the same semantics, would imply sending subscription Id's on added/removed/changed messages and having the SessionCollectionView/SessionDocumentView classes in the client bundle. I would also need to think more about a strategy for when the userId changes, but it's doable. Note that it would be possible to set a strategy on a per-collection basis.

I think this PR should be merged, the engineers of Meteor 10 years ago optimized for data over the wire against RAM because the internet wasn’t as good as it is today. With today’s connection and scale, I’d rather see emphasize on the RAM over the network consumption.

Hopefully we get this merged soon

afrokick · October 28, 2020, 8:59pm

Per hosts

Summary

Sorry for mobile screenshot.

@noperapon as you can see, we handle ~100ccu/host with ~17% CPU. We have some pub/sub.

Also you can see some spikes on cpu and memory. It is because we have administrators with some heavy workloads.

My answer - of course, you can reach 100+ ccu on production app even with vps.

Edited: we don’t use redid oplog)

deligence1 · October 29, 2020, 6:20am

https://www.playfactile.com - We are developing and enhancing this app from last 3 years. It’s in Meteor + Blaze. At peak time, the app usually have 3000+ concurrent users.

Regards,
Sanjay Kumar

noperapon · October 29, 2020, 6:24am

Very nice. What meteor version do you use and how many subscriptions in app? Is this default meteor usage or you used some tricks?

alawi · October 29, 2020, 6:53am

Some tricks are skills needed to optimize when you are working at scale, each app is different. Unless you completely outsource your server (a.k.a serverless) you will need to learn some tricks on any platform, and I am sure you aware of that. And yes Meteor real time oplog tailing implementation of pub/sub is more expensive then stateless express servers or Go greenthreads on a multicore. With that said, I think there are many ways to optimize pub/sub and improve the architecture, some ongoing PRs on this topic and packages such as redis oplog.

And the reason why you got some push back on this thread, because this topic has been discussed so many times, and most of us here (at least myself) are tired of saying the same thing and debunking the same myths and want to move forward. If you just search scaling and performance you will find the same topic discussed repeatedly.

I hope you don’t take this as an offensive statement (that is not my intention, in fact I will be very happy if you join us back in Meteor you seem experienced with healthy dose of skepticism), just saying the truth as I see it. Meteor delivers and saves you cost to validate you ideas etc, and that’s why most folks here use it, but like any platform, you need to learn more once you reach some scale.

peterfkruger · October 29, 2020, 8:00am

Re: Meteor vs. Go greenthreads: why don’t we, as a community, make a joint effort to make Meteor wholly, or at least partially, multi-threaded?

Workers are at this date officially experimental in Node.js, yet we can assume with a high probability that they won’t be discontinued.

It would be just great if, at a bare minimum, Meteor methods would run either automatically or on demand, in workers. It’s a no-brainer that Meteor’s performance would be increased. By the same token, we should have a go at pub/sub.

Why don’t we create an ad-hoc user group and have a go at this? Our expertise as a community is enormous.

DanielDornhardt · October 29, 2020, 8:49am

I just wanted to add another, maybe obvious to some, observation:

One trap one can fall into is if you have multiple subscribers observing & updating the same data in the database, and maybe updating the state every few seconds (think a multiplayer game for example).

Now if every client is subscribed to the data of all other clients & updates his data every second for example, things escalate quickly (ask me how I know! :D)

Because the number of messages per update interval (let’s say 1 second, just for the lulz) is numClients * numClients, which escalates quickly.

10 players -> 100 messages / second; 20 players -> 400; 50 -> 2500; 75 -> 5625 …

Between 50 - 70 was around the time when our app broke / couldn’t keep up in production with the “naive” implementation.

So this is an example of every app / case being different & having to be considered on it’s own.

After centralizing & accumulating the data & publishing it only once every few seconds the number of calls is now linear again, (Number of Update Calls / second = numClients + number of updates sent to clients / second = numClients).

So maybe have a look on whether and how the information for each user is maybe interacting with each other. Make sure to only .find() and publish required {fields: [’’]} so the publication isn’t triggered on unnecessary updates & make sure there isn’t a lot of “crosstalk” in regular intervals between clients (use the reactivity only if something actually happened for example).

Also 1000+ users at the same time is already a fair number of users, depending on the use case. Normally the daily users aren’t there all at the same time so even with eg. 10000+ Visitors, you might only have a hundred on the page at the same time, althought hat might differ from case to case too of course.

On the positive side, you might already be very successful with your project + could invest in a little horizontal scaling too!

Best wishes & have fun everyone

(PS: I’d also welcome any performance / memory improvements of course, and also think a kind of testing tool would be great!)