Connection costs? Practical scaling for a huge event


#1

Hi there,

I am building a mobile app for a huge event. There might be 10,000 active users at the same time, and it must work perfectly, because it's a one-time event lasting only one evening! :scream:

After reading through many scaling discussions from 2015 until now, I am beginning to think that maintaining stateful connections at such a scale is risky.

While CPU and RAM usage might be predictable and scalable, one of the things that bothers me most is a possible limit on the number of open WebSockets and the cost of a connection.

We are planning to host on Heroku, and I have read that there is a limit of around 1,500 connections per machine (“dyno”). Now my questions:

  1. How expensive is a DDP connection without a subscription? We might use methods instead of subscriptions, but I just couldn't figure out what a bare connection costs. :thinking:

  2. Have you ever run into connection limits, and how did you deal with them?

  3. If you've read the whole question: what do you think about this in general? Should I go stateless and give up the convenience of DDP?
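A quick back-of-envelope check of the numbers above, assuming the ~1,500 connections per dyno figure holds and that each user keeps exactly one WebSocket open. Both are assumptions: the real ceiling depends on per-connection memory and CPU, which is exactly what question 1 asks. The `headroom` factor is a hypothetical safety margin, not a Heroku setting.

```typescript
// Back-of-envelope capacity estimate: how many machines ("dynos") are needed
// to hold a given number of concurrent connections.
// Assumptions (not from Heroku docs): one WebSocket per user and a fixed
// per-machine connection ceiling.

interface CapacityInput {
  concurrentUsers: number;       // peak simultaneous users expected
  connectionsPerMachine: number; // assumed per-machine connection ceiling
  headroom: number;              // hypothetical safety factor in (0, 1], e.g. 0.7 = fill machines to 70%
}

function machinesNeeded({ concurrentUsers, connectionsPerMachine, headroom }: CapacityInput): number {
  if (connectionsPerMachine <= 0 || headroom <= 0 || headroom > 1) {
    throw new RangeError("connectionsPerMachine must be > 0 and headroom in (0, 1]");
  }
  const usableConnections = Math.floor(connectionsPerMachine * headroom);
  if (usableConnections < 1) throw new RangeError("headroom leaves no usable connections");
  return Math.ceil(concurrentUsers / usableConnections);
}

// 10,000 users at the full 1,500-connection ceiling:
console.log(machinesNeeded({ concurrentUsers: 10000, connectionsPerMachine: 1500, headroom: 1 }));   // 7
// Same load, but only filling each machine to 70%:
console.log(machinesNeeded({ concurrentUsers: 10000, connectionsPerMachine: 1500, headroom: 0.7 })); // 10
```

The connection ceiling is only one constraint; if stress tests show CPU or memory saturating first, the per-machine figure to plug in is whichever limit is hit earliest.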

Thanks a lot for sharing your experience :slight_smile:


#2

I think the best (and only) way to be confident about the big event is to do multiple dress rehearsals (iterative load and stress testing).

Assuming you’re very comfortable with Meteor (as in development speed), I’d suggest implementing the out-of-the-box pub/sub in the first iteration and benchmarking it. Based on the numbers, decide whether it’s feasible to buy the infrastructure to meet the requirements. I assume you’re well-versed in the best practices for tuning queries and publication data in Meteor apps (you should find plenty of good advice already posted on the forum).

If it’s not good enough, you could add redis-oplog, which many people on this forum report gives a good boost in scalability over the out-of-the-box pub/sub. Benchmark again, measure the improvements, and check feasibility based on the new numbers.
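To actually "measure the improvements" between iterations, it helps to reduce each run to the same summary statistics. A minimal sketch, assuming you have already collected per-request latencies in milliseconds from whatever load-testing tool you use; it uses the nearest-rank percentile definition, and the sample numbers are made up purely for illustration.

```typescript
// Summarize per-request latencies (ms) from one benchmark run so that runs
// (e.g. default pub/sub vs. redis-oplog vs. REST) can be compared side by side.

interface LatencySummary {
  count: number;
  p50: number;
  p95: number;
  p99: number;
  max: number;
}

// Nearest-rank percentile on an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new RangeError("no samples");
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

function summarize(latenciesMs: number[]): LatencySummary {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    count: sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: sorted[sorted.length - 1],
  };
}

// Relative change of a metric between a baseline and a candidate run
// (negative = candidate is faster).
function relativeChange(baseline: number, candidate: number): number {
  return (candidate - baseline) / baseline;
}

// Illustrative numbers only, not real measurements.
const baseline = summarize([120, 80, 95, 300, 110, 90, 100, 85, 105, 2000]);
const candidate = summarize([60, 55, 70, 150, 65, 58, 62, 59, 61, 400]);
console.log(baseline.p50, candidate.p50);                  // 100 61
console.log(relativeChange(baseline.p95, candidate.p95)); // -0.8
```

Tail percentiles (p95/p99) usually matter more than the median for an event like this, since they show what the slowest users experience when the servers are under pressure.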

If that’s still not sufficient, try converting your publications to RESTful endpoints using something like simple:rest. I am not sure whether the server side is stateless, but at least you won’t need WebSockets. Benchmark again…

If even that isn’t feasible, you might have to consider some advanced caching in front of the RESTful services, like this example of using Varnish.
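Varnish itself is configured in VCL, which is beyond this thread. As a smaller stand-in, the same idea (serve a recently computed response instead of recomputing it for every client) can be sketched as an in-process TTL "micro-cache" in front of a REST handler. This is a swapped-in technique, not the Varnish setup from the linked example; the loader function and the injectable clock are hypothetical.

```typescript
// Minimal in-process TTL ("micro") cache: each key is recomputed at most once
// per ttlMs, and concurrent misses for the same key share one in-flight load.

type Loader<V> = (key: string) => Promise<V>;

class TtlCache<V> {
  private entries = new Map<string, { value: Promise<V>; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly load: Loader<V>,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {}

  get(key: string): Promise<V> {
    const t = this.now();
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > t) return hit.value; // fresh: reuse (possibly still pending) result

    const value = this.load(key);
    this.entries.set(key, { value, expiresAt: t + this.ttlMs });
    // Don't cache failures: drop the entry if this load rejects.
    value.catch(() => {
      if (this.entries.get(key)?.value === value) this.entries.delete(key);
    });
    return value;
  }
}
```

Even a TTL of one or two seconds can collapse thousands of identical requests per second into one database query; the trade-off is that clients may see data up to `ttlMs` old, which is often acceptable for things like schedules or leaderboards.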

Good luck :thumbsup:, and please do share how it goes.


#3

Hi @gaurav7, thanks a lot for your suggestions and considerations! I was thinking about a similar iterative approach, but given the short development time and the high unpredictability of participation, we are now considering moving to a REST API right away and using pub/sub only for admin functions and live updates on the big screens.

It makes me realize how much work DDP sync takes off the developer’s hands, but it’s just more difficult to scale horizontally. Since we can afford to pay a lot for servers over such a short period, this seems the more reasonable approach.

I still wonder how much of a limit WebSockets really are. Going to look into Varnish, thanks!


#4

simple:rest is no longer maintained, check out https://github.com/aadamsx/fine-rest instead.


#5

How does fine-rest differ from simple:rest?


#6

@tomsp, up front: I refactored simple:rest, factored out part of it, converted it to ES6, made it an npm package, and added more examples, which you can reach from the README.

In terms of features, I didn’t carry over the main package of simple:rest, which basically converted your pubs and methods to endpoints. This package focuses on JsonRoutes, the JsonRoutes middleware, and the accounts package integration, which were the underlying tech behind simple:rest.

Also, simple:rest is no longer maintained and no longer accepts PRs (according to the author). With fine-rest, PRs are welcome. I’m beta testing fine-rest now and will be using it in an HA production environment. JsonRoutes itself (as opposed to the fine-rest lib) is also used by others in production applications.


#7

Nice, I like the idea of not turning your methods and pubs into endpoints automagically, but having a convenient way to add your own endpoints. Back in the day I used picker for this, and later simple:json-routes.

I think it’s funny that adding traditional REST endpoints is such a popular topic these days; I see discussions left and right. Anyway, we’re hijacking a thread that was actually about scaling.

Keep it up!


#8

It feels a bit stupid, or pointless, to use Meteor to build a REST API. But given my situation, it seems the smart thing to do.

I still wonder whether anybody has experience with the scalability of WebSockets, or knows of any side effects of an active connection besides the connection itself. (I still secretly wish I could use pub/sub and save a lot of work :wink:)

Thanks for the info about fine-rest. Didn’t realize that simple:rest isn’t maintained anymore.


#9

Why? Meteor is so much more than pub/sub. I get the most value out of the zero-config build system, the integrated user accounts system, one-command deploys with Galaxy including APM, and the integrated database APIs.


#10

I am having the same issue. Can anyone suggest an answer?


#11

There are some experiments out there that might interest you; check out this one, for example. Obviously, compared to a solution without persistent connections, there is some overhead. The only way I can think of to calculate or estimate it in terms of CPU/RAM is stress testing.


#12

In my experience, subscriptions are quite heavy, while DDP itself is fine; at least we didn’t hit the wall.
Subscriptions add just too much magic to use them without a second thought.

When I faced the same requirements (persistent connections with high loads ahead), I had to switch to Apollo, but Apollo wasn’t really a thing for me, so for the next project that needed to scale, I decided to use Grapher. Grapher also provides neat Mongo and request optimisations and flexibility, as well as a REST fallback for publications and methods built with it. Together with some extra gains from the redis-oplog mentioned above, that was all I could squeeze out of the server.

However, I was quite limited by server specs, so I had to take advantage of Redux and manual application of stalling, so that clients limit their request frequency whenever a “high load flag” is raised. It caused some problems for returning visitors (for them the experience wasn’t as smooth), but new visitors didn’t notice anything off and were quite happy overall, though it took some time to build it that way.
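One way to read "clients limit their request frequency whenever a high load flag is raised" is a client-side token bucket whose refill rate drops while the flag is set. This is a sketch under that assumption, not the actual Redux implementation described above; the rates are arbitrary, and how the flag reaches the client (a publication, a polled status endpoint, etc.) is left open. `setHighLoad` is a hypothetical method name.

```typescript
// Client-side request throttle: a token bucket whose refill rate is lowered
// while the server's "high load flag" is raised. When tryAcquire() returns
// false, the caller should defer or drop the request.

class AdaptiveThrottle {
  private tokens: number;
  private lastRefill: number;
  private highLoad = false;

  constructor(
    private readonly capacity: number,       // max burst size
    private readonly normalPerSec: number,   // refill rate under normal load
    private readonly highLoadPerSec: number, // reduced refill rate under high load
    private readonly now: () => number = Date.now,
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  setHighLoad(flag: boolean): void {
    this.refill(); // settle accrued tokens at the old rate before switching
    this.highLoad = flag;
  }

  // Returns true if a request may be sent now (and consumes a token).
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  private refill(): void {
    const t = this.now();
    const rate = this.highLoad ? this.highLoadPerSec : this.normalPerSec;
    this.tokens = Math.min(this.capacity, this.tokens + ((t - this.lastRefill) / 1000) * rate);
    this.lastRefill = t;
  }
}
```

A token bucket still allows a short burst (up to `capacity`) after a quiet period, which keeps the UI responsive for occasional actions while capping the sustained request rate per client.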

Oh, and multiple endpoints are a good approach too.
Though I didn’t have the resources to go for it :stuck_out_tongue:


#13

If you’re comfortable building RESTful endpoints from scratch, that surely saves you a few iterations.

You might want to consider using the latest version of Express instead of the default Connect lib, which is stuck a major version behind.


#14

I do see value in being able to expose the same “service” (publication or method) via different transport mechanisms. That makes it convenient to compare the performance of both approaches for capacity planning in situations like the OP’s, especially if your first go-to implementation is DDP. Another use is integrating a non-Meteor app without having to use a DDP client, which is something I’m doing in my own project.
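The idea of exposing one service over several transports can be sketched as a plain function plus thin adapters. The adapter shapes below are hypothetical stand-ins: in a real Meteor app the first would be registered via `Meteor.methods` and the second through a REST router such as JsonRoutes or Express, and `getScores` is an invented example service, not anything from this thread.

```typescript
// One transport-agnostic service function, exposed through two thin adapters.
// Keeping the logic in a plain function lets a DDP method and a REST endpoint
// share it, so their performance can be compared on identical work.

interface Score { player: string; points: number; }

// Hypothetical service: top-N scores from an in-memory list.
function getScores(all: Score[], limit: number): Score[] {
  return [...all].sort((a, b) => b.points - a.points).slice(0, limit);
}

// Adapter 1 (DDP-style): the shape a method body would have, taking typed args.
function methodAdapter(all: Score[]) {
  return (limit: number): Score[] => getScores(all, limit);
}

// Adapter 2 (REST-style): parse query params, return status + JSON body.
// Minimal hypothetical request/response shapes, not any specific router's API.
interface RestRequest { query: Record<string, string | undefined>; }
interface RestResponse { status: number; body: string; }

function restAdapter(all: Score[]) {
  return (req: RestRequest): RestResponse => {
    const limit = Number(req.query.limit ?? "10");
    if (!Number.isInteger(limit) || limit < 1) {
      return { status: 400, body: JSON.stringify({ error: "limit must be a positive integer" }) };
    }
    return { status: 200, body: JSON.stringify(getScores(all, limit)) };
  };
}
```

Because both adapters call the same function, any difference measured between them in a load test comes from the transport (DDP over WebSockets vs. plain HTTP), not from the business logic.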


#15

How can I use it in meteor? I have built node apps with Express before, but when it comes to meteor I just enjoyed everything being integrated.


#16

@gothicmage, thanks for sharing your experience. I think I’ll go down a similar route, except that I can pay for expensive servers for the few hours when traffic will really be high.

Definitely planning a “high load” flag, and thinking about Redux (I am not sure yet whether it’s going to be worth it. In the long run it probably will be, also for the ability to use redux-persist to save even more traffic).

What do you mean by manual application of stalling?

I didn’t know about Grapher. It seems to be a neat mix of GraphQL-style queries on top of the existing infrastructure. I’ll have to look into it and think about it.


#17

To scale to 10k concurrent users you’re going to have to run a lot of servers.

Some suggestions:

  1. Don’t use publications. Use GraphQL or REST as mentioned above. If you need real-time functionality, you could use publications, GraphQL subscriptions, or something else. Meteor publications are where the scaling pain point is going to be for you, and most apps don’t actually need to be real time.
  2. Disconnect idle users.
  3. Run stress tests beforehand.
  4. I’m not a fan of Galaxy, but since it’s a short event, get set up on Galaxy and ask them to raise your limit from 20 containers to 40+. Even running 40 × 1 GB containers, you may still run into trouble with a Meteor app and 10k active users. It depends on what your app does, though. (And again, this applies only if you need publications. With no need for publications/observers you should be able to scale fairly easily.)
  5. Try redis-oplog.
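For point 2, the client-side logic is small: reset a timestamp on user activity, drop the connection once the user has been idle long enough, and reconnect on the next activity. In a Meteor client the two callbacks would plausibly be `Meteor.disconnect()` and `Meteor.reconnect()`; the sketch keeps them injectable and uses an explicit `check(now)` tick instead of real timers so it runs anywhere. The 5-minute threshold is an arbitrary example.

```typescript
// Idle tracker: calls onIdle once after idleMs without activity, and onActive
// when activity resumes after an idle disconnect.

class IdleTracker {
  private lastActivity: number;
  private idle = false;

  constructor(
    private readonly idleMs: number,
    private readonly onIdle: () => void,   // e.g. Meteor.disconnect in a Meteor client
    private readonly onActive: () => void, // e.g. Meteor.reconnect in a Meteor client
    now: number,
  ) {
    this.lastActivity = now;
  }

  // Call from input/visibility event handlers (touch, scroll, tab focus, ...).
  activity(now: number): void {
    this.lastActivity = now;
    if (this.idle) {
      this.idle = false;
      this.onActive();
    }
  }

  // Call periodically, e.g. from a setInterval every few seconds.
  check(now: number): void {
    if (!this.idle && now - this.lastActivity >= this.idleMs) {
      this.idle = true;
      this.onIdle();
    }
  }
}
```

Every idle client that disconnects frees a WebSocket and, if it had subscriptions, the server-side observers behind them, which is where most of the per-connection cost discussed in this thread sits.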

#18

What does going down a similar route mean to you?

What does this mean exactly?

Yeah, what does manual application of stalling entail?

As for Grapher, I think it just eases the MongoDB bottleneck; you’ll still face a bottleneck at the publication.

Great advice. Also, REST should be an option for you, @retani, and NOT a stupid option but a smart one. Also, I’ve seen code where publications are NOT reactive; that may be something to consider.

Interesting. How do YOU run stress tests? I haven’t seen enough info on this around here.

Also, I’m truly curious how people using Meteor are scaling outside of DDP or REST. Is there a hybrid of DDP and a message queue (RabbitMQ) or something similar that scales?


#19

Like any node package! Check the SSR example in Apollo’s documentation on integration with Meteor.


#20

I only meant that the progression of thoughts @gothicmage describes, having faced similar requirements, seems familiar to me. For example, I also looked at Apollo but then decided that it’s not a thing for me. I am also considering Redux now for better control over the data flow, and I am also planning a “high load flag”.

For me, it means building in mechanisms for spontaneously reducing API calls, at the cost of a degraded user experience, in order to keep the service as a whole operational. For example, we might use polling at some point. When I detect that the service is overloaded, I would set a parameter on the server, the so-called “high load flag”, which would then cause the clients to lower their polling rate.
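The mechanism described above can be sketched in two parts. On the server, the flag is raised when a load metric crosses an upper threshold and cleared only once it falls below a lower one (hysteresis, so the flag doesn't flap). On the client, the polling interval is chosen from the flag, with random jitter so clients don't all poll in lockstep. The metric (it could be something like event-loop lag or in-flight request count), the thresholds, and the intervals are all assumptions.

```typescript
// Server side: a "high load flag" with hysteresis. The flag is raised when the
// load metric reaches `raiseAt` and cleared only once it falls to `clearAt`,
// so a metric hovering around one threshold doesn't toggle the flag constantly.

class HighLoadFlag {
  private raised = false;

  constructor(private readonly raiseAt: number, private readonly clearAt: number) {
    if (clearAt >= raiseAt) throw new RangeError("clearAt must be below raiseAt");
  }

  // Feed the latest load sample (hypothetical metric); returns the current flag.
  update(load: number): boolean {
    if (!this.raised && load >= this.raiseAt) this.raised = true;
    else if (this.raised && load <= this.clearAt) this.raised = false;
    return this.raised;
  }
}

// Client side: choose the next polling delay from the flag, plus up to
// `jitter` (as a fraction of the base delay) of random spread.
function nextPollDelayMs(
  highLoad: boolean,
  normalMs: number,
  highLoadMs: number,
  jitter = 0.2,
  random: () => number = Math.random,
): number {
  const base = highLoad ? highLoadMs : normalMs;
  return Math.round(base * (1 + jitter * random()));
}
```

A client would then schedule each poll with something like `setTimeout(poll, nextPollDelayMs(flag, 5000, 30000))`, reading the current flag from whatever lightweight channel the server exposes it on.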

What is the “Pub link”?

Good point!