CPU jumping up like crazy and crashing production server

elie · July 21, 2015, 11:01pm

Hi,

Most of the time, my app is running at 10-20% CPU and under and using about 400 MB RAM (out of 1GB).

Tonight RAM started hitting 600MB, but CPU flew up to 70% and then eventually 100% crashing the app. I was handling more traffic than usual but it wasn’t that much more. Why did the CPU increase so quickly out of nowhere instead of increasing gradually as does the RAM?

Responses very much appreciated. Have a serious scaling problem on my hands now.

elie · July 21, 2015, 11:12pm

What seems to have a very big impact is the number of observers increasing a lot. Would restarting my app fix that problem? After the restart, the number of observers dropped from 500-1000 to under 50 or so, although there were also fewer users online.

What could cause exponential observer count? I don’t think I had an exponential number of users on my site. At least not according to analytics and kadira.

copleykj · July 22, 2015, 1:27am

Restarting is a temporary fix. It sounds like something is failing to stop cursor observers. You’ll need to hunt this down and correct it to actually fix the issue.
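
For anyone hunting for this kind of leak, one place to look is any publication that calls observeChanges by hand, since those handles have to be stopped explicitly (publications that simply return a cursor are cleaned up by Meteor itself). Below is a minimal sketch of the cleanup pattern, not a diagnosis of this particular app. `publishWithCleanup` and the `Items` collection mentioned in the comment are hypothetical names, and the helper takes the publish context and cursor as arguments so it can be read and run outside Meteor:

```javascript
// Hypothetical helper: forwards cursor changes to a subscription and makes
// sure the observer is stopped when the subscription ends.
// `sub` is the publish function's `this`; `cursor` is a Mongo cursor.
function publishWithCleanup(sub, cursor, collectionName) {
  const handle = cursor.observeChanges({
    added: (id, fields) => sub.added(collectionName, id, fields),
    changed: (id, fields) => sub.changed(collectionName, id, fields),
    removed: (id) => sub.removed(collectionName, id),
  });

  // Without this, the observer keeps running after the client unsubscribes
  // or disconnects, and the observer count only ever grows.
  sub.onStop(() => handle.stop());

  sub.ready();
}

// Inside a Meteor app this might be wired up as (Items is an assumed collection):
//   Meteor.publish('items', function () {
//     publishWithCleanup(this, Items.find(), 'items');
//   });
```

If the observer count in the facts output keeps climbing while sessions stay flat, a missing onStop like this is one plausible cause among several.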

robfallows · July 22, 2015, 1:35pm

MDG are putting in an improvement to the multiple observer issue - and here for the nitty-gritty. No word as yet on what sort of improvement we might expect to see, but it’s good to see this is being worked on.

sikanx · July 22, 2015, 2:39pm

Are you using template subscriptions or Meteor.subscribe?

elie · July 22, 2015, 2:53pm

I use both in my app.

I use subsManager for most subscriptions.

I have a few Meteor.subscribe calls, both in the router and at the template level.

sikanx · July 22, 2015, 3:39pm

I had a similar issue with the app being unresponsive when switching between routes. Then I removed all Meteor.subscribe calls from the router and moved them into the templates as template-level subscriptions (this.subscribe). The unresponsiveness went away and the app is super fast now.

I highly suggest using flow-router to render the templates and handle subscriptions and data all in the template, keeping everything in one place.

elie · July 22, 2015, 3:40pm

The stats in facts package are showing:

**livedata**
subscriptions: 1808
invalidation-crossbar-listeners: 2692
sessions: 114

**mongo-livedata**
observe-multiplexers: 1299
observe-drivers-oplog: 1299
oplog-watchers: 2692
observe-handles: 2934
time-spent-in-QUERYING-phase: 1960342
time-spent-in-STEADY-phase: 331128612
observe-drivers-polling: 0
time-spent-in-FETCHING-phase: 351776

Are any of these numbers far too high? Does 1808 subs for 114 sessions make sense?
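
To put those counters in proportion, here is a quick back-of-the-envelope calculation using only the figures posted above. The variable names are made up for illustration, and what counts as "too high" depends on how many publications each page legitimately needs:

```javascript
// Counters copied from the facts output above.
const facts = {
  subscriptions: 1808,
  sessions: 114,
  observeHandles: 2934,
  observeMultiplexers: 1299,
};

// Average number of active subscriptions per connected session.
const subsPerSession = facts.subscriptions / facts.sessions;

// Average number of observe handles sharing one multiplexer.
// The closer this is to 1, the less observers are being shared.
const handlesPerMultiplexer = facts.observeHandles / facts.observeMultiplexers;

console.log(subsPerSession.toFixed(1));        // 15.9
console.log(handlesPerMultiplexer.toFixed(2)); // 2.26
```

So each session holds roughly 16 subscriptions on average, and each multiplexer is shared by a little over two observe handles.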

elie · July 22, 2015, 3:41pm

My issues are server side, not client side. Or are you also talking about server side problems?

sikanx · July 22, 2015, 3:45pm

I am guessing that you COULD have some subscriptions somewhere that are not being terminated after the user switches routes. As users go back and forth between pages, they just keep subscribing to more publications without ending the previous ones.

This could happen when you use Meteor.subscribe, because it subscribes the whole application to a publication, not just the template instance.

What I am suggesting is: don’t use Meteor.subscribe unless necessary. Keep subscriptions local to the template, so when people switch between routes they are 100% unsubscribed from the previous page’s publications.

elie · July 22, 2015, 4:08pm

I actually reuse a lot of subs using subsManager:
https://kadira.io/academy/reduce-bandwidth-and-cpu-waste

Sanjo · July 22, 2015, 6:15pm

Start measuring with Kadira.

elie · July 22, 2015, 7:26pm

I am. On the paid plan now even.

Realised the problem isn’t with observers at all.

Need to load up more instances of the app, but need to refactor the code first, since there are a bunch of tasks running on timeouts that shouldn’t be called twice.
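
One common way to keep timer-driven tasks from running twice once there are several instances is a lease: each instance tries to claim the job, and only the winner runs it. This is a rough sketch under stated assumptions, not what the app actually does. An in-memory Map stands in for whatever shared store the instances would really use, the names `tryAcquire` and `runIfLeader` are made up for illustration, and a real version would need the claim to be a single atomic operation in that shared store:

```javascript
// In-memory stand-in (hypothetical) for a store shared by all instances.
const locks = new Map();

// Try to claim the named job for `ttlMs` milliseconds.
// Returns true if `instanceId` now holds the lease.
function tryAcquire(jobName, instanceId, ttlMs, now = Date.now()) {
  const lease = locks.get(jobName);
  if (lease && lease.expiresAt > now && lease.owner !== instanceId) {
    return false; // another instance holds a lease that has not expired
  }
  locks.set(jobName, { owner: instanceId, expiresAt: now + ttlMs });
  return true;
}

// Run `task` only if this instance wins the lease; report whether it ran.
function runIfLeader(jobName, instanceId, ttlMs, task, now = Date.now()) {
  if (!tryAcquire(jobName, instanceId, ttlMs, now)) return false;
  task();
  return true;
}
```

Each instance would call runIfLeader from its existing timeout. The lease expiry means another instance can take over if the current holder crashes.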

elie · July 22, 2015, 7:26pm

Looks like this:

muaddib · July 22, 2015, 7:39pm

It’s a RAM problem. 1 GB on a production server? Too little. I would not go below 2 machines of 3 GB each. Resources are so damn cheap…

Please post htop during the problem. I bet it’s an IO wait.

elie · July 22, 2015, 7:55pm

Well I upgraded to DO $20 last night. So it now has 2GB RAM, but exactly the same issues. Only using 500MB of the 2GB, so could that really be the problem?

MongoDB is hosted on Compose.io Elastic Deployment. I could send them an email to use more RAM there. No idea if that’s needed though.

And I plan on moving to multiple machines, but need to refactor the code first to do it.

muaddib · July 22, 2015, 7:56pm

Can you post htop during the problem?

That would solve stuff.

For $20 you can get a badass 8 GB server from OVH.

elie · July 22, 2015, 8:21pm

muaddib · July 22, 2015, 8:25pm

MongoDB is on the same server. You stated otherwise; maybe this is a misconfiguration? Separating Mongo and Meteor helps a lot.

Is the server unresponsive during this top? It looks like it has high load on a single thread. Maybe there is some expensive logic running? Maybe the CPU is too weak? (Happens often with VPS.) I see a lot of sleeping tasks; maybe node is forking a lot? I have 277 tasks on a 64 GB dedi with 30 VPS, so it seems strange to have 97 tasks on a single VPS. But this might be a wild guess.

Please also post iotop and htop (more powerful than top).

elie · July 22, 2015, 8:33pm

I can shut down the mongodb I think. The db I use is on Compose anyway. I just misconfigured mup when I first deployed the app.

It seems to be somewhat responsive at the moment, but last night it wasn’t and the app crashed at a point.

iotop is just 0s:

htop: