Diagnosing high CPU?

My app doesn’t have many concurrent users (maybe 20 max), and usually CPU usage is low (~15%), but sometimes the CPU goes very high (60-70%) and then doesn’t come down for a while.

I have Kadira APM, but it’s hard to tell the cause from it - all my subs show high wait times and high observeChanges times (2-3s), e.g.:

coll : doctypes
selector : {"_id":{"$in":["RrGFesNqrX2RxZg25","8EMKYEaPZNAaiE7qb","ucPzrYJFvhXJFZbRK","fa5PNTPSu6v2XQHEv","2TXRPYFXmvg77KzxL","pPNXsrH8uvug3Hwbs","Nk4oxGJdvGT3Yropn","i8CgydDJMPNFKMyWu","k6g5vF8f9pmPXHXDk","7XBhXx3aF9e8iYFmJ","tPiC6KeAWKoGxAtKu","HMnDFX82FPYb7Mjp5","NTZqwWxhQe2oN2MSN","NonQP3pgqF8XS77Kq","W63bo7zJEPzgzyqKQ","DbT9ce7tRtatiuMNk","eDZWkFepXWd3vRnwo","jffvyNWepApmvdcjo","sESHyB249GAi5i6C6","jN4SF2kuY9GKf8HPS","sZqQTqWhz9WXsPZDD","jsT7mmsDMLwFf2ToF","QjcGTsWi4GF9c6d2j","pmKqrqfgZ4Sym4Tdk","yih8cnspgTND62BKc","omngBPmz3AyHNoRTb","MMrvYafWsB2PyeKxu","ancSkozjjPPWqBEGa","bBniMMCQsNqqhjmZG","N9ruiWCkGsiKgHKaW","C9sd2P5arMuHYNeg9","NjXoknWtsP3DCKFW4"]}}
func : observeChanges
cursor : true
oplog : true
wasMultiplexerReady : false
queueLength : 0
elapsedPollingTime : 0
noOfCachedDocs : 200

All the subs/queries are using the oplog.

I cannot tell whether the high CPU is due to Meteor doing more work and the queries being slow, or vice versa. I also looked at my MongoDB, and it’s at single-digit CPU with no load on it.

What can I do to debug this?

How do you configure your server? Manually?

You will probably need to do some profiling. Kadira APM used to have this built in, so if you can still use that, it would be best.

You could also use https://github.com/node-inspector/v8-profiler

Warning for Meteor 1.6 users: last time I looked, there were some problems with Node 8 and the v8-profiler npm package.

Did you find anything to help diagnose the problem?

I’ve just landed in a similar situation. Everything had been running fine for a month or more, and now today I’m getting spikes every so often (4 times in the last hour) at 100% CPU, lasting about 30-40 seconds. The mlab diagnostics show nothing unusual - nothing that coincides, anyway, just the odd page fault here and there. And Kadira just shows that a method took a very long time, not that it went into an infinite loop or anything, so a method that normally takes 2000ms is taking 120000ms !!! :disappointed_relieved:

I can’t confirm this, but it did seem like a new connection coincided with the spikes. We’re only talking 15-20 connections total, though, so hardly anything too stressful.
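One way to test that hypothesis is to timestamp the spikes themselves so they can be lined up against connection events. A minimal sketch using only `process.cpuUsage()` from Node core (the function name, threshold, and interval are my own choices, nothing Kadira- or Meteor-specific):

```javascript
// Lightweight spike logger: sample process.cpuUsage() on an interval and
// invoke a callback whenever the process crosses a CPU threshold, so spike
// timestamps can be lined up against other events (new connections, slow
// methods in Kadira, etc.). Pure Node, no Meteor dependency.
function startCpuSpikeLogger(thresholdPercent, intervalMs, onSpike) {
  let lastCpu = process.cpuUsage(); // microseconds of user + system time
  let lastTime = Date.now();
  return setInterval(() => {
    const nowCpu = process.cpuUsage();
    const nowTime = Date.now();
    const usedMicros = (nowCpu.user - lastCpu.user) + (nowCpu.system - lastCpu.system);
    const elapsedMicros = (nowTime - lastTime) * 1000;
    const percent = (usedMicros / elapsedMicros) * 100; // % of one core
    lastCpu = nowCpu;
    lastTime = nowTime;
    if (percent > thresholdPercent) onSpike(percent);
  }, intervalMs);
}

// Usage sketch: log spikes above 80% of one core, sampled every second.
// const timer = startCpuSpikeLogger(80, 1000, (p) =>
//   console.log(new Date().toISOString(), `CPU spike: ${p.toFixed(1)}%`));
```

In a Meteor app, logging timestamps from `Meteor.onConnection` alongside this output would show whether new connections really do line up with the spikes.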

Any holistic ideas on how to see what the problem is would be much appreciated.