Server CPU Spikes due to slow pub/sub and database queries

#1

Hi!
My app is in production with a large number of users and a large database. During peak hours the number of sessions grows to 2K, and the app used to handle that at about 60% average CPU.

But for the last 3-4 days the server has been crashing during peak hours. It runs smoothly for 2-3 hours with 2K+ sessions, but then CPU suddenly hits 100%, and within the next 4-5 minutes the app stops responding or takes much longer than normal to serve requests. Then the server crashes and restarts. After the restart, CPU, memory, and methods/pub&sub response times stay at the same high levels, and things only return to normal after roughly 1-2 hours, or when I reboot the server.

I have tried to figure out the problem, but I can't find any clue as to why these crashes are happening.

I upgraded MongoDB from v3.2 to v4.0 just 1 week ago, so I think the DB may be the cause. According to the graphs, DB queries are taking longer to fetch data, which increases the methods/pub&sub response time and, in turn, the wait time. (Indexes may not be the issue, because the app worked fine before.)

These are screenshots of 3 hours of data from APM.

Publications response time

Methods related graph

#2

Some people are experiencing a performance hit on later versions of Meteor/Mongo.

Which version of Meteor are you running? If you have an old Meteor version, it might not like the upgrade to Mongo 4.

#3

Hi @jorgeer, thanks for your interest.

I’m using Meteor v1.8. I upgraded from v1.6 three weeks ago.

#4

Are you sure indexes on the db aren’t the problem? Do you already have any in place? Your scenario and the graphs look a lot like the first scaling issue presented here: https://medium.freecodecamp.org/scaling-meteor-a-year-on-26ee37588e4b
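One rough way to check this is to run a slow query with `.explain()` in the mongo shell and look for a `COLLSCAN` stage in the winning plan. The helper below is only a sketch: `usesCollectionScan` is a made-up name, the two sample objects are hand-written and trimmed (including the hypothetical index name `userId_1`), and it assumes the unsharded `queryPlanner.winningPlan` shape of the explain output.

```javascript
// Walk a MongoDB explain() winning plan and report whether any stage is a
// full collection scan (COLLSCAN), which usually means no index was used.
// Assumption: unsharded explain output with queryPlanner.winningPlan.
function usesCollectionScan(explainOutput) {
  const root = explainOutput && explainOutput.queryPlanner
    ? explainOutput.queryPlanner.winningPlan
    : null;
  const pending = root ? [root] : [];
  while (pending.length > 0) {
    const stage = pending.pop();
    if (stage.stage === 'COLLSCAN') return true;
    // Stages nest through inputStage (single child) or inputStages (array).
    if (stage.inputStage) pending.push(stage.inputStage);
    if (Array.isArray(stage.inputStages)) pending.push(...stage.inputStages);
  }
  return false;
}

// Hypothetical, trimmed samples in the shape explain() returns.
const unindexed = { queryPlanner: { winningPlan: { stage: 'COLLSCAN' } } };
const indexed = {
  queryPlanner: {
    winningPlan: {
      stage: 'FETCH',
      inputStage: { stage: 'IXSCAN', indexName: 'userId_1' },
    },
  },
};

console.log(usesCollectionScan(unindexed)); // true
console.log(usesCollectionScan(indexed));   // false
```

If the queries behind your slowest publications and methods report a collection scan, an index on the queried fields is probably worth trying before anything else.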

#5

That looks familiar; are you using observeChanges? It has a memory leak, see: Meteor perfomance issues with collection observeChanges

If you’re using that, you have no choice but to rewrite your code to poll methods instead. Alternatively, you can subscribe to collection changes only, and then fetch the collection data with methods inside the autorun.
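A minimal sketch of the polling idea, not a drop-in implementation: `startPolling` is a hypothetical helper, and `callMethod` is an assumed wrapper that on a Meteor client could be something like `(cb) => Meteor.call('orders.recent', cb)`, where the method name `orders.recent` is made up.

```javascript
// Hypothetical helper: fetch data through a method on a fixed interval
// instead of keeping a live observer open on the server.
// callMethod(cb) must invoke cb(error, result) when the request finishes.
function startPolling(callMethod, intervalMs, onResult) {
  let stopped = false;
  let inFlight = false;

  function tick() {
    // Skip this round if the previous request has not returned yet,
    // so slow responses do not pile up on an already busy server.
    if (stopped || inFlight) return;
    inFlight = true;
    callMethod((error, result) => {
      inFlight = false;
      if (stopped) return; // ignore results that arrive after stop()
      onResult(error, result);
    });
  }

  tick(); // fetch once immediately
  const timer = setInterval(tick, intervalMs);

  // The caller uses the returned function to stop polling.
  return function stop() {
    stopped = true;
    clearInterval(timer);
  };
}
```

Call the returned `stop` function when the template or component is destroyed, otherwise the interval keeps firing. The interval length is a trade-off between data freshness and server load.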

You’ll find that your app performs much better, and CPU will stay down forever.

Edit: also check this Memory Leak - App crash every day