My app is in production with a large number of users and a large database. In peak hours the session count grows to 2K, and the app used to handle that at around 60% average CPU.
For the last 3-4 days, however, the server has been crashing during peak hours. It runs smoothly for 2-3 hours with 2K+ sessions, then CPU suddenly hits 100%. Within the next 4-5 minutes the app stops responding or takes much longer than normal to serve requests, and then the server crashes and restarts. After the restart, CPU, memory, and method/pub-sub response times stay at the same high levels, and things return to normal only after roughly 1-2 hours or after I reboot the server.
I have tried to figure out the problem, but I have not found any clue as to why these crashes are happening.
I upgraded MongoDB from v3.2 to v4.0 about a week ago, so I suspect the database is the cause. According to the graphs, DB queries are taking longer to fetch data, which increases the method and pub/sub response times and, in turn, the wait time. (Missing indexes are probably not the issue, because the app worked fine before the upgrade.)
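One way I could try to confirm the DB suspicion is to turn on MongoDB's profiler during peak hours and group the slow operations it records. The snippet below is only a rough sketch: `summarizeSlowOps` is a hypothetical helper name, the sample entries are made up, and it assumes the entries have been exported from `system.profile` with the standard `ns`, `op`, `millis`, and `planSummary` fields.

```javascript
// Sketch only. Assumes profiling was enabled in the mongo shell first, e.g.
//   db.setProfilingLevel(1, 100)   // record operations slower than 100 ms
// and that the documents below were read from db.system.profile.

// Hypothetical helper: group slow operations by namespace and operation type.
function summarizeSlowOps(profileDocs) {
  const byKey = new Map();
  for (const doc of profileDocs) {
    const key = `${doc.ns} ${doc.op}`;
    const entry = byKey.get(key) ||
      { key, count: 0, totalMillis: 0, maxMillis: 0, collScans: 0 };
    entry.count += 1;
    entry.totalMillis += doc.millis;
    entry.maxMillis = Math.max(entry.maxMillis, doc.millis);
    // A COLLSCAN plan means the query did not use an index.
    if ((doc.planSummary || "").startsWith("COLLSCAN")) {
      entry.collScans += 1;
    }
    byKey.set(key, entry);
  }
  // Worst offenders (by total time spent) first.
  return [...byKey.values()].sort((a, b) => b.totalMillis - a.totalMillis);
}

// Made-up sample entries, only to show the shape of the input.
const sampleProfileDocs = [
  { ns: "app.orders", op: "query", millis: 850, planSummary: "COLLSCAN" },
  { ns: "app.orders", op: "query", millis: 1200, planSummary: "COLLSCAN" },
  { ns: "app.users", op: "query", millis: 150, planSummary: "IXSCAN { _id: 1 }" },
];

const summary = summarizeSlowOps(sampleProfileDocs);
console.log(summary);
```

If one collection dominates the total time or shows many `COLLSCAN` plans after the upgrade, that would point at specific queries rather than at the server as a whole.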
This is a screenshot of 3 hours of data from APM.
Publications response time
Methods related graph