A couple of times a year, we see several of our Node processes start consuming more memory at the same time, for no obvious reason.
The last time it happened, RSS went through the roof (8 GB), then the garbage collector kicked in, CPU climbed to 200%, and the process crashed. @filipenevola helped us and suggested adding a console.log to each and every subscription and method, which we did.
We also have a setInterval that prints CPU and RSS to the console every 15 seconds:
At 6:05:50: CPU = 50%, RSS = 300 MB
At 6:08:18: CPU = 102%, RSS = 2120 MB
There's nothing in between: instead of firing every 15 seconds, the next sample came 2 minutes 28 seconds later. So something that happened within those 15 seconds made the server "unresponsive", yet we have no log of any method or subscription, even though we log them all.
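A minimal sketch of a monitor like ours, assuming only Node's built-in `process.cpuUsage()` and `process.memoryUsage()`, that also reports how late each tick fires. Since `setInterval` callbacks run on the event loop, a tick that arrives far later than scheduled means the loop was blocked for roughly the difference, either by synchronous work or by long garbage-collection pauses. With the two samples above, 148 s elapsed instead of 15 s, i.e. roughly 133 s of blocking. The function names and the 1-second warning threshold are hypothetical.

```typescript
// Hypothetical monitor sketch: names and thresholds are illustrative assumptions.

// CPU time used over an interval, as a percentage of one core. process.cpuUsage()
// covers all of the process's threads (e.g. GC helpers), so this can exceed 100%.
function cpuPercent(cpuDeltaMicros: number, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0;
  return (cpuDeltaMicros / 1000 / elapsedMs) * 100;
}

// How much later than scheduled a tick fired, in ms (0 if on time or early).
function tickLagMs(expectedMs: number, elapsedMs: number): number {
  return Math.max(0, elapsedMs - expectedMs);
}

// Logs CPU and RSS every `intervalMs`, and warns when a tick fires late,
// which suggests the event loop was blocked in between.
function startMonitor(intervalMs = 15_000, warnLagMs = 1_000) {
  let lastWall = Date.now();
  let lastCpu = process.cpuUsage();

  const timer = setInterval(() => {
    const now = Date.now();
    const elapsedMs = now - lastWall;
    const cpuDelta = process.cpuUsage(lastCpu); // microseconds since last sample
    const rssMb = Math.round(process.memoryUsage().rss / (1024 * 1024));
    const cpu = cpuPercent(cpuDelta.user + cpuDelta.system, elapsedMs).toFixed(0);
    const lag = tickLagMs(intervalMs, elapsedMs);

    const line = `${new Date(now).toISOString()} CPU=${cpu}% RSS=${rssMb}MB`;
    if (lag > warnLagMs) {
      console.warn(`${line} tick late by ${lag} ms (event loop likely blocked)`);
    } else {
      console.log(line);
    }

    lastWall = now;
    lastCpu = process.cpuUsage();
  }, intervalMs);

  timer.unref(); // don't keep the process alive just for monitoring
  return timer;
}
```

It would be started once at server startup with `startMonitor()`; a "tick late" warning right before a crash would at least confirm that the process was stalled rather than idle.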
The problem is that we had no logs at all:
-Nothing in the console
-Nothing in MontiAPM
So either no method or subscription ran, or something that produces no console output ran and consumed all the resources.
But this morning, we saw exactly one event logged just before that 15-second window:
onNewWebsocketConnection: id=Pv7Qu9g8QrwjozEKq clientAddress=184.108.40.206
So we looked up that IP in our HAProxy logs and saw that a Cordova user on that IP had made many unsuccessful attempts to connect to the WebSocket:
Then it seems to fall back to XHR:
A couple hundred of these requests within the same minute seem to make RSS skyrocket and leave the Node process unresponsive.
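To spot this pattern without digging through HAProxy logs (option 3 below), one could count events per client IP over a sliding 60-second window in Node and log an IP once it crosses a threshold. The sketch assumes it is fed the client address wherever the app sees one, for example from the handler that prints `onNewWebsocketConnection`; whether the XHR fallback requests reach that handler is an assumption we haven't verified. `IpBurstDetector`, `onClientEvent`, the 60 s window and the threshold of 100 are all hypothetical.

```typescript
// Hypothetical per-IP burst detector: counts events per client IP over a
// sliding time window. Window size and threshold are illustrative assumptions.
class IpBurstDetector {
  readonly windowMs: number;
  readonly threshold: number;
  private readonly hits = new Map<string, number[]>(); // ip -> event timestamps (ms)

  constructor(windowMs = 60_000, threshold = 100) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Records one event for `ip` and returns its event count within the window.
  record(ip: string, nowMs: number = Date.now()): number {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(ip) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.hits.set(ip, recent);
    return recent.length;
  }

  // Drops IPs with no events inside the window so the map cannot grow forever.
  prune(nowMs: number = Date.now()): void {
    const cutoff = nowMs - this.windowMs;
    for (const [ip, times] of this.hits) {
      const recent = times.filter((t) => t > cutoff);
      if (recent.length === 0) this.hits.delete(ip);
      else this.hits.set(ip, recent);
    }
  }

  // Number of IPs currently tracked.
  get size(): number {
    return this.hits.size;
  }
}

// Hypothetical wiring: call this wherever the client address is known.
// Warns once, when an IP first reaches the threshold inside the window.
function onClientEvent(detector: IpBurstDetector, ip: string): void {
  const count = detector.record(ip);
  if (count === detector.threshold) {
    console.warn(`Possible connection storm from ${ip}: ${count} events in ${detector.windowMs / 1000}s`);
  }
}
```

`prune()` would be called periodically (say from the same 15-second interval) so IPs that stop connecting are forgotten. Logging only at the exact threshold keeps a storm of a couple hundred requests from producing a couple hundred log lines.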
What would you suggest?
Should we:
1-Implement HAProxy rate limiting on those two HTTP requests?
2-Block XHR (it seems so slow that our app doesn't work over it anyway)?
3-Find a way in Node to log those requests to the console so we can manually block the originating IP?
4-Something else?