What happens when the platform automatically restarts Node for you?

I have an app that’s built down to straight Node, running on a single t2.medium instance on Elastic Beanstalk without any auto-scaling.

I’ve gotten some CloudWatch warnings recently that read like:
Message: Environment health has transitioned from Ok to Severe. 53.9 % of the requests are failing with HTTP 5xx.

This then transitions back to Ok after about a minute.

I think this is due to a certain expensive pub/sub held open over a WebSocket.

Upon inspecting the logs, though, I see lines like the following in the minute where the health transitions from Ok -> Severe -> Ok:

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! app@1.0.0 start: `node main.js`
npm ERR! Exit status 1
npm ERR!
app@1.0.0 start /var/app/current
node main.js

I’m reading this as: Node actually crashes, and EB automatically restarts it.

My question is: what does the user actually experience during this restart? Their connection to the app server should be down (whether or not they notice), right? Would the reverse proxy still serve some cached content?

As far as ultimate solutions go, what is recommended in this case?
I’m planning on refactoring the expensive pub/sub soon, and I anticipate that will alleviate the issue.
Would upgrading from the t2.medium be a quick patch for now too? We’re not ready to introduce auto-scaling yet, as lots of testing has to be done there and it’s probably not needed.

Thanks!

Run your app locally and shut down your local server. Then access your app. That should answer this question.

Figure out why Node was restarted and then solve the cause. It’s difficult to suggest a solution without knowing the root of the problem.

Why do you think it is not needed?


That all makes sense.

The error is the same one I’ve described in Node call stack exceeded from Mongo bulkOperation - #2 by rjdavid and Beef up Meteor/Mongo to handle expensive bulkUpsert with concurrent active Sessions - #12 by truedon,

for which I still haven’t found a solution. (Granted, the nightly sync seems fine once I kill all active connections before it runs.)

I’m still pretty sold that it’s a concurrency bug: a bulk operation being executed while an expensive pub/sub on the same docs is open at the same time. It’d be great if I could reproduce it locally and then solve it. Otherwise, I feel like I still have to refactor that sub.

I think auto-scaling might not be advantageous to us given the size of the app and the number of users. I could be completely wrong though.

Bro, that’s for sure, you cannot do this inside Meteor. Doing all the bulk work externally in a plain Node script is ideal. Meteor is best at light lifting, not heavy data processing.

Set up a separate repo just for data processing and do all your bulk work there. Run it once a day during the low-traffic window on cron, along the lines of the sketch below.
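A minimal sketch of what such a standalone script could look like, assuming the official mongodb Node driver; the MONGO_URL env var, the db/collection names, and fetchExternalData() are placeholders, not anything from this thread:

// sync.js - standalone bulk upsert, run outside Meteor (e.g. nightly via cron)
const { MongoClient } = require('mongodb');

// Placeholder: replace with whatever produces the nightly sync payload.
async function fetchExternalData() {
  return [];
}

async function run() {
  const client = new MongoClient(process.env.MONGO_URL); // placeholder connection string
  await client.connect();
  const coll = client.db('app').collection('items');     // placeholder names

  const docs = await fetchExternalData();

  // Chunk the upserts so a single giant batch doesn't balloon memory.
  const CHUNK = 1000;
  for (let i = 0; i < docs.length; i += CHUNK) {
    const ops = docs.slice(i, i + CHUNK).map(({ _id, ...fields }) => ({
      updateOne: { filter: { _id }, update: { $set: fields }, upsert: true },
    }));
    await coll.bulkWrite(ops, { ordered: false });
  }

  await client.close();
}

run().catch((err) => { console.error(err); process.exit(1); });

Running it from cron (or any scheduler) keeps the heavy writes off the Meteor process entirely.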

Keep pub/sub light. If you have many calculations, you can do them in a script and save the results into a collection just for that. Or better, create a view, which Mongo supports for aggregation pipelines:

https://docs.mongodb.com/manual/reference/method/db.createView/
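For example, a mongosh sketch (the view name, source collection, and fields are made-up placeholders, not from this thread):

// Define a read-only view over an aggregation pipeline; the publication can
// then query userTotals instead of re-running the aggregation in app code.
db.createView(
  'userTotals',   // view name (placeholder)
  'events',       // source collection (placeholder)
  [
    { $group: { _id: '$userId', total: { $sum: '$amount' } } }
  ]
)

Note that a standard view re-runs its pipeline at read time; if you want the results physically stored, have the cron script write them into a collection instead (e.g. with a $merge stage).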


Sure, I hear ya, seems like that’s the way to go now.

Just looking for a quick patch to apply until I’m able to refactor, though.

Would upgrading resources (RAM, CPU) even do anything about the stack overflow error, though?

If the issue is with the observer, can upgrading resources really do anything, or is this all down to Node and its stack size?

You can just add swap to see if it’s RAM. It’s only a couple of commands to create one:

https://www.howtogeek.com/455981/how-to-create-a-swap-file-on-linux/
