Galaxy container restarts: what causes them?

dthwaite · May 11, 2017, 6:25am

Galaxy decided to restart my container with the following log message:

The container is being stopped because Galaxy is replacing the machine it's running on

Does anyone know what would cause this? I only have one container at the moment, so it causes a service interruption of a few minutes, which, admittedly, is only mildly annoying for me at this point.

martineboh · May 11, 2017, 8:20am

Galaxy is currently undergoing an infrastructure upgrade, so your containers might restart a few times. Take a look at http://status.meteor.com

dthwaite · May 11, 2017, 8:33am

OK, thanks. This update seems to apply only to the US region, and I'm using EU, but I assume you are doing upgrades in EU as well?

martineboh · May 11, 2017, 8:42am

The EU upgrades happened earlier. I am also on EU. The US upgrade is happening right now. My containers don't get replaced anymore, so yours should be fine too.

dthwaite · May 11, 2017, 8:44am

Cool. I've subscribed to Galaxy updates now!

evolross · May 11, 2017, 4:51pm

I was affected by this same issue this morning, but my container didn't restart, and it was the only container in the app. It also was not crashing before this maintenance upgrade; it had been running fine for weeks. My log shows the message "The container is being stopped because Galaxy is replacing the machine it's running on.", but the container is still sitting in yellow status.

When I visit my app using this container, it displays the following message in the browser:

502 Bad Gateway: Registered endpoints failed to handle the request.

And of course, I had critical users who were affected by this, and my phone and email blew up this morning. Thanks, Galaxy, you may have just cost me a really good customer. This was embarrassing.

See the image for the yellow container. After noticing the problem, I started two more containers just in case any of them went down again because of the upgrade, but they ran fine. I also have several other apps and containers in the same region and none of them failed to restart like this.

a4xrbj1 · April 21, 2018, 4:11am

This happened to me just now: no notice, nothing. Right in the middle of using our own app, boom, the lights went out:

The container is being stopped because Galaxy is replacing the machine it's running on

Thanks, guys: no autoscaling, but silently restarting our production system. I guess sooner or later we will move our apps off Galaxy and run them ourselves, since we are managing more and more of our infrastructure with Terraform anyway!

hemalr87 · April 22, 2018, 10:12am

Annoying to hear. It's happened to me a few times, with no explanation or errors. Support didn't help and said it must be an infinite loop in my app.

I'm looking at alternatives, but it's probably a couple of months before that ends up at the top of my priority list.

knana · April 22, 2018, 11:15am

An alternative to Galaxy for those who may be looking is NodeChef. In the rare case that we have to replace machines, our zero downtime deployment and migration feature ensures that no container or app is stopped. NodeChef coordinates code updates and migrations with precision to ensure a seamless upgrade/migration experience for all connected clients.

evolross · December 8, 2020, 8:43pm

Bumping this again. I still see this message all the time.

The container is being stopped because Galaxy is replacing the machine it's running on.

Does it happen whenever Galaxy gets a new version, or something like that? I run a lot of containers, so I see this often, usually at least once a week. The hiccup that occurs when this happens causes connected clients to miss DDP calls and disrupts my real-time gaming app.

@filipenevola What’s the reason this happens? And can it be reduced?

evolross · December 16, 2020, 2:14am

Here’s a screenshot from my Galaxy “Service” logs today…

I’m getting about one Galaxy restart per day! What’s going on with this? Is it normal? @filipenevola