Galaxy decided to restart my container with the following log message:
The container is being stopped because Galaxy is replacing the machine it's running on
Does anyone know the reason(s) that would cause such an action? I only have one container at the moment, so it causes a service interruption of a few minutes, which, admittedly, is only mildly annoying for me at this juncture.
The EU upgrades happened earlier on; I am also in the EU region. The US upgrade is happening right now. My containers don’t get replaced anymore, so yours should be fine too.
I was affected by this same issue this morning, but my container didn’t restart, and it was the only container in the app. It also was not crashing prior to this maintenance upgrade; it’s been running fine for weeks. My log has the message "The container is being stopped because Galaxy is replacing the machine it's running on.", but the container is then sitting on yellow status.
When I visit my app using this container, it displays the following message in the browser:
502 Bad Gateway: Registered endpoints failed to handle the request.
And of course, I had critical users who were affected by this, and my phone and email blew up this morning. Thanks Galaxy, you may have just cost me a really good customer. This was embarrassing.
See the image for the yellow container. After noticing the problem, I started two more containers just in case any of them went down again because of the upgrade, but they ran fine. I also have several other apps and containers in the same region and none of them failed to restart like this.
Happened to me just now, no notice, nothing. Just in the middle of using our own app and boom, lights go off:
The container is being stopped because Galaxy is replacing the machine it's running on
Thanks guys, no autoscaling but silently restarting our production system. I guess sooner or later we will take our apps off Galaxy and rather run them ourselves, as we dive more and more into managing our infrastructure with Terraform anyway!
An alternative to Galaxy for those who may be looking is NodeChef. In the rare case that we have to replace machines, our zero downtime deployment and migration feature ensures that no container or app is stopped. NodeChef coordinates code updates and migrations with precision to ensure a seamless upgrade/migration experience for all connected clients.
Bumping this again. I still see this message all the time.
The container is being stopped because Galaxy is replacing the machine it's running on.
Is it whenever Galaxy gets a new version or something? I run a lot of containers so I see this often. Usually at least once a week. The hiccup that occurs when this happens causes connected clients to miss DDP calls and messes up my real-time gaming app.
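For anyone hitting the same hiccup, one generic way to make a real-time client notice that it missed something during a restart is to number events and check for gaps. This is only a minimal sketch and not anything Galaxy or DDP provides: it assumes the server attaches an increasing `seq` to every event it sends, and the names `GameEvent`, `GapDetector`, and `resyncTo` are hypothetical.

```typescript
// Hypothetical event shape: the server is ASSUMED to attach an increasing
// `seq` to each event. This is not part of DDP or Galaxy.
interface GameEvent {
  seq: number;
  payload: string;
}

type Outcome = "applied" | "duplicate" | "gap";

class GapDetector {
  // Sequence number of the last event that was applied in order.
  private lastSeq = 0;

  // Returns "applied" for the next expected event, "duplicate" for an event
  // that was already seen, and "gap" when at least one event was missed.
  accept(event: GameEvent): Outcome {
    if (event.seq <= this.lastSeq) return "duplicate";
    if (event.seq > this.lastSeq + 1) return "gap";
    this.lastSeq = event.seq;
    return "applied";
  }

  // Call after reloading full state, passing the sequence number that the
  // fresh snapshot reflects.
  resyncTo(seq: number): void {
    this.lastSeq = seq;
  }
}
```

On a "gap" result the client would refetch the full game state and call `resyncTo` with the sequence number of that snapshot, instead of carrying on with stale state. It does not stop the restarts, it only makes the missed events detectable.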
@filipenevola What’s the reason this happens? And can it be reduced?