Container health checks fail on Galaxy deploy

Hey, we’ve now been down for three hours. I’m in Japan, so I often notice problems with Galaxy ahead of other regions.

Galaxy status mentions issues logging into Galaxy, but our actual app is dead. We are losing revenue. This is awful @marktrang

trqnw
2017-06-27 17:55:15+09:00 The container is being stopped because it has failed too many health checks.
4knrq
2017-06-27 17:57:38+09:00 The container is being stopped because it has failed too many health checks.
dm6a0
2017-06-27 18:08:15+09:00 The container is being stopped because it has failed too many health checks.
x22wp
2017-06-27 18:09:52+09:00 The container is being stopped because it has failed too many health checks.
80vz0
2017-06-27 18:18:25+09:00 The container is being stopped because it has failed too many health checks.
6jxd7
2017-06-27 18:28:35+09:00 The container is being stopped because it has failed too many health checks.
1hdt9
2017-06-27 18:29:18+09:00 The container is being stopped because it has failed too many health checks.
nk6aw
2017-06-27 18:39:29+09:00 The container is being stopped because it has failed too many health checks.
2h9tp
2017-06-27 18:42:33+09:00 The container is being stopped because it has failed too many health checks.
q27yz
2017-06-27 18:49:38+09:00 The container is being stopped because it has failed too many health checks.
9n7nz
2017-06-27 18:52:45+09:00 The container is being stopped because it has failed too many health checks.
fgrm0
2017-06-27 18:59:49+09:00 The container is being stopped because it has failed too many health checks.
k1nnt
2017-06-27 19:04:30+09:00 The container is being stopped because Galaxy is replacing the machine it’s running on.
jsb45
2017-06-27 19:10:14+09:00 The container is being stopped because it has failed too many health checks.
h1v1j
2017-06-27 19:14:58+09:00 The container is being stopped because it has failed too many health checks.
b9vza
2017-06-27 19:20:16+09:00 The container is being stopped because it has failed too many health checks.
gp7wf
2017-06-27 19:30:25+09:00 The container is being stopped because it has failed too many health checks.
1pf5r
2017-06-27 19:40:39+09:00 The container is being stopped because it has failed too many health checks.
8hrxj
2017-06-27 19:54:54+09:00 The container is being stopped because it has failed too many health checks.
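
For anyone triaging a dump like the one above, here is a rough sketch for tallying the stop reasons and the gaps between container stops. It assumes the layout seen in this thread (a container ID line followed by a timestamped message line, with or without a space after the UTC offset) and that the input contains only log lines. The names `parse_events`, `summarize`, and `SAMPLE` are made up for illustration; this is not a Galaxy tool or API.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape: "YYYY-MM-DD HH:MM:SS+HH:MM" followed by the message,
# with optional whitespace in between (the pasted logs have none).
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})\s*(.*)$"
)

# Three entries copied from the log excerpt in this thread.
SAMPLE = """trqnw
2017-06-27 17:55:15+09:00The container is being stopped because it has failed too many health checks.
4knrq
2017-06-27 17:57:38+09:00The container is being stopped because it has failed too many health checks.
k1nnt
2017-06-27 19:04:30+09:00The container is being stopped because Galaxy is replacing the machine it’s running on.
"""


def parse_events(text):
    """Return a list of (container_id, timestamp, message) tuples."""
    events = []
    container = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LOG_LINE.match(line)
        if match:
            timestamp = datetime.fromisoformat(match.group(1))
            events.append((container, timestamp, match.group(2)))
            container = None  # an ID applies only to the next message line
        else:
            # Any non-timestamp line is treated as a container ID.
            container = line
    return events


def summarize(events):
    """Count messages by text and list the minutes between consecutive events."""
    reasons = Counter(message for _, _, message in events)
    times = [timestamp for _, timestamp, _ in events]
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return reasons, gaps


if __name__ == "__main__":
    reasons, gaps = summarize(parse_events(SAMPLE))
    for message, count in reasons.most_common():
        print(count, message)
    print("minutes between stops:", [round(g, 1) for g in gaps])
```

Pasting a full log excerpt into `SAMPLE` shows whether the stops are all health-check failures and how regularly the containers are being cycled, which may be useful detail for a support ticket.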

Having the same issue as @maxhodges here. Our site and app are down and all the logs are telling us is that “The container is being stopped because it has failed too many health checks.”

Redeployed, but it failed too.

v0239
2017-06-27 20:23:52+09:00 Removing intermediate container 83da43b893f2
v0239
2017-06-27 20:23:52+09:00 Successfully built 277ac5ad3101
v0239
2017-06-27 20:23:52+09:00 Pushing image to Galaxy’s Docker registry.
v0239
2017-06-27 20:24:28+09:00 Cleaning up.
v0239
2017-06-27 20:24:30+09:00 Successfully built version 239.
sth6g
2017-06-27 20:25:21+09:00 The container is being stopped because it has failed too many health checks.

Nothing works: not a new deploy, not killing the container, not activating/deactivating High Availability, etc.

Same here. Apps are down. Everything worked until last night and stopped today. Redeploying doesn’t help.

2017-06-27 10:46:28+02:00 The container is being stopped because it has failed too many health checks.
ex7c0
2017-06-27 10:56:44+02:00 The container is being stopped because it has failed too many health checks.
2w2g1
2017-06-27 11:09:53+02:00 The container is being stopped because it has failed too many health checks.
vjz7x
2017-06-27 11:20:18+02:00 The container is being stopped because it has failed too many health checks.
r2e0j
2017-06-27 11:39:06+02:00 The container is being stopped because it has failed too many health checks.
7b758
2017-06-27 11:53:24+02:00 The container is being stopped because it has failed too many health checks.
nkje5
2017-06-27 12:16:42+02:00 The container is being stopped because it has failed too many health checks.
bf1nj
2017-06-27 12:30:05+02:00 The container is being stopped because it has failed too many health checks.
ghh56
2017-06-27 12:57:36+02:00 The container is being stopped because it has failed too many health checks.
xp1nd
2017-06-27 13:16:15+02:00 The container is being stopped because it has failed too many health checks.
sgvhs
2017-06-27 13:26:43+02:00 The container is being stopped because it has failed too many health checks.

Same for me… can’t deploy; I’ve tried about 5 times now. App has been down in production for almost 4 hours.

So what are the Galaxy alternatives these days? Is IBM Bluemix still a viable alternative?

Meteor is aware of it now: http://status.meteor.com/incidents/tf630kbt1x2n

The Galaxy engineers are investigating the issue.

I’m not a part of the Galaxy team, but from what I understand, apps which have IP Whitelisting enabled are not experiencing this issue. As a suggested workaround, consider temporarily enabling that additional feature in the Galaxy dashboard.

There is no need to actually whitelist anything (and you shouldn’t if you intend to turn it off later); merely having the feature active should be enough to work around the problem.

Thanks @abernix. This helped us.

Our apps are back up now.

App is up, but I noticed that APM is not tracking: no new events (Japan time zone, btw).

My site is still down… It’s been hours, since 2:34am PST. This sucks.

My app is back. Fingers crossed it doesn’t die again.

Meteor APM is currently catching up on a backlog of APM stats. It will recover, and stats were being collected the whole time (though it’s worth pointing out that non-high-availability apps which were down will not have stats to offer for their downtime).

Apps with existing “high-availability” deployments (those which, pre-incident, had their number of containers set to 3 or higher and were not affected by this outage) will have their Meteor APM data points aggregated and logged soon.

It is my understanding that all affected apps should be back online.

We have published a postmortem about this issue. http://status.meteor.com/incidents/tf630kbt1x2n

This issue is now back on Meteor Galaxy… our site keeps going on and off because of it.

smw6v
2018-05-07 11:36:13+08:00 The container is being stopped because it has failed too many health checks.
smw6v
2018-05-07 11:36:17+08:00 Application exited with signal: terminated
rcnbk
2018-05-07 11:36:18+08:00 Application process starting, version 43

Any possible fixes?

I’m having the same issue. Deploys randomly fail health checks or just stay stuck in deploying for a long time. Even deployments with exactly the same source will succeed one time and fail the next.

Literally nothing shows up in the logs.

Seriously guys, this is not a state we can stay in for long!

Have you raised a Galaxy support ticket?

Hi folks, I came across this old thread while looking for more information on what constitutes a health check on Galaxy. Is anyone aware of more documentation or insight? Thanks!