Our Galaxy apps are down, cannot restart after maintenance

@maxhodges posted about this already here

We have 3 apps on Galaxy. 2 are running (previous versions), 1 is down. We have tried redeploying 2 of them, and neither of the new deployments will start. It looks like the recent Galaxy maintenance killed one of our apps and left us unable to deploy. It’s been ~4 hours now. We’ve contacted Galaxy support, no response yet.

Any recommendations on easy Galaxy alternatives that we could deploy against our existing mongo infrastructure in us-east?

Hopefully somebody gets to work early. Don’t really want to wait until 11am EST for them to get in and start looking at this.

I’m working to get MUP up and running on a DigitalOcean droplet. That seemed to be a popular way for people to spin up lower-cost Meteor apps.

If I could save money AND have a reliable server, then that would make me and my clients pretty satisfied.

I posted in the other thread (linked above), but this thread is probably more appropriate so I’ll quote my other post here:

The Galaxy engineers are investigating the issue.

I’m not a part of the Galaxy team, but from what I understand, apps which have IP Whitelisting enabled are not experiencing this issue. As a suggested workaround, consider enabling that additional feature temporarily in the Galaxy dashboard.

There is no need to actually whitelist anything (and you shouldn’t if you intend on turning it off later); merely having the feature active should be enough to work around the problem.

I would suggest following the status page for updates!

@abernix Thanks for the update. I can confirm that fixed the issue for me. Luckily I saw this right before I shifted over our DNS settings.

Enabling IP Whitelisting appears to have fixed the issue on our installs also. Thanks @abernix.

Glad the IP Whitelisting workaround option has got you back online.

Just to be clear, this doesn’t actually have anything to do with IP whitelisting itself. The separate infrastructure which supports IP whitelisting deployments is not suffering from the same problem, so enabling that option fires your application up on that infrastructure. The AWS eu-west-1 and ap-southeast-2 infrastructure is also operating normally.

Galaxy engineers are working toward an appropriate solution and you can monitor the status page for updates as to when you can disable whitelisting again. When you disable whitelisting, your app will redeploy into the normal infrastructure.

Just for completeness – as I posted in the other thread:

Meteor APM is currently catching up on a backlog of APM stats. It will recover, and stats were being collected throughout (though it’s worth pointing out that non-high-availability apps which were down will not have stats to offer for their downtime).

Apps with existing “high-availability” deployments (those which, pre-incident, had their number of containers set to 3 or higher and were not affected by this outage) will have their Meteor APM data points aggregated and logged soon.

It is my understanding that all affected apps should be back online.