Anybody else seeing Mongo failures on meteor.com hosted apps?

tl;dr: hosting a ton of apps is easy. hosting a ton of Mongo databases is hard.

Hi folks! Just wanted to give a little background as to what’s going on with the free meteor deploy hosting. (For context, I’m an MDG engineer who’s been with the company since a few months after the initial public release, and these days I’m mostly focused on making Galaxy’s backend reliable and stable.)

As you can now see on status.meteor.com, there are intermittent outages on our free deploy service. (I’m sorry it took longer than necessary to get that message up!)

We’ve offered a free deployment service via meteor deploy since the initial public launch of Meteor in April 2012. (Note that this is a completely separate codebase from Galaxy.)

Over those 3.5+ years, the core “run and serve apps” functionality of the free deploy service has been incredibly reliable. It’s nowhere near as full featured and flexible as Galaxy’s backend, but it works pretty well.

But as folks are observing here, overall the free deploy service is not what could really be described as “incredibly reliable”. Why is that?

The free deploy service gives every deployed app its own Mongo database. This makes meteor deploy super easy to use: no database setup needed!

But unfortunately, it turns out that MongoDB is not actually designed to run huge numbers of databases from a small number of servers. (Especially the versions of MongoDB that existed a few years ago.) We’re talking things like: if you type show dbs in the mongo shell, it would bring down the entire server for minutes.

We originally used a hosted service to run our databases, but we found that our providers weren’t able to provide an enormous number of small databases in a stable and affordable way. (Plus, sometimes our well-meaning providers would try to debug issues using their own nifty homespun tools… which did things like type show dbs behind the scenes. Oops!)

We switched to running our own Mongo clusters for these apps, which worked fine for a while. But over time, our clusters have run into more and more problems.

Worse, we’ve been unable to upgrade some of the clusters to newer, more stable versions of Mongo. Why? Well, recent versions of Mongo have become more and more strict about allowing various forms of invalid data into them. This is a good thing! But it also means that if you have an existing cluster with invalid data and you want to upgrade to a stricter version of Mongo, then you must personally repair all of the invalid data on every single database before you can get it running with the new version.

When you’re running a few dozen databases that are data for your own apps, this is a pain but doable.

But in our case, the broken data is user data. We don’t even want to be looking at your private databases, let alone editing them to resolve invalid data based on our guess at what you meant when you wrote your code.

Since we expect the old codebase that runs the free deploy service to eventually go away once we have a comparable replacement as part of the more modern Galaxy, we made the choice to leave the clusters running old versions of Mongo… which has not exactly helped the stability situation.

We’ve learned our lesson. Right now Galaxy is a “bring your own database” system, and all of these issues that have plagued the free deploy service have been pleasantly absent. It’s a bit of a pain that you can’t just deploy and have a database magically set up… and I hope at some point we are able to offer automatic database provisioning as an option, even for free accounts. But if we do that, we’ll learn from our experience here, and do our best to avoid a situation where tens of thousands of databases end up sharing the same cluster.

What are we trying to do now? Well, a few things:

  • We’re actively working on repairing the clusters that are having problems now, and spreading out their load.
  • We’re reducing the number of mostly-unused databases by deleting apps that haven’t been visited or deployed to in a while, as recently announced.
  • We’re getting Galaxy closer and closer to a place where it can replace the old codebase that runs the free deploy service. (BTW, please don’t take this as any sort of promise about there definitely being a free Galaxy level someday — I’m the wrong person to ask about that. But it’s certainly the case that we know how much people like the free meteor deploy service when it works!)

I’m definitely sorry that people are having trouble using our free deploy service now. While we’ve never encouraged people to use it for serious business production apps, it’s a super useful tool for lots of other purposes, and I hope we can both fix the current implementation and improve our offerings for the future.

That makes sense. Thanks for taking the time to provide such a detailed explanation.

Fantastic response, far more detailed than I could have ever expected. Thank you @glasser!

I do still think that MDG management could have taken some responsibility to update the community given that they must have known there were problems. As @mrlowe pointed out, no-one was up in arms, but the lack of response didn’t reflect well, especially compared to the more open and responsive engagement we’ve seen recently.

Still, it’s good to know it’s being worked on, and as a test & dev tier, ups and downs are to be expected, so thanks for keeping us posted.

PS. I hear CouchDB works well for multiple databases? Ducks!

Why was there no error message on http://status.meteor.com during the days this has been an issue for many people here on the forum?

We absolutely should have put up a message on status.meteor.com earlier than we did.

The status page is a new thing for MDG, especially for the folks who support the free hosting service. We are still getting used to it. We will try to do better in the future.

In this case, communication was particularly difficult because the degradation was slow. It didn’t affect every account, and a lot of our testing made it look like the issues were more sporadic than they actually were. As I mentioned, this was primarily caused by the free tier’s Mongo use patterns: Mongo got slow, then slower. At some point, it got so slow that this became an incident, but by that point, we were knee-deep in trying to fix it. We’ve learned from this and will in the future err on the side of posting to the status page more readily. And of course we will continue to develop Galaxy in such a way that we don’t end up with “situation normal, Mongo broken” over there.

We are still working on getting things on the free deploy service to a healthier state — some improvements have already been made to newer apps and deploys, but we are far, far from done and resolved on this.

It’s likely I’m blind, but I can’t see a link to status.meteor.com anywhere on the Meteor.com site?

Not sure where I found it. Almost every business has one, so it’s kinda the standard URL for it.

I’ve pulled a lot of hair out trying to figure out why my app wasn’t working until I found this thread. My site is hosted on Meteor’s free tier, but under a mysite.com name as opposed to mysite.meteor.com. With the trouble I’ve been having with mysite.com, I decided to deploy the same exact code to mysite.meteor.com, and it works fine. I’m figuring it’s because mysite.meteor.com is running on an updated cluster since it’s new.

So I’m wondering if it would be possible to remove mysite.com completely, then re-create it and re-deploy to it? Would that put it on a new cluster? If so, is there a way to download my data before doing so, in order to upload it to the new cluster? I’ve looked at all of the clone/download meteor mongo db threads and haven’t been able to make any of the solutions work. I believe it’s because I am trying to get data from mysite.com, as opposed to mysite.meteor.com. It could be, though, that it’s because of the mongodb troubles.

Any advice/comments would be appreciated. Thx.

You can use the meteor mongo mysite.com --url command to get the credentials to access the MongoDB instance for mysite.com and then just mongodump your database. You can later restore that dump with mongorestore.
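In case it helps, here is a rough Python sketch of turning the URL printed by meteor mongo --url into mongodump and mongorestore argument lists. The helper names, the example URL, and the assumption that the URL has the usual mongodb://user:password@host:port/dbname shape are mine, not something confirmed in this thread. As far as I recall, the URL for a deployed app is only valid for a short time, so the dump should be run right after fetching it.

```python
from urllib.parse import urlparse, unquote


def mongodump_args(mongo_url, out_dir="dump"):
    """Build a mongodump argument list from a mongodb:// URL (hypothetical helper)."""
    parts = urlparse(mongo_url)
    if parts.scheme != "mongodb":
        raise ValueError("expected a mongodb:// URL")
    args = [
        "mongodump",
        "--host", parts.hostname,
        "--port", str(parts.port or 27017),  # 27017 is MongoDB's default port
        "--db", parts.path.lstrip("/"),
        "--out", out_dir,
    ]
    # Credentials may be percent-encoded inside the URL, so decode them.
    if parts.username:
        args += ["--username", unquote(parts.username)]
    if parts.password:
        args += ["--password", unquote(parts.password)]
    return args


def mongorestore_args(host, port, target_db, source_db, dump_dir="dump"):
    """Build a mongorestore argument list for a dump made by mongodump_args."""
    # mongodump writes each database into <out_dir>/<dbname>/.
    return [
        "mongorestore",
        "--host", host,
        "--port", str(port),
        "--db", target_db,
        f"{dump_dir}/{source_db}",
    ]


if __name__ == "__main__":
    # Placeholder URL for illustration only; paste the real one from
    # `meteor mongo mysite.com --url` instead.
    url = "mongodb://client-abc:secret@db.example.invalid:27017/mysite_com"
    print(" ".join(mongodump_args(url)))
```

Either list can be passed to subprocess.run(args, check=True) once the MongoDB command-line tools are installed. Building a list rather than a single shell string avoids quoting problems if the password contains special characters.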