MongoDB backups, expert advice?


#1

Probably 98% of people here buy their database as a service from Compose.io or mLab, but I ended up running my own replica set.

I’ve so far resorted to server snapshots for backups, but I’m now working on setting up proper continuous backups.

Any experience here on the issue?

My current strategy is to add a hidden, non-voting member to my replica set, so that taking backups won’t slow the database down for my users. Since I’m running Mongo 3.2 with the WiredTiger engine, I believe I could just run mongodump with the --oplog option on the hidden member and have a script push the dumps to S3 for storage every night.
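A minimal sketch of that plan, assuming Node.js is used for the nightly script. The host name, bucket and file paths are placeholders, and the helpers only build the member document and the command-line arguments; they do not run anything themselves. Depending on the shell version, rs.add() may also want an explicit _id in the member document.

```javascript
// Member document to pass to rs.add() in the mongo shell.
// A hidden member must have priority 0; votes: 0 makes it non-voting.
function hiddenBackupMember(host) {
  return { host: host, priority: 0, hidden: true, votes: 0 };
}

// Date-stamped archive name, e.g. for one dump per night (UTC date).
function archiveName(date) {
  return 'dump-' + date.toISOString().slice(0, 10) + '.archive.gz';
}

// Arguments for mongodump, run directly against the hidden member.
// --oplog captures writes that happen while the dump is running;
// --gzip and --archive are available from MongoDB 3.2 onwards.
function mongodumpArgs(host, archivePath) {
  return ['--host', host, '--oplog', '--gzip', '--archive=' + archivePath];
}

// Arguments for the AWS CLI ("aws s3 cp <file> s3://<bucket>/<key>").
function s3UploadArgs(archivePath, bucket, key) {
  return ['s3', 'cp', archivePath, 's3://' + bucket + '/' + key];
}

// Hypothetical values, for illustration only.
const backupHost = 'db-backup.example.internal:27017';
const fileName = archiveName(new Date());
console.log('rs.add(' + JSON.stringify(hiddenBackupMember(backupHost)) + ')');
console.log('mongodump ' + mongodumpArgs(backupHost, '/var/backups/' + fileName).join(' '));
console.log('aws ' + s3UploadArgs('/var/backups/' + fileName, 'my-backup-bucket', fileName).join(' '));
```

The two argument arrays could be handed to child_process.spawn from a cron-triggered script. Restoring such a dump would use mongorestore with --oplogReplay.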

Does this sound like a good idea?

Bonus questions:

1) Are there any tools or battle-tested scripts to help achieve this?
2) I was also interested in automating backup verification, any ideas?
3) Any backup-related horror stories about MongoDB to share?


#2

What I’ve found so far:

(I excluded everything that a. didn’t seem solid b. didn’t support --oplog)


mongodb-backup https://github.com/hex7c0/mongodb-backup

This is not just a mongodump wrapper: it uses the MongoDB driver to query documents from the database. Not sure if this satisfies rule b).

I’d be very interested to hear opinions or experiences from using this tool, compared to mongodump-based solutions.

return db.collections(function(err, collections) {

    if (err) {
      return next(err);
    }

    var last = ~~collections.length, index = 0;
    if (last === 0) { // empty set
      return next(null);
    }

    collections.forEach(function(collection) {
    ...

PerconaLab MongoDB Consistent Backup Tool https://github.com/Percona-Lab/mongodb_consistent_backup

Seems like a pretty robust solution, supports --oplog, uploading to S3 and even consistent backups from shards (which we of course don’t need).


A couple of nice shell scripts that wrap mongodump and do trivial tasks, such as


https://github.com/micahwedemeyer/automongobackup (says it’s not under “heavy” development)


And then we have this, mongo-backup from Kontena https://github.com/kontena/mongo-backup

This is a Docker container that uses some Ruby gems that seem quite production-ready for backups. It has hooks for Slack, uploads to S3 and sends you emails. Plus it supports --oplog.


I think I’m first going to try the last one from Kontena, just deploy a new hidden non-voting replica set member, install Docker and run this container. Let’s see how it goes.


#3

This is useful stuff - please keep us posted :slight_smile:


#4

I ended up picking the Docker container solution by Kontena, mostly because it had everything I wanted, neatly encapsulated in a container.

Customizing it for specific needs shouldn’t take long, so I wouldn’t be too worried that my pull request for fixing a typo in the options parameters has been waiting for nearly three solid weeks…

The main thing is, my backups are taken daily, I get reports to Slack and they are uploaded to S3 automatically.

Job done, except for verifying the backups, which I occasionally do manually for now.
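One way the manual check could be automated, as a rough sketch: restore the latest dump into a scratch mongod, collect per-collection document counts, and compare them against minimums you expect to see. The collection names and thresholds below are made-up examples, and collecting the counts (for instance through the driver) is left out.

```javascript
// Compare document counts from a restored backup against expected minimums.
// Both arguments are plain objects mapping collection name -> document count.
function findBackupProblems(restoredCounts, minimumCounts) {
  const problems = [];
  for (const [name, minimum] of Object.entries(minimumCounts)) {
    const restored = restoredCounts[name];
    if (restored === undefined) {
      problems.push(name + ': missing from restored backup');
    } else if (restored < minimum) {
      problems.push(name + ': ' + restored + ' documents, expected at least ' + minimum);
    }
  }
  return problems;
}

// Hypothetical example data.
const restoredExample = { users: 1200, posts: 0 };
const minimumExample = { users: 1000, posts: 1, comments: 1 };
const exampleProblems = findBackupProblems(restoredExample, minimumExample);
console.log(exampleProblems.length === 0 ? 'backup looks sane' : exampleProblems.join('\n'));
```

A check like this only catches empty or truncated dumps. It says nothing about whether individual documents are intact.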


#5

Is there any specific reason you chose this route instead of shelling out $50 to Compose or mLab? Seems like you’re spending a lot of time and effort on something you could buy for $50. It’s like trying to build your own SUV from scratch instead of leasing a Mercedes G-Wagon.


#6

There are a few reasons.

  • The leased Mercedes might have to be parked quite far away from my house, so I have a long walk just to get to the car every time I need to drive it. This would cause a performance hit.
  • The leased Mercedes has to get fixed and suffers from issues from time to time. The last incident on Compose, for example, was on the 27th of December, which was yesterday. The one before that was on the 24th. And so on. My friend once had a leased Mercedes that spent more time in the garage than on the freeway! :slight_smile:
  • Unless my app is a complete failure, I will probably quite quickly outgrow the “$50 Mercedes lease” and find myself with a very expensive contract. After all, $50 per month is still $600 per year, how much is it when I need five times the horse power?
  • It took me two hours to set up a three member replica set, one of them delayed (in case of human errors or other catastrophe, I don’t know if the leased Mercedes supports that, but you can think of it as a “DIY SUV with a spare tyre”) with primary members in the same data center as my app server, connected via private network VPN, so it’s not too much of a burden, really. Just setting up a new account and studying what [insert favorite DBaaS or car make here] offers and what it doesn’t felt more daunting at the time.
  • The leased Mercedes might also have some issues you would not have encountered if you handled things by yourself: I recall a lot of people had trouble getting Compose.io to work with Meteor at some point. I don’t know if they got it fixed (it was related to the Compose classic product switching over to the WiredTiger engine, something funky with the way the configs had to be given).
  • As a full stack developer, I love to learn new things & improve my skillset and knowledge. Studying (and implementing) this tiny feature of automatic backups with MongoDB might benefit me in various different challenges in the future, not even necessarily related to MongoDB. If I just lease everything, I never get to know how.
  • Setting up the backups took me approximately 7 minutes, though I spent an entire afternoon studying the different ways people are already doing it out of curiosity
  • Scaling is now quite cheap
  • I have one really bad experience using the specific “Mercedes DBaaS” model, especially regarding automatic backups. They promise you “point in time recovery”, but once you need it, all you might get is a popup box saying “operation failed.” and that’s it.
  • Usually the costs start adding up. Sign lots of lease contracts, and soon you’ll find you’re spending quite a lot of money on stuff you could get for nickels, especially in the long run :slight_smile:
  • I guess I’m old fashioned in some ways. I would also love to learn how to build a car.
  • I can say "Look mom, I made this!"

The most important reasons are emphasized in Bold.


#7

The reason people don’t often worry about Mongo (i.e. for disaster recovery) backups is that production-grade apps run in clusters. So it’s easy to set up replication. If one server goes down, others are running and have live backups of your data. You’d better run and launch a new Mongo server to replace the one that broke.

If you want a single snapshot (i.e. to restore deleted stuff or for historical reasons), just run mongodump periodically and have a set of rotation rules set up (e.g. at least 1 per day for the last week, 1 per week for the last 8 weeks, 1 per month for the last 2 years, 1 per year).
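Rotation rules like these could be sketched as follows. This is only an illustration: the tier boundaries in days (7, 56, 730), the use of UTC, and the choice to keep the newest backup in each bucket are assumptions, not part of any particular tool.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Map a backup to the retention bucket it competes in, based on its age.
// Weekly buckets are simply 7-day blocks counted from the Unix epoch.
function bucketKey(date, ageDays) {
  const iso = date.toISOString(); // always UTC, e.g. "2017-01-05T03:00:00.000Z"
  if (ageDays < 7) return 'day:' + iso.slice(0, 10);
  if (ageDays < 56) return 'week:' + Math.floor(date.getTime() / (7 * DAY_MS));
  if (ageDays < 730) return 'month:' + iso.slice(0, 7);
  return 'year:' + iso.slice(0, 4);
}

// Given the timestamps of all existing backups, return the ones to keep:
// the newest backup in every bucket. Everything else can be deleted.
function selectBackupsToKeep(backupDates, now) {
  const newestPerBucket = new Map();
  for (const date of backupDates) {
    const ageDays = Math.floor((now.getTime() - date.getTime()) / DAY_MS);
    if (ageDays < 0) continue; // ignore timestamps from the future
    const key = bucketKey(date, ageDays);
    const current = newestPerBucket.get(key);
    if (!current || date > current) newestPerBucket.set(key, date);
  }
  return [...newestPerBucket.values()].sort((a, b) => a - b);
}
```

A cron-triggered script could call selectBackupsToKeep with the dates parsed from the dump file names and remove every file whose date is not in the result.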


#8

@ramez, you didn’t quite understand the point of the original post or how things have evolved since.

I did want periodical, automatic mongodumps, but I was looking for the best way to implement them.

Also, backups are not the same as running a replica set (or Mongo cluster, as you called it). A replica set provides failover and high availability, but not backups. Imagine a programming error or a hacker running malicious queries and deleting data: it’s going to get replicated across your other members instantly. Bye bye data.

Anyway, I already have nightly snapshots taken of my three member replica set and all is fine, @a.com was just asking why I didn’t opt in for a paid Compose subscription and instead decided to build my own Mongo infrastructure.

And I tried to explain those reasons.


#9

Answer below from prior post, relevant part in bold.

From your Mercedes post above it seems you are ok re-inventing the wheel. What myself and others are trying to describe is how production systems with proper backups work.

There are three needs:

  1. High Availability and continuous backups – replicas
  2. Data restore – rotation backups
  3. Disaster recovery – VM images / build book + rotation backups

#10

There are three needs:

High Availability and continuous backups – replicas
Data restore – rotation backups
Disaster recovery – VM images / build book + rotation backups

I do not understand why you’re posting this :slight_smile: Anyway, good luck with backups, hope you don’t need them!


#11

@arggh I believe I have properly addressed your questions with how best-in-class systems deal with backups / VHA. If you need references on that I’ll be happy to pull some documents for you so you can read up more.


#12

Yes and I appreciate that, but nobody was asking how best-in-class systems deal with backups. I already had a replica set and I was already taking snapshots. I was merely asking for a nice tool to take care of automated continuous backups. Do you have a specific tool in mind you’d like to recommend?


#14

You need a tool to do backups with rotations? I have a perl script we developed.

That was the whole point. With a trivial problem, somebody usually has come up with a nice elegant solution already. You wrote your own Perl script and that’s nice, but I thought I don’t have to.

So now I have a Docker container that takes the periodic backups, uploads them to S3 and notifies me via Slack. And I didn’t need to write any scripts :slight_smile:

Though I would like to see your Perl script, is it public?


#15

Bash script triggered via CRON, calling mongodump, and storing only latest 7 days (rotation) at GitHub - Read Tutorial


#16

Why in god’s name would you spend thousands of dollars of your time setting up and managing MongoDB (and researching questions like this thread), instead of just paying mLab 15-100 bucks a month? Even if you did save $100 a month over mLab, is it worth the frustration? I can’t imagine the skills to run your own Mongo from scratch are very marketable (unless you want a job at mLab).


#17

I’m amazed at the ever-lasting popularity of this thread! Well, I did explain some reasons in the post above:

Regarding cost, if I had been running my exact setup (production + staging) on Compose or mLab, it would have cost me approximately $100 per month.

My servers have now been running two years, which would have cost me 24 x $100 = $2400

Instead, I deployed my own replica set, which costs me ~$15 per month, totaling 24 x $15 = $360

For a project that might not necessarily make money (yet or never), that’s a considerable difference.

The amount of time spent setting up and managing stuff so far:


  • Setting up the replica set ~ 1 hour
  • Upgrading MongoDB to version 3.2 ~ 1 hour
  • Writing this forum post ~ 0.2 hours
  • Setting up automated backups 0.1 hours
  • Setting up automated crash warning, which posts to our Slack alerts channel in case our MongoDB goes down ~ 0.1 hours (4 line bash script)

Total time spent 2.4 hours
Total hourly salary approximately $850.
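For anyone who wants to check the arithmetic behind that figure, here it is as a small sketch. The inputs are the numbers from this post; the time entries are converted to minutes (0.2 hours = 12 minutes, 0.1 hours = 6 minutes) to avoid floating-point rounding.

```javascript
// Money saved by self-hosting over a number of months (USD).
function savings(months, hostedPerMonth, selfHostedPerMonth) {
  return months * (hostedPerMonth - selfHostedPerMonth);
}

// Savings divided by the time invested, expressed per hour.
function effectiveHourlyRate(savedUsd, minutesSpent) {
  const totalMinutes = minutesSpent.reduce((sum, m) => sum + m, 0);
  return (savedUsd * 60) / totalMinutes;
}

// Replica set, upgrade, forum post, backups, crash warning (in minutes).
const minutesSpent = [60, 60, 12, 6, 6];
const saved = savings(24, 100, 15);
console.log('Saved: $' + saved);
console.log('Effective hourly rate: $' + effectiveHourlyRate(saved, minutesSpent));
```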


Also, there are lots of other benefits to this approach, such as:

  • The primary db-nodes are in the same data center as my app servers, meaning lag is virtually non-existent
  • I’m not dependent on a third party service…
  • who also has their own problems, which I would probably have had to work with (it’s never a setup & forget scenario)
  • I’m not entrusting my data to a third party, who might screw up their backups. Remember case GitLab?
  • When the app grows, the costs go through the roof
  • Learning new stuff = investing in myself

Sure, anybody who has an email and basic reading skills can set up an account in mLab and copy-paste the given URL to Galaxy, but if you happen to work as a developer and your possible future employer asks about your experience with MongoDB, that doesn’t help you much.

I completely understand that for some people with certain skills it’s a no-brainer to pick mLab or Compose or whatever, but I hope you understand that some people don’t, and in my opinion, for valid reasons :slight_smile:


#18

I just realized you were posting the same comments earlier on in this same thread, which I responded to already! :rofl:


#19

“Good programmers write good code; great programmers steal great code”


#20

Great programmers - Fork great code.