I'm Done With MongoDB... It's Leaving My Stack

People shying away from mongo in favor of arguably more reliable database platforms often don’t have the slightest clue about what can go wrong with those databases.

It is always assumed if it’s sql, it is reliable.

On the contrary, like every other devops task, maintaining a database server is a daunting task, whether or not it is ACID compliant. They do fail and when they do, they fail very miserably.

The most important point everyone is overlooking is the availability paradigms introduced and/or leveraged by these nosql systems like mongodb.

For example, I don’t care about postgresql scaling; most apps will never have scalability requirements, but what they do have from day one is a need for high availability. And there we enter the territory of replication nightmares. Mongodb solves that problem and solves it rather well. I’d rather let go of a couple of database transactions/operations than lose the entire application over a corrupt database instance. And if designed cleverly, any app can recover from some data loss. There are patterns for ensuring this, regardless of the underlying database technology.

Now I’m not saying let’s throw away sql. I’m not in a camp, any camp for that matter. I think every job requires its own set of tools, but for any app job to actually require a specific set of tools, it should foreseeably be bound to grow in scale. And that’s a very unlikely scenario for 99.999% of the apps out there. If your app fits that top-notch percentage, I’m sure you’ll figure it out using the millions of venture $$$$ they’re throwing at you.

Sql has its place with strict transactional requirements like finances where damage may be irrecoverable. And mind you, I don’t actually place ecommerce in that realm. A lost order is always recoverable. (Ok, there of course are cases where transactions are favorable, even irreplaceable, for ecommerce.) My point is, we are usually over-sensitive about our data and overlook our actual requirements.

In my opinion, meteor saves us enough development time that we can spare some of it to develop application-level transactional capabilities where it does matter in the app. Otherwise, the cost-benefit of the isomorphic api, imho, far outweighs the vaguely potential loss incurred by data discrepancy, should it ever occur.

Sure. All of my clients so far are startups with greenfield projects, and so far they have been in the SF bay area. The issue comes up when they ask what I’ll be using to build their app, then they have a mentor or friend go over the whole stack as well as how/where it will be hosted, etc…

It usually goes something like “well my friend is a programmer and he said X & Y about MongoDB and that it’s risky to use it”. From there it’s usually hard to convince them that Mongo will be ok for their app.

+1 :thumbsup: I also don’t think that Rethink is a panacea either… however for document databases it seems to have one of the best feature sets.

I also think a lot of the potential Mongo problems can be avoided by using a good database hosting service.

I totally agree with this. A lot more can be done at the app level. For me the isomorphic API for the database is nice (and I would love to have ReQL in the browser!) but it wouldn’t trump better data consistency if the app needed it (a photo sharing site most likely does not!).

Check out this video https://www.youtube.com/watch?v=05R-TDP0Ltc for a preview of RethinkDB for Meteor. The package is not production ready but it’s really slick! It even has a partial clientside mini-rethink. I’m currently using a rethink driver with the MDG promises polyfill. You can also use Futures if needed. I’m not sure how stable the meteor-rethink package is for just the server but from light testing it seems to be stable (it’s mostly just wrapping async methods with fibers and monkey patching).

Okay. I’m not trying to judge your decision, but to say some stuff about Mongo.

At Kadira, we started to use Mongo for storing metrics because that’s the DB we knew well. We knew SQL databases too, but they are very hard to maintain, both at the app level and at the infrastructure level.

So, Mongo was (is) a good tool.

Later on we realized that, with the heavy load, we needed to move to a new system. So, we started to build a custom solution. That’s not an issue with Mongo; we simply needed something better suited to our needs.

Managing a Mongo replica set is quite trivial if you follow some basics. We learnt some things along the way, like giving enough room for the oplog.

Now, we still use Mongo in parallel with our new data setup and we will keep it for at least another year. So, recently we started to shard Mongo in our own way. And it worked pretty well.

To me, Mongo is a pretty good General Purpose DB. It does best what it promised. Of course, every program has bugs, so does Mongo.

Interesting read (although the comments section is dumber than usual). I still don’t really see the problem with Mongo after this. Since most Meteor apps are using websockets, a 2-second dirty or stale read isn’t a big deal (whereas it could be for long-polling + REST API). The biggest problem occurred during arbitration (picking a new primary after a network partition) where two mongos thought they were primary until the vote returned. Please correct me if I missed something in that article, but if a client application can’t handle this pretty darn rare edge case, then they can probably afford to buy a big ol SQL server. Remember, Mongo was designed for sharding & replicating data across a bunch of budget machines instead of 1 super-server, so there are tradeoffs to be had… but convincing the client is another beast altogether :blush:

The paradigm of MongoDB is actually quite simple: You only get atomic operations for writing to a single document. There are good reasons for that (distributed transactions are slow/painful, Mongo is optimized for sharding/replication) and you have to decide if you can work within these constraints. Period.
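To make the single-document constraint concrete, here is a minimal TypeScript sketch, not taken from any post in this thread. It assumes a hypothetical order document that embeds its line items, so that adding an item and updating the running totals is one update on one document. `$push` and `$inc` are real MongoDB update operators; the `LineItem` and `UpdateSpec` shapes and the `addItemUpdate` helper are made up for illustration.

```typescript
// Hypothetical order document: line items are embedded in the order instead
// of living in a separate collection, so one write covers the whole change.
interface LineItem {
  sku: string;
  qty: number;
  price: number;
}

interface UpdateSpec {
  filter: { _id: string };
  update: {
    $push: { items: LineItem };
    $inc: { total: number; itemCount: number };
  };
}

// Builds the filter and update document for adding one line item.
// The embedded array and the running totals change in the same
// single-document update, which is the unit MongoDB applies atomically.
function addItemUpdate(orderId: string, item: LineItem): UpdateSpec {
  return {
    filter: { _id: orderId },
    update: {
      $push: { items: item },
      $inc: { total: item.qty * item.price, itemCount: item.qty },
    },
  };
}

// With a driver collection this would be used roughly as
// orders.updateOne(spec.filter, spec.update); no connection is opened here.
const spec = addItemUpdate("order-1", { sku: "abc", qty: 2, price: 5 });
```

If the line items lived in their own collection, the same change would need two writes, and nothing would guarantee that both succeed together.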

I see the biggest problem in the mainstream of thinking about relational data and using database ORMs. Most of these tools have been built for MySQL when using a relational schema. If you are using MongoDB, you really have to forget about the 3rd normal form. Joining your data in your Mongo queries is the dumbest idea ever -> really slow! (and not just in Mongo btw.)

Ok, so relational data and reactive JOINS didn’t work out so well – but you really want to use Meteor!
How about researching for ways to actually USE the things Meteor/Mongo is really, really good at:

Writing single documents with high speed and realtime-querying denormalized data, that is completely optimized for your application UI.

How can you achieve this? One possible solution is Event Sourcing combined with CQRS (logically separating reads from writes in your business logic). You might even learn one or two new software architecture concepts along the way :wink:
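As a rough illustration of that idea, here is a small in-memory TypeScript sketch. It assumes a hypothetical to-do app; the event names, the `EventStore` class, and the `project` function are invented for this example and are not part of Meteor or MongoDB. The write side only appends events (each append would be a single-document insert in Mongo), and the read side folds the log into a denormalized view shaped for the UI.

```typescript
// Hypothetical event types for a to-do list; the names are illustrative.
type TodoEvent =
  | { type: "TodoAdded"; id: string; title: string }
  | { type: "TodoCompleted"; id: string }
  | { type: "TodoRemoved"; id: string };

// Read model: denormalized and shaped for what the UI displays.
interface TodoView {
  id: string;
  title: string;
  done: boolean;
}

// Write side: commands only ever append events to the log.
class EventStore {
  private readonly log: TodoEvent[] = [];

  append(event: TodoEvent): void {
    this.log.push(event);
  }

  all(): readonly TodoEvent[] {
    return this.log;
  }
}

// Read side: fold the event log into the current view.
function project(events: readonly TodoEvent[]): TodoView[] {
  const views = new Map<string, TodoView>();
  for (const e of events) {
    switch (e.type) {
      case "TodoAdded":
        views.set(e.id, { id: e.id, title: e.title, done: false });
        break;
      case "TodoCompleted": {
        const existing = views.get(e.id);
        if (existing) views.set(e.id, { ...existing, done: true });
        break;
      }
      case "TodoRemoved":
        views.delete(e.id);
        break;
    }
  }
  return Array.from(views.values());
}

const store = new EventStore();
store.append({ type: "TodoAdded", id: "a", title: "Write post" });
store.append({ type: "TodoAdded", id: "b", title: "Ship it" });
store.append({ type: "TodoCompleted", id: "a" });
store.append({ type: "TodoRemoved", id: "b" });

const todos = project(store.all());
```

In a real app the projection would usually be updated incrementally as events arrive and stored in its own collection, rather than recomputed from the full log on every read.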

IMHO, the best database out there would be ElasticSearch, if it was actually a database, which it is not. We fix that by hot wiring it to CouchDB and its beautiful replication and database changes stream as well as extreme data reliability. That is, all of the searching and all of the data retrieval would be handled by ElasticSearch, while all of the changes would come from CouchDB. With the existence of PouchDB, all of the replication and the changes stream are already implemented in JavaScript, waiting with a bit of tweaking to replace DDP and WebSockets. :wink:

Well, we are already using transporter to liveupdate all data in ElasticSearch which can be part of customer autocomplete search or need to be ordered based on match score.

But I still think mongodb Meteor DDP stack is nice for managing users, tracking things real-time using oplog etc…

Also, if you apply changes to normalized data and propagate each change to a denormalized user-facing collection, it can noticeably reduce that very well known cost of “per user realtime joins query hell” which many applications face when moving to production. You don’t need those joins anymore, because you are doing the work only when data changes, not on every user navigation to a given template.
But I cannot argue with data, as we are just building, and there is also that NDA.
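The write-time propagation described above can be sketched in TypeScript with plain in-memory maps standing in for collections. Everything here is hypothetical: the `Author`, `Post`, and `PostView` shapes and the `Store` class are made up for illustration and are not the poster's actual setup.

```typescript
// Normalized source data (hypothetical shapes).
interface Author {
  id: string;
  name: string;
}

interface Post {
  id: string;
  authorId: string;
  title: string;
}

// Denormalized, user-facing document: the author's name is copied in,
// so rendering a post needs no join.
interface PostView {
  id: string;
  title: string;
  authorName: string;
}

class Store {
  authors = new Map<string, Author>();
  posts = new Map<string, Post>();
  postViews = new Map<string, PostView>();

  addAuthor(author: Author): void {
    this.authors.set(author.id, author);
  }

  addPost(post: Post): void {
    this.posts.set(post.id, post);
    const author = this.authors.get(post.authorId);
    this.postViews.set(post.id, {
      id: post.id,
      title: post.title,
      authorName: author ? author.name : "unknown",
    });
  }

  // The join cost is paid here, once per change, not on every page view.
  renameAuthor(id: string, name: string): void {
    const author = this.authors.get(id);
    if (!author) return;
    author.name = name;
    this.posts.forEach((post) => {
      if (post.authorId !== id) return;
      const view = this.postViews.get(post.id);
      if (view) view.authorName = name;
    });
  }
}

const db = new Store();
db.addAuthor({ id: "u1", name: "Ann" });
db.addAuthor({ id: "u2", name: "Bob" });
db.addPost({ id: "p1", authorId: "u1", title: "Hello" });
db.addPost({ id: "p2", authorId: "u2", title: "World" });
db.renameAuthor("u1", "Anna");
```

Reads then hit only `postViews`, which is the kind of collection a publication can serve directly without per-user joins.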

I can’t advocate for PostgreSQL, but my database of choice is Microsoft SQL Server and I cannot think of a single time it let me down over all those years. I have old-time customers moving millions of rows of financial data every year and it’s always been rock solid.

As someone else pointed out, the problem isn’t scaling. With SQL Server it’s way better to get a bigger server than trying to move over to a cluster. The guys at StackOverflow have a very good explanation of how they live with that.

A different issue is high availability, but then again, you can set up replication and stay on the safe side, as it works pretty well. You’re more likely to run into problems like the time it takes to move your data between different physical data centers, but that’s an issue you’re going to have with any database where data changes at a fast rate.

TBH I still have issues with MongoDB but that’s because I come from a SQL background and I can’t really get used to denormalizing data. It still feels like a big mess to my eyes :smile:

It usually boils down to choosing the right database for the kind of data you are managing, and most times you’re going to use more than one. As an example, for a client’s project I used MongoDB for most data, together with Neo4J for the graph relationships between users.

My wish would be to have a minimongo-like feature for all the different database systems, of course :wink:

Interesting! How did you hook up neo4j? AnyDB or through an external service?

TBH I still have issues with MongoDB but that’s because I come from a SQL background and I can’t really get used to denormalizing data. It still feels like a big mess to my eyes :smile:

I think Rethink helps a little with that since you have joins. I started with mongo so I have more trouble breaking my data apart lol

It’s a Node application (the web part uses Express). I’m not using Meteor for that one. I hooked up the neo4j module and that’s it.

I’ll have a look at that, thanks!

Sorry to be harsh, but this post is so opinionated and doesn’t have any data to back up the stance. The above sentence sums up every single problem I’ve ever heard of with MongoDB: everyone who says it has problems has never actually had any problems with it.

No worries, I’d hope you’d speak your mind… :smiley:

To be clear, I’m not taking Mongo out of my stack because I fear the possibility of losing data. I’m doing so because I can’t be as competitive a freelancer with it. That is demonstrable, unlike the claims made by several blogs.

The main point of making the post was to see what others are doing who are using secondary databases (as well as vent my frustrations).

If I have to use another DB I would prefer a document DB as I had a great experience with Mongo. This is why I mentioned and am choosing RethinkDB… it’s like Mongo but a bit nicer API and Joins. It could very well have data issues in the future as well but for now I won’t lose customers from it and I don’t have to wrangle maintaining SQL DBs.

Hope this clears it up!

Thanks - yes it does :smile:

You should look into mosql if you need SQL/joins. It may also act as a reliable backup. https://github.com/stripe/mosql

@hwillson How do you use and specify write concerns in Meteor? There doesn’t appear to be a whole lot of documentation on specifying write concerns in Meteor other than interacting directly with the MongoDB driver itself.

See @arunoda’s great blog post about using MongoDB replica sets. There is a Write Concern section that’s a little out of date, but gives you an example of how it can be configured via MONGO_URL.
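For readers without the blog post at hand, here is a hedged TypeScript sketch of what a write concern in the connection string can look like. It is not taken from that post. `replicaSet`, `w`, and `wtimeoutMS` are standard MongoDB connection string options; the host names, database name, replica set name, and the `buildMongoUrl` helper are placeholders invented for this example.

```typescript
// Options for a replica set connection string. The field names mirror the
// MongoDB connection string options of the same name.
interface MongoUrlOptions {
  hosts: string[];
  database: string;
  replicaSet: string;
  w: number | "majority";
  wtimeoutMS?: number;
}

// Hypothetical helper: assembles a mongodb:// URL carrying the write concern.
function buildMongoUrl(options: MongoUrlOptions): string {
  const params = [
    `replicaSet=${encodeURIComponent(options.replicaSet)}`,
    `w=${options.w}`,
  ];
  if (options.wtimeoutMS !== undefined) {
    params.push(`wtimeoutMS=${options.wtimeoutMS}`);
  }
  return `mongodb://${options.hosts.join(",")}/${options.database}?${params.join("&")}`;
}

// Placeholder hosts and names; substitute your own deployment's values.
const mongoUrl = buildMongoUrl({
  hosts: ["db1.example.com:27017", "db2.example.com:27017"],
  database: "app",
  replicaSet: "rs0",
  w: "majority",
  wtimeoutMS: 5000,
});
```

The resulting string is what you would export as MONGO_URL before starting the app. Whether a given Meteor version honors every option this way is worth verifying against its bundled driver.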

mosql is excellent. The best part about it though is the fact that Stripe was behind the original “Call me maybe: MongoDB stale reads” post that a lot of people point to as fact about mongo data loss. It’s awesome to see Stripe not throw their hands up in the air saying mongo won’t work for them, but instead nail out a solution that will address their issues with mongo, then share everything back with the community. So great!

That looks like a great way to also have a reliable backup. I’ll be adding this to my stack immediately!

@hwillson thanks for the info – that is a good read. I completely agree!

What exactly does MoSQL bring us? There is no realtime, oplog-tailing kind of thing on PgSQL as far as I know.
So just non-reactive joins without app involvement?

I think you just have technological wanderlust. You’re focused on the how of things and it got boring. So you’re inventing demons to motivate your migration.

Of course, it was quite unfair to everyone else to post this provocative headline as their own clients may see it and freak out unnecessarily. An “I like RethinkDB and here’s why!” would have been more considerate.

That said I do enjoy your take on things and I won’t shame a critic. However I am also awaiting something called evidence. The CAP tradeoff in databases can’t be a revelation worth yelling over at this stage, can it?
