I'm Done With MongoDB... It's Leaving My Stack

thebarty · September 19, 2015, 12:51pm

Just starting out with Meteor, I decided to go with as many of Meteor’s standard components as possible. So I just need to figure out how to make MongoDB work, and I need some good best-practice tutorials on how to model one-to-many and many-to-many relationships in Meteor.

That being said, it would be really cool to have a Django-style DB backend (included in Meteor) with support for the standard databases like Postgres & MySQL.
PLUS a Django-style migration system, which works really, really well in Django.

I had a quick look at the current SQL modules for Meteor, and it looks like they let us run SQL in Meteor, BUT they are nothing like Django’s database model.

This DB stuff is a really interesting topic, so I am looking forward to more info.

SkinnyGeek1010 · September 19, 2015, 3:20pm

I do like to tinker! but not on client-work. On the weekends I tinker with things like Elixir, Haskell, and Elm, and even Relay & GraphQL. The reason I still use Meteor is because it makes me productive (unlike Relay and GraphQL!).

Of course, it was quite unfair to everyone else to post this provocative headline as their own clients may see it and freak out unnecessarily.

Is the title provocative? Sure. I think most good titles are. However, I’m not trying to induce panic. I tried to be very explicit that the reason for switching was from their ‘perception’ of unreliability… not the assertion that it was unreliable. Mongo is a good general purpose database. Stripe even uses it for important data.

To be clear, I would still use MongoDB if I wasn’t freelancing. However it’s easier for my personal projects to replicate the general setup of client apps so I can benefit from the same tooling (and can test new things on my apps before clients).

An “I like RethinkDB and here’s why!” would have been more considerate.

I tried not to bring Rethink into it but I could have done a better job. I just wanted to add what I was switching to because people would have asked otherwise. I do think Rethink is better but most things are with hindsight. If Meteor had support for both I would choose that. Rethink isn’t the important point as you can use most any database driver with a promise driver (and promise polyfill).

That said I do enjoy your take on things and I won’t shame a critic. However I am also awaiting something called evidence. The CAP tradeoff in databases can’t be a revelation worth yelling over at this stage, can it?

Thanks. I would like to think I’m not yelling, just bringing up talking points for the community to discuss.

Like I mentioned before, I never claimed that data was actually lost. The blog article did (which made my clients freak out). The main point is that MDG still hasn’t given us the option to choose other DBs, and it’s costing me a great deal of friction freelancing.

I was also very interested in how others were using secondary databases in the Meteor stack.

At any rate I just wish MDG would support more than one DB and that it wouldn't take 4 years to get a 2nd DB. (not sure why it's still so coupled to minimongo) :(

muaddib · September 19, 2015, 3:53pm

Sorry but you lost me on Microsoft SQL.

How can you host a reliable platform on Windows?

How can you give up the speed of the linux kernel?

For example, blk-mq (multi-queue block) is a simple patch to the Linux kernel. It is based on one mathematician’s idea for efficiently solving the problem of locking shared resources, which in turn trickled down from research on parallel computing.

This single patch can speed up your random reads and writes (a typical database workload) by up to 10 times, and I’ve seen even bigger improvements in real-world scenarios (large arrays of PCIe SSDs, multi-socket servers, …). Source: Original 2013 Presentation; benchmarks are even better now.

So you go from needing 10 Windows servers to being able to use 2 or 3 Linux machines, without rewriting a single line of code. How can you give that up? How is it even thinkable?

And this is just one example, I have 10 more in mind.

I understand I’m very opinionated, and that I have a very academic point of view, as I spent most of my time with computers using pen and paper rather than a keyboard. I dealt mostly with abstract problems with no direct applications, so sometimes I’m not so connected with the harsh reality of production, but my humble opinion is that most of the people who complain about the speed of NoSQL databases just need to write better queries. I apologize if this offends anyone, but it’s worth thinking about.

Let’s take a step back: everything is a Turing machine. There’s no magic behind it. There’s no special IO code that makes writes faster (except blk-mq). If there were, the MongoDB devs would copy it, or vice versa, it would be ported to MySQL.

Moreover, a NoSQL database can emulate (with a performance penalty) a SQL database. That’s a mathematical theorem. So the MongoDB devs can make it as reliable as any other database, if that becomes the priority.

So there’s no qualitative difference between MongoDB and any other DB. It’s just how well we model data, and how much the database helps us to write correct queries.

Given that, a modern Linux kernel, NVMe PCIe SSDs and a terabyte of RAM give you so much performance that you can keep writing poor queries and forget about MongoDB vs anything else. Computational power is soooo cheap nowadays that a single $25k machine outperforms the $500k cluster my university bought in 2001.

tanis · September 19, 2015, 7:05pm

I do. And companies bigger than mine have been doing the same for years. Here’s a very informative talk on the subject by Brent Ozar, although it’s a bit dated: http://www.brentozar.com/archive/2011/11/how-stackoverflow-scales-sql-server-video/

I’ve been using Linux for a very long time and I still use it just as much as I use Windows. It depends on what I’m doing. The usual approach of the best [tool/OS/app/whatever…] for the job you have to do is a valid stance.

What I can tell you from my professional experience working with production environments is that I’ve never lost data with SQL Server and Oracle, while I did lose data with PostgreSQL (that was around 2001 though, so a lot has changed in the meantime) and MySQL (before InnoDB was introduced).
I lost data in MongoDB as well, and that happened less than a year ago, so that’s worrying.

My concern is about data reliability. I want to be sure that I do not lose a single bit of data. Speed is indeed something we all need, but there are other ways to speed things up. As you say, hardware is cheap nowadays. And as Brent Ozar always points out, the best query is the one you do not need to run (caching comes to mind… and a good KV storage like Redis does miracles).
It’s all about finding the solution that fits your needs, but I can’t rely on MongoDB. I do have production sites using Meteor and MongoDB, and the only thing that lets me sleep well at night is that I have backups running very often, but that’s not enough for clients that can’t afford to lose a single transaction.

Why would I ever want to make something harder by emulating SQL with a NoSQL database? I’d rather use a SQL backend if that’s what I need. There are way too many things to worry about when you’re using a NoSQL database as a SQL one. Just think about transactions: you would have to implement them in code. And what happens when your code breaks for whatever reason? A SQL database would do a rollback and you’re done. If you manage everything in code, you end up with incomplete data in your database, which is a much bigger headache.
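To make the rollback point above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a SQL backend. The choice of SQLite, the `accounts` table, and the `transfer` helper are assumptions made for illustration; nothing here comes from the poster's actual setup.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes and rolls back if the block raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if fail_midway:
            # Simulate application code breaking between the two writes.
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
)
conn.commit()

try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)  # the half-finished debit was rolled back

transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

Without the transaction, the failed call would leave alice debited and bob never credited, which is the "incomplete data" case described above.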

Ensikyo · September 19, 2015, 7:29pm

Hello,

Some people think the dynamically typed schema is a problem in MongoDB and undermines its reliability (see, for example, some of the links in this topic and their sources). If you really want a schema, I think you should look at this package: https://github.com/aldeed/meteor-collection2
It allows you to add schemas to your collections, to check if your data match the schema, and other cool things.

Good bye.

waldgeist · September 19, 2015, 8:30pm

This Event Sourcing thing sounds promising. Unfortunately, the documentation of the linked package is quite sparse. It explains a lot about the basic concepts, but I could not find any docs on how to actually use the package / how the API works. Is this an alpha package or already meant to be used in production?

waldgeist · September 19, 2015, 9:07pm

I don’t know any Meteor-specific tutorials, but there’s a chapter in MongoDB’s own documentation about this:

http://docs.mongodb.org/master/core/data-modeling-introduction/
http://docs.mongodb.org/master/applications/data-models-relationships/
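As a rough sketch of the two usual options for one-to-many data (embedding versus referencing), plus one way to handle many-to-many, here are plain Python dicts standing in for MongoDB documents. Every collection and field name (`comments`, `postId`, `tagIds`, and so on) is hypothetical and only meant to show the shape of the data.

```python
# One-to-many, option 1: embed the children inside the parent document.
# Reads need a single lookup; works best when the list stays small.
post_embedded = {
    "_id": "post1",
    "title": "Hello",
    "comments": [
        {"author": "ann", "text": "First!"},
        {"author": "bob", "text": "Nice post"},
    ],
}

# One-to-many, option 2: children live in their own collection and each
# one stores a reference to its parent (hypothetical `postId` field).
comments = [
    {"_id": "c1", "postId": "post1", "author": "ann", "text": "First!"},
    {"_id": "c2", "postId": "post1", "author": "bob", "text": "Nice post"},
    {"_id": "c3", "postId": "post2", "author": "ann", "text": "Elsewhere"},
]

def comments_for(post_id):
    """Application-side 'join': select children by the parent's id."""
    return [c for c in comments if c["postId"] == post_id]

# Many-to-many: keep an array of ids on one side and resolve it in code.
tags = {"t1": {"name": "meteor"}, "t2": {"name": "mongodb"}}
posts = [
    {"_id": "post1", "title": "Hello", "tagIds": ["t1", "t2"]},
    {"_id": "post2", "title": "Again", "tagIds": ["t2"]},
]

def posts_with_tag(tag_id):
    """Ids of all posts whose `tagIds` array contains `tag_id`."""
    return [p["_id"] for p in posts if tag_id in p["tagIds"]]

def tag_names(post):
    """Resolve a post's tag ids to tag names."""
    return [tags[t]["name"] for t in post["tagIds"]]
```

Which option fits depends mostly on how the data is read: embedding favours fetching a parent with all its children at once, while referencing favours children that grow without bound or are queried on their own.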

ihistand · September 19, 2015, 10:42pm

Or a third client type: one who is using another database, and they have no interest in changing.

There have been many suggestions of other databases that would work great with Meteor, but only two of those even show up in the top 10 databases used by real companies: MongoDB and PostgreSQL. Maybe I’m just old-school, but I’ve worked for half a dozen Fortune 500 companies in my 20-year IT career, and selling management on implementing something new would be next to impossible. Here’s the ranking: DB-Engines Ranking

So for me, which database to use is a pointless question. The only question that matters is “does this platform support the database engine that my client uses?”

Sure, it would be possible to build a stand-alone system that uses another system and then build an ETL process that syncs data up as needed, but again, that would be a hard sell.

The missing piece for Meteor is a universal DB driver that works with pretty much anything, or at least anything that has a good, well-supported Node driver out there. Tying it to a specific DB was a huge mistake, IMO. MDG realized recently that adding other front-end frameworks like React and Angular would be a good idea, and I like that. They need to do the same for a universal DB driver. The ccorcos:any-db package looks interesting; I’ll have to give it a try.

–Ivan

CodeAdventure · September 21, 2015, 1:38pm

Hey @waldgeist!
Unfortunately we haven’t found the time to write up complete documentation yet. Currently the best way to get a feel for how to use the package is to take a look at the integration test, which covers the basics:

But there is something big coming soon: We are building a clone of this excellent CQRS tutorial that will explain all important concepts and cover the important packages of the meteor-space framework (it’s not just event-sourcing ;-)) Our plan is to use the Meteor Global Hackathon to flesh out a nice UI too, so it’s not just boring business concepts.

Of course we are already using this package in production and are constantly refining the patterns / adding important features! If you want to join the discussion or ask any kind of question, join our Gitter chat. We are currently building a small but very friendly community around the Space framework.

korus90 · September 21, 2015, 2:52pm

In what way was this at all useful? No-one here is talking about any / many schemas, they’re talking about data reliability and the perception of MongoDB. That package (which is very good, I might add) does nothing to support either argument.

It’s like you jumped in on a discussion about what everyone’s favourite sandwich is by saying you love McDonald’s…

ashooner · September 21, 2015, 4:35pm

As a casual observer, this thread seems a bit reactionary. Rather than addressing the broader effect on Meteor’s marketability of its coupling with Mongo, everyone seems to just be arguing the tenets of Mongo itself.

Shouldn’t Meteor transcend the preference for Mongo? Last time I checked out Meteor, there was supposedly active development in removing the need for Mongo. I guess the tone just has kind of an echo-chamber feel to someone interested in Meteor but wary of Mongo. Is Mongo really a sword worth falling on for Meteor?

ccorcos · September 21, 2015, 5:34pm

@SkinnyGeek1010, I’m almost done with a new version of AnyDb. I’m using it with Neo4j right now and it’s working great, but you could just as easily use it with any other database. I’ll release it in the next couple of weeks.

I really wish accounts-base was written in a more generic way so I wouldn’t be stuck using Mongo for accounts, but we’ll tackle that later.

SkinnyGeek1010 · September 21, 2015, 6:05pm

Yea that’s kind of the problem, it’s been years since the first hint of a secondary database.

ccorcos · September 22, 2015, 6:27am

I think a big reason is that it’s tough to get out-of-the-box reactivity from a relational database. Because Mongo is non-relational, computing observeChanges based on the operation log is a challenging but tractable problem. On the other hand, reactivity in a relational database isn’t so easy. Here’s an example I’ve been dealing with: Suppose a user’s feed consists of posts that people they follow have starred. In Neo4j, the query looks like this:

MATCH (:USER {name: 'Chet Corcos'})-[:FOLLOWS]->(:USER)-[:STARS]->(p:POST) 
RETURN p

Now suppose someone stars a post. It’s not trivial to know whether this user’s feed needs to update. You actually have to run a query to determine if ‘Chet Corcos’ is following that user.

So if Meteor is going to support other databases, it’s going to require that developers put a little more legwork into computing how changes affect queries…
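The extra lookup described above can be sketched with a toy in-memory graph. This is only an illustration under assumed data: the `follows` and `stars` structures and both helper functions are hypothetical, and a real setup would issue these as database queries.

```python
# follower -> set of users they follow (hypothetical sample data)
follows = {"chet": {"dana", "eli"}, "dana": {"eli"}}
# (user, post) pairs recording who starred what
stars = set()

def feed(user):
    """Posts starred by anyone `user` follows (mirrors the query above)."""
    followed = follows.get(user, set())
    return {post for (who, post) in stars if who in followed}

def feeds_affected_by_star(starring_user):
    """Users whose feed must be recomputed when `starring_user` stars a post.

    The write itself only names the starring user and the post, so finding
    the affected feeds needs a second lookup over the FOLLOWS edges.
    """
    return {u for u, followed in follows.items() if starring_user in followed}

stars.add(("eli", "p1"))
affected_by_eli = feeds_affected_by_star("eli")

stars.add(("chet", "p2"))
affected_by_chet = feeds_affected_by_star("chet")
```

Here a star by `eli` touches two feeds, while a star by `chet` touches none, and neither fact is visible from the star event alone.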

SkinnyGeek1010 · September 22, 2015, 12:51pm

I do wish they would have implemented other non-relational dbs by now (like CouchDB, Redis, Rethink, Cassandra) with poll + diff (even if they had to hire an additional team that did nothing but this).

On the other hand, reactivity in a relational database isn’t so easy. Here’s an example I’ve been dealing with: Suppose a user’s feed consists of posts that people they follow have starred. In Neo4j, the query looks like this

Thanks for providing that example! It’s really interesting to see the challenges that lie ahead.
I’m loving AnyDB, keep up the good work!

awatson1978 · September 22, 2015, 8:29pm

They’ve already implemented Redis and MiniRedis, and it hasn’t exactly displaced mongo as the preferred data store.

ccorcos · September 23, 2015, 12:49am

Poll-and-diff is actually really easy. Just use the new diff-sequence package!

Redis is non-relational, so for my needs it isn’t really an improvement.
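For anyone unfamiliar with the poll-and-diff idea mentioned above, here is a minimal sketch of the technique in Python. It is not the diff-sequence package's API; `diff_results` and `poll_and_diff` are hypothetical names, documents are assumed to carry an `_id` field, and result ordering is ignored to keep it short.

```python
def diff_results(old, new):
    """Compare two polled result sets, each a dict of id -> document."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    changed = {
        k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]
    }
    return added, changed, removed

def poll_and_diff(run_query, previous):
    """Re-run the query and report what changed since the last poll."""
    current = {doc["_id"]: doc for doc in run_query()}
    return current, diff_results(previous, current)

# First poll: everything counts as added.
rows = [{"_id": 1, "n": "a"}, {"_id": 2, "n": "b"}]
state, (added1, changed1, removed1) = poll_and_diff(lambda: rows, {})

# The underlying data changes, then we poll again.
rows = [{"_id": 2, "n": "B"}, {"_id": 3, "n": "c"}]
state, (added2, changed2, removed2) = poll_and_diff(lambda: rows, state)
```

The approach works with any database that can re-run a query, at the cost of repeating the query on every poll interval.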

SkinnyGeek1010 · September 23, 2015, 12:59am

Ah I didn’t know it was prod. ready yet (I thought it was just a hackday thing).

robfallows · September 23, 2015, 8:39am

It’s definitely not prod ready - I’ve had a bunch of issues with it. It sorta works, but is just flaky enough to dissuade me from using it.

shock · September 23, 2015, 9:10am

I would also vote to integrate some vegan non-GMO database to make clients happier