What's up with mongo?

Babak · June 10, 2015, 7:12pm

Joining many-to-many “collections” is a real pain in Mongo. Most Meteor apps I see are better suited for SQL. This includes Telescope, the Slack clones, and the Trello clone. Even the Meteor examples are best represented by tables, because they represent tabular data.

Databases like Postgres scale just great. There are loads of solutions in that regard. It’s been around for about 20 years. There is very little advantage that Mongo has over Postgres. MiniMongo can be a wrapper for SQL and easily a wrapper for JSON in databases like Postgres. This would enable existing packages that rely on the flexibility of collections to co-exist with other packages that utilize complex relational joins.

Look here for examples of JSON manipulation and querying in PostgreSQL

http://schinckel.net/2014/05/25/querying-json-in-postgres/

It’s even possible to create indexes and do joins and queries crossing JSON and relational data. Exciting stuff!

On a mobile phone so will copy paste examples

Example of an index on a Postgres JSON field (source: http://stackoverflow.com/questions/17807030/how-to-create-index-on-json-field-in-postgres-9-3)

CREATE TABLE publishers(id INT, info JSON);
CREATE INDEX ON publishers((info->>'name'));

Example of a join across JSON and relational data. This example would probably be best as all relational and no JSON, but it’s just for illustration (source:
http://blog.2ndquadrant.com/postgresql-anti-patterns-unnecessary-jsonhstore-dynamic-columns/)

select
  p1.id AS person1,
  p2.id AS person2,
  p1.data ->> 'name' AS "p1 name",
  p2.data ->> 'name' AS "p2 name",
  pns1 ->> 'type' AS "type",
  pns1 ->> 'number' AS "number"
from people p1
  inner join people p2
    on (p1.id > p2.id)
  cross join lateral jsonb_array_elements(p1.data -> 'phonenumbers') pns1
  inner join lateral jsonb_array_elements(p2.data -> 'phonenumbers') pns2
    on (pns1 -> 'type' = pns2 -> 'type' AND pns1 -> 'number' = pns2 -> 'number');

awatson1978 · June 10, 2015, 7:53pm

@chenroth: It depends on your app. Have you ever tried to search Facebook for a post you made last year and not been able to find it? Does that make Facebook less useful? Inconsistency sounds scary, but is often less impactful than people fear it to be. The world is an inconsistent place. Mongo is perfectly fine for storing many types of unique user-generated content.

@yasinusiu: I’m not 100% sure what you’re asking; but generally speaking, the one thing Mongo does well is keep up in time. An analogy I like to use is a ticket counter versus a subway unlimited pass. The ticket counter is like an ACID compliant transaction; you don’t get the ticket and can’t board the train until the transaction is completed. But a monthly unlimited subway pass can achieve the same goal (of getting money from everybody for each trip); but doesn’t require a transaction each time someone wants to board the train. At the time of boarding, it’s a ‘fire-and-forget’ algorithm. Do some people sometimes get on the train without paying? Maybe. But overall, it’s a system that can keep up with the crowds that doesn’t require banking-level transactions.

@Babak: keep in mind that JSON support in Postgres is rather new… it’s not been around 20 years; it’s only been around since Postgres 9.2, released in late 2012 (obviously in response to the NoSQL movement). Also, minimongo being a wrapper for SQL? Maybe after an ORM layer has mapped the SQL into a JSON object that minimongo can understand; but introducing an ORM defeats the purpose of having a persistent storage layer that’s isomorphically the same as the client side datastore. So, PostgreSQL may provide SQL access; but it breaks the isomorphic API.

The tough solution would be to create a minisql package. But it borders on being masochistic, because there would need to be an ORM on the server and client. Possibly upwards of three or four ORM layers, in fact, if you consider both inbound and outbound mapping (which is similar to what’s required to maintain HL7 interfaces, btw). If you could implement the ORM isomorphically as well, then it wouldn’t be such a headache (persistencejs may be the solution to this). But how does a normalized database know which objects to trigger an update for to the client? So the isomorphic ORM probably needs to be reactive as well.

Babak · June 10, 2015, 9:38pm

Isomorphic between the client and server I can appreciate. Between the application and database, well, not even Meteor’s MiniMongo is truly “isomorphic” with actual Mongo.

An isomorphic wrapper around Postgres like MiniMongo is not that hard. It’s all single table queries. Heck, single column queries as well. Very basic SQL mapping for single table single column queries.
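To make the "very basic SQL mapping" idea concrete, here is a minimal sketch of how a flat, Mongo-style equality selector could be turned into a parameterized single-table query. The function name `selectorToSql` and the `posts` table are made up for illustration; this is not part of Meteor or MiniMongo, and it handles only plain equality, not operators like `$gt` or `$in`.

```javascript
// Hypothetical helper: translate a flat, Mongo-style equality selector
// into a parameterized single-table SQL query. Not a Meteor or MiniMongo API.
function selectorToSql(table, selector) {
  // Table and column names cannot be bound as parameters,
  // so only plain identifiers are accepted.
  const ident = /^[A-Za-z_][A-Za-z0-9_]*$/;
  if (!ident.test(table)) throw new Error("invalid table name: " + table);

  const params = [];
  const clauses = Object.keys(selector).map((field) => {
    if (!ident.test(field)) throw new Error("invalid field name: " + field);
    params.push(selector[field]);
    // $1, $2, ... are Postgres-style positional placeholders.
    return `"${field}" = $${params.length}`;
  });

  const where = clauses.length ? " WHERE " + clauses.join(" AND ") : "";
  return { text: `SELECT * FROM "${table}"${where}`, params };
}

const query = selectorToSql("posts", { author: "babak", published: true });
console.log(query.text);
// SELECT * FROM "posts" WHERE "author" = $1 AND "published" = $2
console.log(query.params);
```

The values travel separately from the query text, so a driver that supports positional parameters could run the result without string-escaping user input.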

Then additional methods can be added to pass in raw SQL or joins. I don’t see a major downside to this. It’s just a practical path forward.

ron · June 11, 2015, 1:29am

Hi,

I would like @awatson1978 to reply to this, if she would be so kind, and anyone else, of course.

Like the OP I’m new to Meteor/Mongo. I agree with @Babak that for the most part persons are trying to solve problems that are perfectly suited to SQL databases with Mongo/NoSQL. Given that Meteor only works with MongoDB right now, I’m willing to be one of those persons, provided that I can get it to work in a reliable way.

That being said, I’m in the early stages of developing an event ticketing and management app with Meteor. The app will be processing and keeping records of monetary transactions, and important user-account data which must remain consistent.

I don’t mind implementing transaction-safety at the application level (I’ve done it in other scenarios). But, realistically, I won’t become a Mongo expert. And I don’t want to have to become one to implement a system that would store data reliably - as my main reason for choosing Meteor is that I see it as a RAD framework with RTC built in.

Do I have to be terribly worried about implementing this app in Meteor/Mongo? Is it just a bad idea? If not, can you point out the caveats associated with this and whether or not they can be overcome?

Thanks

streemo · June 11, 2015, 1:48am

@awatson1978 i too am interested in your response. I am building an app which also will be dealing with transactions - preferably using Stripe. For each transaction, I am using a system which requires the buyer and seller to check a “got it” box on their client to trigger the continuation of the transaction - data which will be stored in a Sales collection:

{_id: ... , buyerAgree: true, sellerAgree: false, idOfSoldItem: ... } in Sales collection.

On both attributes true, proceed with transaction via Stripe. The Stripe customers must be stored somewhere in Mongo most likely. If both not true within 24 hours, kill transaction.
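The rule described here could be sketched roughly as follows. This is only an illustration: `saleStatus` is a made-up helper, and `createdAt` is an assumed field that the document above does not actually include. The Stripe call itself is left out.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: decide what to do with a Sales document.
// `createdAt` (a Date) is an assumption; the original document has no such field.
function saleStatus(sale, now) {
  // Both parties ticked "got it": safe to proceed with the charge.
  if (sale.buyerAgree && sale.sellerAgree) return "charge";
  // 24 hours have passed without both agreeing: kill the transaction.
  if (now.getTime() - sale.createdAt.getTime() >= DAY_MS) return "cancel";
  return "pending";
}

const sale = {
  buyerAgree: true,
  sellerAgree: false,
  createdAt: new Date("2015-06-10T12:00:00Z"),
};

console.log(saleStatus(sale, new Date("2015-06-10T18:00:00Z"))); // pending
console.log(saleStatus(sale, new Date("2015-06-11T12:00:00Z"))); // cancel
```

Keeping the decision in a pure function like this makes it easy to test separately from the database writes and the payment call.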

Further, I will need to store how much money is owed to each user, and then pay users at the end of every two weeks.

Would this be possible to do with high fidelity in Meteor?

Thanks for your response.

awatson1978 · June 11, 2015, 4:01am

If it involves financial data, the conservative and safe approach is to look for something with ACID compliance, such as TokuMX, Postgres, HyperDex, etc. But there are a few things to keep in mind:

First, Mongo’s inconsistency is more of an issue with sharded clusters that have complex topologies. A shard by definition is a replica set, which itself contains three servers of replicated data for fault tolerance. So the question is: how large can you vertically scale a replica-set before you need to start sharding? How large can you grow your business before you need a second geolocated shard/replica set?

To put it into perspective: How many records can fit into 10 or 100TB of disk storage on a single replica-set? If each record is 1kb large (which is itself a pretty darn large JSON object), that’s 10 to 100 BILLION client records. So, your consistency issues become an issue once you reach the 10 to 100 billion record mark. (And I’m sure somebody has squeezed an entire petabyte onto a single replica set using a storage area network; so it may even be in the trillions of records.)
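The arithmetic above can be checked with a quick back-of-the-envelope script. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and ignores indexes, padding, and any other storage overhead, so real capacity would be lower.

```javascript
// Back-of-the-envelope capacity estimate using decimal units.
const BYTES_PER_KB = 1e3;
const BYTES_PER_TB = 1e12;

// How many records of `recordKb` kilobytes fit into `storageTb` terabytes,
// ignoring indexes and other overhead.
function recordsThatFit(storageTb, recordKb) {
  return (storageTb * BYTES_PER_TB) / (recordKb * BYTES_PER_KB);
}

console.log(recordsThatFit(10, 1));   // 10000000000 (10 billion)
console.log(recordsThatFit(100, 1));  // 100000000000 (100 billion)
console.log(recordsThatFit(1000, 1)); // 1000000000000 (1 trillion, for one petabyte)
```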

So, until you have 10 billion+ records or are operating out of two datacenters, you don’t need to shard, and you can set your mongo replica set to have Acknowledged, Journaled, or Replica Set Acknowledged write concerns, which are each effectively a ‘transaction’ in the ACID understanding of the term. And that buys you a lot of time to investigate whether it’s a viable business, and to prepare to bring something like TokuMX, Postgres, or HyperDex into the mix.

Also, don’t forget the role of insurance contracts in managing risk profiles. You may also want to ask yourself what the cost of any lost transaction is. It might be a perfectly viable business model to estimate that each ticket costs $100, to purchase a $100K or $1M insurance policy, and to simply offer a free event ticket to anybody that had a session prior to the server crash or system downtime and who contacts customer support about a lost ticket. Same thing goes for the buyer/seller system. There are financial instruments available to handle these kinds of concerns. Consider what role they may have as part of your architecture design and business strategy.

nathan_muir · June 11, 2015, 4:54am

Somewhat correct.

You don’t need to have multiple instances of mongo to have a replica set. You can just run a single mongo instance with the --replSet option (The meteor dev tools do this).

As for polling only - this may not be an issue if your writes are coming from a single Meteor process.

Unless something has changed - Meteor will recognise inserts, deletes and updates going to mongodb, and automatically push them to other clients, without waiting to poll the database.

nathan_muir · June 11, 2015, 5:07am

Good write up @awatson1978.

I’d like to add - to avoid sharding mongoDB, you have the option to shard meteor at the application level.

Simply back each Meteor cluster with a separate mongo instance.

You’ll need a (somewhat) centralised Auth database, then just use DDP.connect to connect to the right server for that Company/User. (eg, that contains all their tickets/projects/whatever)
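As a rough sketch of the routing step, the lookup might look like the following. The directory contents, the `example.com` URLs, and the `clusterUrlFor` helper are all hypothetical; in the setup described above, the mapping would live in the central auth database rather than in code.

```javascript
// Hypothetical directory: which Meteor cluster holds each company's data.
// In practice this mapping would be read from the central auth database.
const clusterDirectory = new Map([
  ["acme", "https://cluster-a.example.com"],
  ["globex", "https://cluster-b.example.com"],
]);

// Look up the cluster URL for a company, failing loudly if none is registered.
function clusterUrlFor(companyId) {
  const url = clusterDirectory.get(companyId);
  if (!url) throw new Error("no cluster registered for " + companyId);
  return url;
}

console.log(clusterUrlFor("acme")); // https://cluster-a.example.com
// In a Meteor client, this URL is what would be passed to DDP.connect(url).
```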

chenroth · June 11, 2015, 6:37am

If you read carefully, I said you need a replica set for an oplog - I didn’t specify any requirement about the number of instances. It’s well known, and for any new reader should be known, that Meteor by default runs its bundled Mongo as a single instance replica set.

as for polling, processes may arbitrarily “skip” the delay, but it’s not a guarantee that the average (don’t mean to sound condescending) developer would expect when they read that “Meteor is reactive!”

nathan_muir · June 11, 2015, 7:38am

You mean here:

There’s no arbitrary part here - If you are using a single meteor instance - all db interaction done via that instance is 100% reactive, even using the polling driver.

Only when writes go directly to mongo from other applications (or other meteor instances) there may be up to 10s delay with the polling driver.

chenroth · June 11, 2015, 8:07am

I’ve further clarified my statement in a following post about the single instance terminology

as for polling, it is not exceptional for an app to grow beyond a single app instance and it’s just something that needs to be very clear to any Meteor developer from the start, since it’s an architectural factor to bear in mind

I think our discussion has concluded, as we’re both on the same page
The important outcome is I hope fellow post readers can understand better the limitations of Meteor and Mongo

paryguy · June 11, 2015, 11:25am

Thanks for all your time with this topic. I figured it’d take a lot before this became an issue but just wanted to make sure.

ron · June 12, 2015, 2:26am

@awatson1978 Thanks for your comprehensive reply.

I realize that the concern about data-consistency is really a problem if my app is successful - in other words - a problem I would like to have. At the same time, I wouldn’t want to have to change the whole stack at that point.

I’m thinking about the possibility of ‘backing-up’ the crucial data to PostgreSQL, then syncing the MongoDB server with it in the event of a system crash. This would involve dual writes to both Mongo and Postgres.

I’m also intrigued to hear about TokuMX. Their enterprise pricing seems quite affordable, provided that their storage engine lives up to its claims (there’s also the free version). Have you any experience with it?

In the meantime, I’ll continue building my app on Meteor, and cross bridges when I come to them.

streemo · June 14, 2015, 2:34am

I agree with this, these seem like very important things to think about - but if I can manage a consistent storage system with mongo at low scale, I think the best option is to go with something that works now - see if the app is even successful, and then tackle problems that arise at 10^9 documents later.

@awatson1978 what a wonderful write up - so much knowledge in here that I did not know. I didn’t even consider insuring faults by the server - this might actually be the road I take. This is a great thread, thanks @paryguy for starting this - I think some very good discussion came of it.

awatson1978 · June 14, 2015, 7:14am

Glad I could help! Best of luck with your projects!

franky · June 18, 2015, 3:57am

Will MongoDB’s purchase of WiredTiger’s tech help with any of the issues people are having with Mongo?

vectorselector · June 18, 2015, 3:33pm

I humbly suggest folks spend a little time looking at Meteor code, Mongo code, Javascript, JSON, and seeing why certain components were chosen. Trendy concerns of late on hackernews are as useless on the criticism side as they are on the bandwagon side (FUD)… Mongo seems like everyone’s favorite horse for beating as of late, and that Postgres manual sure is thick, I printed their docs once, about 15 years back, and that used a lot of trees.
SQL (you might as well mention Prolog) has nothing to do with anything I mentioned in my first sentence, but it’s a good thing in its own right. I do heavily question layered software that doesn’t firmly stack up one-turtle-upon-the-next, as an ORM between SQL and Javascript objects would need to… I’m not saying that it’s not hackable, it’s just not as elegant. Let’s put it this way: MongoDB perfectly mates with Javascript on the server, and if I were interested in SQL, I would perhaps base my entire server side platform on the premise of a SQL connector, from the beginning.
Meteor provides a rather thin wrapper to Mongo calls, and actually, Mongo is pretty darn simple to get up to basic speed with!
Start up a Meteor app, and then run “meteor mongo” in another terminal tab and have some fun poking around “show collections” and then “db.whatever.find()” or getting into Mongo selectors and projections… super simple stuff actually…

entropy · June 18, 2015, 3:55pm

Yep, as much fun as it is to jump on the “Mongo is bad” bandwagon, they are aggressively expanding and things will only get better. For example, the guy that founded WiredTiger wrote the paper on serializable snapshot isolation that PostgreSQL uses. I mean, MySQL was basically in the same boat when it came out, and people insulted it for its poor consistency guarantees, that is, until InnoDB came along. Now, there is a lot of stuff not related to the storage engine that needs to be fixed, but I can guarantee you that the people at Mongo know about people’s concerns and are working to fix them.

paryguy · June 18, 2015, 4:18pm

Thanks for the insight everyone. That’s why I originally posted the question, it just seemed like out of nowhere a bunch of MONGO IS BAD posts and articles started to show up. But I guess that comes with a growing developer community.

trusktr · August 1, 2017, 11:49pm

Sorry for necrobumping, but some of you might be interested to know that Mongo did have a bug where it didn’t guarantee what the documentation promised, but it is fixed as of Mongo 3.4, and Mongo 3.6 is looking very nice!

Meteor 1.5.1 is still on Mongo 3.2. Let’s hope an update comes soon!