What's up with Mongo?

I’ve been reading through the recent Mongo discussions, and I want to make sure I understand: is the main issue that when you run multiple servers and replicate data, Mongo sometimes drops records? Is that right, or is there a bigger issue here? I have two apps about to get started that won’t really require a massive Mongo database, but it seems like Mongo has become the pariah of the Meteor ecosystem, and I’m concerned about whether Mongo will keep being supported going forward. Nothing lasts forever, but I’d hate to have to tell my client that an update a few months from now will be a decent-sized task because the database APIs changed.

At the end of the day, if we’re just using a single instance of Mongo, are we still at risk of losing records?


I share the same concerns

if you do use a single instance, you’re gonna lose oplog reactivity though


This is the first I’m hearing of this. I am beginning to write an app that will need Mongo to function properly, as I will be storing user data for the app’s main functionality. I would love to hear some more discussion about having Mongo work across servers without dropping records, or maybe some alternative. I wish I had more to add to the discussion, but I am quite new and trying to learn.

this myth is kinda silly

I think the bigger issue for me is my lack of understanding. I’m new to Mongo, and to Meteor in general. So starting to see so many posts about how horrible Mongo is, how it can’t scale without losing data, how we have to just accept that some of our data may be lost… is a little concerning when I’m not a DBA who knows the ins and outs of the drivers for all the systems out there.

I should have worded my OP better. I doubt support will be pulled anytime soon, but it seems like there may be bigger issues with using Mongo/Meteor than I knew about when I started a few projects that are now pretty far along.

I get that Meteor is still new, but for newcomers, some of these issues raise the concern that there isn’t stable footing yet. Maybe that’s not right; maybe it’s that some of the concerns being raised give the IMPRESSION that there isn’t stable footing for Meteor yet. That being said, I have seen, read about, and know of a ton of live Meteor apps, so I’m just very confused as to why it suddenly seems Mongo is under attack.


Can’t speak for MDG, but I’ll go out on a limb and say that Mongo support is fine. The issue is that Meteor is growing and becoming a victim of its own success.

Mongo is a very good database that is designed for certain types of database scaling problems that don’t require strict consistency of records. Moreover, it has an excellent JavaScript interface and API, which has allowed Meteor to build a next-generation reactive user interface.

The issue at hand is that the success of Meteor has attracted people to the platform who want to use that next-generation reactive user interface, but who are building apps where consistency in the database is necessary. They value consistency over high availability or horizontal scaling (which is perfectly fine).


When you talk about consistency, are you talking about mirrored data across a cluster, or about records losing pieces in transit? This is the part I’m confused about: based on my reading, it seems Mongo won’t report a failure if a write doesn’t make it to the server, because it’ll look like a success in minimongo. Do I have that right? I’m not looking for an education in data replication or anything like that, but if that’s what it takes to understand the potential issues, I’ll gladly read up on it to gain a better understanding.


myth?

Meteor works by polling and/or observing the oplog

if you don’t have a replica set, oplog doesn’t exist and Meteor resorts to polling only

reactivity still exists, but not as immediate as you’d expect

I still don’t see what possible reason you would have not to enable a replica set.

me neither. I don’t recall implying that, but I’ll make it clear for post readers:

if you use a single instance (not a single-node replica set), you don’t have an oplog to link Meteor to.
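
Here’s roughly what the difference looks like in practice. A minimal sketch, assuming the standard MONGO_URL and MONGO_OPLOG_URL environment variables; the hostname and collection are made up, and the publication itself is ordinary Meteor code:

// Point Meteor at a replica set and its oplog (hostname is a placeholder):
//   MONGO_URL="mongodb://db.example.com:27017/myapp?replicaSet=rs0"
//   MONGO_OPLOG_URL="mongodb://db.example.com:27017/local?replicaSet=rs0"
//
// With MONGO_OPLOG_URL set, this publication is driven by oplog tailing and
// changes push out almost immediately. Without it, Meteor falls back to
// re-polling the query (roughly every 10 seconds by default).
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

export const Messages = new Mongo.Collection('messages');

Meteor.publish('recentMessages', function () {
  return Messages.find({}, { sort: { createdAt: -1 }, limit: 50 });
});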

Yeah, your understanding is correct. It usually happens when there’s a server or network device crash and/or the cluster is going through a partitioning event, and is usually more of an issue of losing pieces during transit, which results in inconsistent copies of records. (For some applications, that would be a big problem; for others, not so much.)

As an extremely generic question: would you advise against using MongoDB as the main DB for storing unique user-generated content, e.g. a collection of items that belong to users? That is, as opposed to big statistical data, such as counting the number of times some user profile has been viewed?

@awatson1978
Do the other MongoDB instances keep up in time? Do we need to manually trigger those instances to catch up with the current state?

Joining many-to-many “collections” is a real pain in Mongo. Most Meteor apps I see are better suited to SQL. This includes Telescope, the Slack clones, the Trello clone; even the Meteor examples are best represented by tables, because they are representing tabular data.

Databases like Postgres scale just great; there are loads of solutions in that regard, and it’s been around for about 20 years. There is very little advantage that Mongo has over Postgres. MiniMongo could be a wrapper for SQL, and just as easily a wrapper for JSON in databases like Postgres. This would enable existing packages that rely on the flexibility of collections to co-exist with other packages that use complex relational joins.

Look here for examples of JSON manipulation and querying in PostgreSQL

http://schinckel.net/2014/05/25/querying-json-in-postgres/

It’s even possible to create indexes and do joins and queries crossing JSON and relational data. Exciting stuff!

I’m on a mobile phone, so I’ll copy-paste some examples.

Example of an index on a Postgres JSON field (source: http://stackoverflow.com/questions/17807030/how-to-create-index-on-json-field-in-postgres-9-3)

CREATE TABLE publishers(id INT, info JSON);
CREATE INDEX ON publishers((info->>'name'));

Example of a cross JSON/relational join. This example would probably be best as all relational and no JSON, but it’s just for illustration (source:
http://blog.2ndquadrant.com/postgresql-anti-patterns-unnecessary-jsonhstore-dynamic-columns/)

select
  p1.id AS person1,
  p2.id AS person2,
  p1.data ->> 'name' AS "p1 name",
  p2.data ->> 'name' AS "p2 name",
  pns1 ->> 'type' AS "type",
  pns1 ->> 'number' AS "number"
from people p1
inner join people p2
  on (p1.id > p2.id)
cross join lateral jsonb_array_elements(p1.data -> 'phonenumbers') pns1
inner join lateral jsonb_array_elements(p2.data -> 'phonenumbers') pns2
  on (pns1 -> 'type' = pns2 -> 'type' AND pns1 -> 'number' = pns2 -> 'number');


@chenroth: It depends on your app. Have you ever tried to search Facebook for a post you made last year and not been able to find it? Does that make Facebook less useful? Inconsistency sounds scary, but it is often less impactful than people fear it to be. The world is an inconsistent place. Mongo is perfectly fine for storing many types of unique user-generated content.

@yasinusiu: I’m not 100% sure what you’re asking, but generally speaking, the one thing Mongo does well is keep up in time. An analogy I like to use is a ticket counter versus an unlimited subway pass. The ticket counter is like an ACID-compliant transaction: you don’t get the ticket and can’t board the train until the transaction is completed. A monthly unlimited subway pass can achieve the same goal (of getting money from everybody for each trip), but it doesn’t require a transaction each time someone wants to board the train. At the time of boarding, it’s a ‘fire-and-forget’ algorithm. Do some people sometimes get on the train without paying? Maybe. But overall, it’s a system that can keep up with the crowds and doesn’t require banking-level transactions.

@Babak: keep in mind that JSON support in Postgres is rather new… it hasn’t been around for 20 years; it has only existed since Postgres 9.2, released in late 2012 (obviously in response to the NoSQL movement), with the JSON operators and jsonb following in 9.3 and 9.4. Also, minimongo being a wrapper for SQL? Maybe after an ORM layer has mapped the SQL into a JSON object that minimongo can understand; but introducing an ORM defeats the purpose of having a persistent storage layer that’s isomorphically the same as the client-side datastore. So PostgreSQL may provide SQL access, but it breaks the isomorphic API.

The tough solution would be to create a minisql package. But it borders on being masochistic, because there would need to be an ORM on both the server and the client. Possibly upwards of three or four ORM layers, in fact, if you consider both inbound and outbound mapping (which is similar to what’s required to maintain HL7 interfaces, btw). If you could implement the ORM isomorphically as well, then it wouldn’t be such a headache (persistencejs may be the solution to this). But how would a normalized database know which objects it needs to trigger a client update for? So the isomorphic ORM probably needs to be reactive as well.


Isomorphic between the client and server I can appreciate. Between the application and database, well, not even Meteor’s MiniMongo is truly “isomorphic” with actual Mongo.

An isomorphic wrapper around Postgres, like MiniMongo, is not that hard. It’s all single-table queries; heck, single-column queries as well. Very basic SQL mapping for single-table, single-column queries.

Then it could be extended with additional methods to pass in raw SQL or joins. I don’t see the major downside to this. Just a practical path forward.
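
To make the idea concrete, here’s a minimal sketch of that single-table mapping. selectorToSql is a hypothetical helper (not an existing package); it assumes flat, equality-only selectors and a trusted table name:

// Translate a flat minimongo-style selector into a parameterized SQL query.
// Hypothetical helper for illustration; assumes simple equality matches only
// and that `table` is a trusted identifier.
function selectorToSql(table, selector) {
  const keys = Object.keys(selector);
  const where = keys.length
    ? ' WHERE ' + keys.map((k, i) => `"${k}" = $${i + 1}`).join(' AND ')
    : '';
  return {
    text: `SELECT * FROM "${table}"${where}`,
    values: keys.map((k) => selector[k]),
  };
}

// selectorToSql('players', { name: 'Ada', score: 10 })
// => { text: 'SELECT * FROM "players" WHERE "name" = $1 AND "score" = $2',
//      values: ['Ada', 10] }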

Hi,

I would like @awatson1978 to reply to this, if she would be so kind, and anyone else of course.

Like the OP, I’m new to Meteor/Mongo. I agree with @Babak that, for the most part, people are trying to solve problems that are perfectly suited to SQL databases with Mongo/NoSQL. Given that Meteor only works with MongoDB right now, I’m willing to be one of those people, provided that I can get it to work in a reliable way.

That being said, I’m in the early stages of developing an event ticketing and management app with Meteor. The app will be processing and keeping records of monetary transactions, and important user-account data which must remain consistent.

I don’t mind implementing transaction-safety at the application level (I’ve done it in other scenarios). But, realistically, I won’t become a Mongo expert. And I don’t want to have to become one to implement a system that would store data reliably - as my main reason for choosing Meteor is that I see it as a RAD framework with RTC built in.

Do I have to be terribly worried about implementing this app in Meteor/Mongo? Is it just a bad idea? If not, can you point out the caveats associated with this and whether or not they can be overcome?

Thanks

@awatson1978 I too am interested in your response. I am building an app which will also deal with transactions, preferably using Stripe. For each transaction, I am using a system which requires the buyer and seller to check a “got it” box on their client to trigger the continuation of the transaction; this data will be stored in a Sales collection:

{_id: ..., buyerAgree: true, sellerAgree: false, idOfSoldItem: ...} in the Sales collection.

Once both attributes are true, proceed with the transaction via Stripe (the Stripe customers will most likely need to be stored somewhere in Mongo). If both aren’t true within 24 hours, kill the transaction.
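
Roughly, the flow I have in mind looks something like this (just a sketch; collection and field names as in the example document above, and the Stripe call itself is omitted):

// Sketch of the two-party agreement flow. markAgreement flips one side's flag;
// proceedIfBothAgreed only continues to the Stripe charge once both are true.
import { Mongo } from 'meteor/mongo';

export const Sales = new Mongo.Collection('sales');

export function markAgreement(saleId, role) {
  // role is either 'buyerAgree' or 'sellerAgree'
  return Sales.update(saleId, { $set: { [role]: true } });
}

export function proceedIfBothAgreed(saleId) {
  const sale = Sales.findOne(saleId);
  if (sale && sale.buyerAgree && sale.sellerAgree) {
    // ...create the Stripe charge here and record the result on the sale doc.
  }
}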

Further, I will need to store how much money is owed to each user, and then pay users at the end of every two weeks.

Would this be possible to do with high fidelity in Meteor?

Thanks for your response.

If it involves financial data, the conservative and safe approach is to look for something with ACID compliance, such as TokuMX, Postgres, HyperDex, etc. But there are a few things to keep in mind:

First, Mongo’s inconsistency is more of an issue with sharded clusters that have complex topologies. In a typical production deployment, each shard is itself a replica set, which usually contains three servers of replicated data for fault tolerance. So the question is: how far can you vertically scale a replica set before you need to start sharding? How large can you grow your business before you need a second geolocated shard/replica set?

To put it into perspective: how many records can fit into 10 or 100 TB of disk storage on a single replica set? If each record is 1 KB (which is itself a pretty darn large JSON object), that’s 10 to 100 BILLION client records. So your consistency problems only really begin once you reach the 10-to-100-billion-record mark. (And I’m sure somebody has squeezed an entire petabyte onto a single replica set using a storage area network, so it may even be in the trillions of records.)

So, until you have 10 billion+ records or are operating out of two datacenters, you don’t need to shard, and you can set your Mongo replica set to use the Acknowledged, Journaled, or Replica Acknowledged write concerns, each of which is effectively a ‘transaction’ in the ACID understanding of the term. And that buys you a lot of time to investigate whether it’s a viable business, and to prepare to bring something like TokuMX, Postgres, or HyperDex into the mix.
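
For example, here’s roughly what requesting a journaled, majority-acknowledged write looks like from Meteor, via the raw Node driver handle that a collection exposes. Just a sketch: the 'sales' collection and document shape are placeholders, and depending on your driver version the option may be spelled as top-level w and j rather than a writeConcern object.

// Ask the driver to wait until the write is journaled and acknowledged by a
// majority of the replica set before resolving.
import { Mongo } from 'meteor/mongo';

export const Sales = new Mongo.Collection('sales');

export async function recordSale(sale) {
  return Sales.rawCollection().insertOne(sale, {
    writeConcern: { w: 'majority', j: true },
  });
}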

Also, don’t forget the role of insurance contracts in managing risk profiles. You may also want to ask yourself what the cost of any lost transaction is. It might be a perfectly viable business model to estimate that each ticket costs $100, purchase a $100K or $1M insurance policy, and simply offer a free event ticket to anybody who had a session prior to the server crash or system downtime and who contacts customer support about a lost ticket. The same goes for the buyer/seller system. There are financial instruments available to handle these kinds of concerns. Consider what role they may have as part of your architecture design and business strategy.


Somewhat correct.

You don’t need multiple instances of Mongo to have a replica set; you can run a single mongod instance with the --replSet option (the Meteor dev tools do this).
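
If it helps, here’s roughly what that looks like (a sketch with made-up names): start mongod with something like mongod --replSet rs0 --dbpath /data/db, then initiate the one-member set from the mongo shell:

// Run once in the mongo shell to turn the standalone mongod into a
// single-member replica set, which gives Meteor an oplog to tail.
rs.initiate({
  _id: 'rs0',
  members: [{ _id: 0, host: 'localhost:27017' }],
});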

As for polling only: this may not be an issue if your writes are coming from a single Meteor process.

Unless something has changed, Meteor will recognise inserts, deletes, and updates that its own process sends to MongoDB and automatically push them to other clients, without waiting to poll the database.