MongoDB strategy for a multi-company web app

theara · July 2, 2019, 12:07pm

I am trying to build a POS system that will run in the cloud. Each user must belong to a company. Each company can only access its own data. Each user can access their own data and some data shared with other users of the same company. Imagine 1,000 companies with 100 users per company: performance and security could get very bad if I use one MongoDB database for the whole app.

I think I could define 1,000 DBs, let’s say db_0001, db_0002, …, each with the same collection names, let’s say tasks, messages, …, so the app can be more efficient and more secure
(but I don’t know how to do this).

Please advise and share!

paulishca · July 2, 2019, 12:40pm

Hi @theara,
this is not an answer but more of an invitation to discussion.
I assume POS = restaurant point of sale.
Are you going to:

1. store CC data?
2. operate across borders in multiple countries?
3. collect personal data such as reservation details or a loyalty scheme?
4. have European employees?
5. collect data of European individuals, as in point 3?

Depending on where you operate, you might need to get certified as a financial system, comply with PCI DSS and/or GDPR, and abide by local government rules and regulations.
All of these may impact your infrastructure design.
From the marketing perspective, you might need to be able to sell your system as a cloud product that is self-hosted by, for instance, a restaurant chain. So you need to be able to deploy quickly and resource-efficiently.
You also need to consider DB backups and some decent SLA terms, as well as the T&C to offer in your contracts.
I would personally build such a system to be fully compliant with the above: one DB with geographical diversity (if you want to operate internationally), backup sets every 1-2 hours, AWS infrastructure, and each EC2 instance (Meteor) paired with a shard of the one Mongo DB. Mongo Atlas offers that level of compliance. I would also be ready to offer this as a self-hosted cloud system, which I would use in countries where it is illegal to store financial and personal data outside the borders.

theara · July 2, 2019, 1:17pm

Thanks @paulishca for your reply.
But I’m sorry, I don’t quite understand.

perumalkuk · July 2, 2019, 1:40pm

I would go with a single DB and choose a way to logically partition companies. Of course, that means thorough testing to make sure there is no data leak. But the advantage is that it is less complex to maintain, manage and support.

znewsham · July 2, 2019, 1:52pm

This is an extremely complicated topic. What’s more important than either the number of companies (which will likely grow predictably) or the number of users (which will likely stay fixed per company) is the number of other entities being stored, and how evenly they are distributed across the companies. This is the piece that matters for scaling/sharding Mongo.

Meteor doesn’t play nice with multiple databases, particularly at the user level (i.e., you simply cannot do it at the user level and still use the accounts packages). You can use multiple DBs everywhere else, though I think you’d find the overhead high (consider maintaining 1000 DB connections per server because you have 1000 companies). It also makes any type of analytics at your “superuser” level hard: e.g., “how many users do I have” would require querying across many DBs.

The approach we’ve taken is to put everyone’s data into one DB, where every entity has a teamId field. Users have a teamIds field (as they can belong to multiple teams). All our indexes include the teamId field. Every method call and publication takes a teamId argument; we check that the logged-in user has access to that team, and limit the dataset to that team’s data. This means we have one database (+ a separate one for logging) and a fixed number of collections.
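A minimal sketch of that pattern, not the poster’s actual code: it uses plain TypeScript with an in-memory array standing in for a collection, so it makes no claim about any Meteor or MongoDB API. The `User` and `Task` shapes and the helper names (`assertTeamAccess`, `scopedSelector`, `findTasks`) are hypothetical; only the `teamId` / `teamIds` fields come from the description above.

```typescript
// Hypothetical document shapes; only teamId / teamIds come from the post.
interface User { _id: string; teamIds: string[]; }
interface Task { _id: string; teamId: string; title: string; }

// Throws unless there is a logged-in user who belongs to the requested team.
function assertTeamAccess(user: User | null, teamId: string): void {
  if (user === null || user.teamIds.indexOf(teamId) === -1) {
    throw new Error("not-authorized");
  }
}

// Builds a selector limited to one team's data. teamId is spread last,
// so a teamId smuggled into the caller's selector cannot override it.
function scopedSelector(
  user: User | null,
  teamId: string,
  selector: Partial<Task> = {}
): Partial<Task> & { teamId: string } {
  assertTeamAccess(user, teamId);
  return { ...selector, teamId };
}

// In-memory stand-in for a collection find(); a real app would pass the
// scoped selector to its database driver instead.
function findTasks(
  all: Task[],
  user: User | null,
  teamId: string,
  selector: Partial<Task> = {}
): Task[] {
  const scoped = scopedSelector(user, teamId, selector);
  const keys = Object.keys(scoped) as (keyof Task)[];
  return all.filter((doc) => keys.every((key) => doc[key] === scoped[key]));
}
```

Routing every method and publication through one helper like `scopedSelector` keeps the access check and the teamId filter in a single place, which also makes the "no data leak" testing easier to target.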

Once your data is set up like this, you can shard your database, using teamId as part of the shard key (to ensure all of one team’s data exists on the same set of shards). However, at this point you can’t use Mongo’s oplog to observe the DB (as oplog doesn’t work with sharded clusters), so you’ll need redis-oplog. Maintaining a sharded database is hard, but migrating from an unsharded database to a sharded one is extremely hard, so I’d advise eating the cost up front (you need a minimum of 6 servers instead of 3). Also, think hard about your shard key: companyId “works”, but as you scale the number of companies, the number of shard sets will also drastically increase. So it may be worth assigning companies to a specific shard (e.g., “us-west” for companies on the west coast). However, you’d then need to track this + your companyId on every entity.
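As a rough illustration of tracking a zone plus companyId on every entity, here is a sketch in plain TypeScript. The zone names other than “us-west”, the `Company` shape, the compound key `{ zone: 1, companyId: 1 }` and both helper functions are assumptions made for the example, not something stated in the post.

```typescript
// Hypothetical zone names; "us-west" is the example from the post.
type Zone = "us-west" | "us-east" | "eu";

interface Company { _id: string; zone: Zone; }
interface ShardFields { zone: Zone; companyId: string; }

// Hypothetical compound shard key: zone first so zone ranges can be pinned
// to particular shards, companyId second to keep one company's data together.
const shardKey = { zone: 1, companyId: 1 };

// Stamps a new entity with both routing fields before it is inserted.
function withShardFields<T extends object>(company: Company, doc: T): T & ShardFields {
  return { ...doc, zone: company.zone, companyId: company._id };
}

// Builds a selector that contains every field of the shard key.
function companySelector(company: Company): ShardFields {
  return { zone: company.zone, companyId: company._id };
}
```

Stamping the fields in one helper avoids entities that are missing a shard key field, which is the bookkeeping cost mentioned above.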

@paulishca’s point about compliance is also a really big concern. It’s difficult for small non-EU-based companies to comply with all the requirements of GDPR, but that’s no reason not to comply with the ones where it is possible. Using a sharded DB allows you to store user data in “zones” (e.g., an EU zone).

A good option in terms of “ease” is Mongo Atlas, but it is also exceptionally expensive. Technically it’s only about double the price of AWS direct, but in reality it’s much more, because to get the sharding option you have to pay for far higher specs than you need.

It’s worth noting that my assumption here is you’re going to want to run this as SaaS as opposed to running a single app per company. If your goal is the latter, that would change my advice somewhat - but would also come with its own challenges.

theara · July 2, 2019, 11:04pm

Thanks, all. We should definitely go with a single database.
But I don’t understand scaling/sharding Mongo.
Can we do the scaling/sharding ourselves in Meteor?
Currently my app is deployed on DigitalOcean with MUP.

wildhart · July 3, 2019, 8:17am

Have you looked at mizzao:partitioner? I’ve been using it for a couple of years to successfully and seamlessly partition my collections in a multi-tenancy SaaS app.

theara · July 3, 2019, 8:51am

@wildhart, thanks I will check

ajaay · February 21, 2020, 2:11pm

We’re currently a single app/DB providing segmentation between customers in the same way you are with the teamId. We’ve just landed our first Enterprise customer that requires a dedicated database instance, so I’m now looking at ways to provide it.

The option I’m able to make most sense of is the sharding technique using the teamId as the key. However, the Atlas docs mention a max of 12 shards, which would limit us to 12 customers.

Is this something you’ve gone ahead and implemented? If so can you offer any advice or perhaps other ways of achieving this?

Cheers!

znewsham · February 21, 2020, 4:58pm

We self-host our MongoDB, so we’re not limited in the same way. That being said, Mongo recommends not using things like teamId as a shard key, as it can lead to mismatched chunk sizes and performance problems.

I’d also suggest that sharding in general won’t address your customer’s concerns here, as the database credentials that allow access to the DB will be the same and your application would need to access the data through a mongos router. This means that any application-layer fault that allows access to other customers’ data will continue to do so. So it won’t really be a dedicated database.

I’d suggest you spin up a truly dedicated Meteor server with a different Mongo URL pointing at an entirely different DB. This could reside on the same physical server as your main DB, but should use different access credentials.

evolross · March 29, 2020, 4:41am

How is sharding different from cross-region replication? Does sharding actually have a writable primary in each region, thus making write performance better in each region?

The above article states:

Note that all writes will still go to the primary in our preferred region, and reads from the secondaries in the regions we’ve added will be eventually consistent.

znewsham · March 29, 2020, 5:09am

Yes, sharding has a primary in each region (each shard is an entire replica set). Reads/writes go through a mongos router, which analyses the query and directs it to the correct replica set (or to multiple replica sets in some cases).

Cross-region replication is for redundancy and faster read access (at the cost of slightly stale data). It doesn’t help with write throughput.
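To make the routing idea concrete, here is a toy model in plain TypeScript. It is not how mongos is implemented; it only illustrates that a query carrying a shard key value can be sent to one replica set, while a query without it has to go to all of them. The `ChunkRange` shape, the shard names and the key ranges are all made up for the example.

```typescript
// Toy model: each shard (a full replica set) owns a half-open range
// [min, max) of shard key values. Names and ranges are made up.
interface ChunkRange { shard: string; min: string; max: string; }

// Returns the shards a query must visit. With a shard key value the query
// is targeted at the owning shard; without one it is broadcast to all.
function targetShards(ranges: ChunkRange[], teamId?: string): string[] {
  let hits = ranges;
  if (teamId !== undefined) {
    const key: string = teamId;
    hits = ranges.filter((r) => r.min <= key && key < r.max);
  }
  // De-duplicate, since one shard may own several ranges.
  const names: string[] = [];
  for (const r of hits) {
    if (names.indexOf(r.shard) === -1) names.push(r.shard);
  }
  return names;
}

const ranges: ChunkRange[] = [
  { shard: "rs-eu", min: "a", max: "m" },
  { shard: "rs-us", min: "m", max: "{" }, // "{" sorts right after "z"
];

console.log(targetShards(ranges, "acme"));    // targeted: one replica set
console.log(targetShards(ranges, "team-42")); // targeted: one replica set
console.log(targetShards(ranges));            // broadcast: every replica set
```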