Tonight we’re seeing peak traffic on our site. We’re running 40 DO droplets, but the issue seems to be that the Mongo database is taking far too long to respond, according to Kadira and the fact that operations take a long time even on a server with no users connected to it.
What are common strategies to deal with such issues?
Finally, MongoDB is really not a reactive DB. Oplog tailing is a design mistake because it isn’t scalable: if you share your DB across all your DO instances, each additional instance offers only incremental improvement, since every instance has to watch the activity of ALL users to detect its own.
We are migrating Meteor to RethinkDB which has built in reactivity.
799 active connections is a problem with MongoDB because it creates a thread per socket. All of these connections tend to be active, since the drivers send ping and isMaster commands so regularly, so the number of context switches is just insane.
Also, does Compose actually give its users the machine specs, i.e. how many cores are you actually running on? This is very important, because if your instance is pinned to only one core (worst case, a single vCPU) and you have 799 active threads, this will not go well for you.
Take down about 20 instances and watch how many connections are dropped. I don’t develop with Meteor myself, so I’m not sure whether the framework provides a way to pass the Mongo options down to the driver. It should; otherwise it wouldn’t make sense.
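If Meteor does forward driver options, the usual channel is the MONGO_URL connection string. A hedged sketch: `maxPoolSize` is a standard MongoDB connection-string option for capping the driver’s pool, though whether your Meteor version forwards every option to its bundled driver is worth verifying; the hostname below is made up.

```javascript
// Sketch: capping the driver's connection pool via the connection string.
// maxPoolSize is a standard MongoDB connection-string option; the exact
// effect depends on the driver version Meteor bundles, so verify locally.
const mongoUrl = "mongodb://db.example.com:27017/app?maxPoolSize=5";

// The option is just a query parameter, so it is easy to inspect:
const parsed = new URL(mongoUrl);
console.log(parsed.searchParams.get("maxPoolSize")); // "5"
```

You would set this as the MONGO_URL environment variable for each instance, so a smaller pool per instance multiplies into far fewer total connections at the database.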
Requests per second at the load balancer: I assume you have some sort of proxy routing requests to the 40 instances, and I’m curious what the RPS is at the moment. Typically an idle Meteor instance opens about 14 connections to MongoDB. In your case that is 560 connections already, with zero traffic, so the 799 adds up. Unfortunately it is too much.
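The baseline math above can be sketched in a few lines; the ~14 connections per idle instance figure comes from the thread itself, everything else is arithmetic:

```javascript
// Back-of-the-envelope estimate of baseline Mongo connections.
// ~14 connections per idle Meteor instance, per the discussion above.
const instances = 40;
const idleConnectionsPerInstance = 14;

const baseline = instances * idleConnectionsPerInstance;
console.log(`Baseline connections with zero traffic: ${baseline}`); // 560

// Halving the instance count roughly halves the baseline:
console.log(`With 20 instances: ${20 * idleConnectionsPerInstance}`); // 280
```

This is why dropping ~20 instances is a cheap first experiment: it should visibly cut the connection count before any tuning.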
I’m using Nginx as the load balancer, but I’m not sure how to find the RPS. We had about 1,000 connected users at the time, and it’s far fewer now, but the issues have persisted.
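One rough way to get RPS out of Nginx is to count access-log entries per second. A minimal sketch, assuming the default “combined” log format; the sample lines below are made up, and in practice you would read the real access log (commonly /var/log/nginx/access.log):

```javascript
// Rough requests-per-second from Nginx access-log lines in the default
// "combined" format. Sample lines stand in for the real log file.
const lines = [
  '10.0.0.1 - - [09/Oct/2015:20:15:01 +0000] "GET / HTTP/1.1" 200 612 "-" "-"',
  '10.0.0.2 - - [09/Oct/2015:20:15:01 +0000] "GET /app HTTP/1.1" 200 1024 "-" "-"',
  '10.0.0.3 - - [09/Oct/2015:20:15:02 +0000] "POST /sockjs HTTP/1.1" 200 87 "-" "-"',
];

// Group by the bracketed timestamp, which has one-second resolution.
const perSecond = {};
for (const line of lines) {
  const match = line.match(/\[([^\]]+)\]/);
  if (!match) continue;
  const second = match[1]; // e.g. "09/Oct/2015:20:15:01 +0000"
  perSecond[second] = (perSecond[second] || 0) + 1;
}

const counts = Object.values(perSecond);
const peak = Math.max(...counts);
console.log(`Peak RPS in sample: ${peak}`); // 2
```

Run over a real log during the traffic spike, the peak and average of those per-second counts give you the RPS number asked about above.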
There is one essential worker instance that is doing a lot of writes to the database.
Things have settled a little right now, but it’s been two hours of hundreds of complaints and things are still slow.
Thanks for the help. I’m pretty certain the issue was, as you said, that I had far too many connections to the database. I’ve reduced that number now and also increased the RAM for the deployment.