OPLOG flooding experience thread

Over the past month, I’ve seen and heard of a number of issues where oplog flooding became a problem in production. I’m sure MDG will provide an excellent solution, but as I gear up to launch my next app it’s becoming my top concern. I thought it might be helpful if people shared their experiences so others can learn from them and know when to anticipate the problem. For example:

  • number of db operations / second
  • type of application
  • server configuration
  • type of silver bullets used
  • etc

A change was introduced in meteor v1.0.4 to fall back to the polling strategy if the oplog stack gets too big — see https://github.com/meteor/meteor/issues/2668. Shouldn’t that solve the problem you are raising in this thread?

It’s a good solution, but not the ultimate one. From what I understand, it’ll rerun the queries instead of trying to keep up with the oplog, then start monitoring the oplog again. So I’d guess it’s still a limited solution.

The issue link you posted is a great read on the subject, though.

The main reason (though there certainly are others) for hitting an oplog flooding problem is an app that performs rapid database operations, especially multi: true mutations.

I’ll go ahead and further speculate that you’ll have control over such situations, and can therefore throttle them. E.g., hourly updates on some counts that you store on your users collection where you have 10,000 users. You might instead want to update them in batches of, say, 100, spaced 30 seconds apart (see the sketch below).
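As a rough illustration of that kind of throttling (the Posts collection, the postCount field, the batch size, and the interval are all made up, and Meteor._sleepForMs is an undocumented server-side helper, so treat this as a sketch rather than a recipe):

```js
import { Meteor } from 'meteor/meteor';

// Server-side: recompute a denormalized count for all users in small,
// spaced-out batches so the oplog observers aren't hit with 10,000 updates at once.
Meteor.methods({
  recalculateUserCounts() {
    const batchSize = 100;      // documents per batch (made-up number)
    const pauseMs = 30 * 1000;  // pause between batches

    const userIds = Meteor.users
      .find({}, { fields: { _id: 1 } })
      .map(user => user._id);

    for (let i = 0; i < userIds.length; i += batchSize) {
      userIds.slice(i, i + batchSize).forEach(userId => {
        const count = Posts.find({ userId }).count(); // hypothetical Posts collection
        Meteor.users.update(userId, { $set: { postCount: count } });
      });

      // Give the oplog tailers time to catch up before the next batch.
      if (i + batchSize < userIds.length) {
        Meteor._sleepForMs(pauseMs);
      }
    }
  }
});
```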

Other examples would be cases where you update references to denormalized data, e.g. a user’s username, as it changes, on 2,000 comments where it was denormalized along with the userId. You’d then rethink why and how you denormalize the data, and whether real-time (and perhaps non-reactive) lookups (joins) would be a better idea.
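To make that concrete (the Comments collection and the authorName field are hypothetical): the multi: true update below writes one oplog entry per matched comment, so a rename touching 2,000 comments means 2,000 entries that every observer has to process, whereas joining the username at read time keeps a rename down to a single write.

```js
// Denormalized: somewhere in a method that renames a user (userId and newName
// come from the caller), every one of their comments gets rewritten,
// producing one oplog entry per comment.
Meteor.users.update(userId, { $set: { username: newName } });
Comments.update(
  { userId },
  { $set: { authorName: newName } },
  { multi: true }
);

// Alternative: store only userId on comments and join the username when
// publishing, so a rename touches just the user document. (The author list
// here is computed once per subscription, i.e. the join itself is not
// fully reactive.)
Meteor.publish('commentsWithAuthors', function (postId) {
  const authorIds = [
    ...new Set(Comments.find({ postId }).map(c => c.userId))
  ];
  return [
    Comments.find({ postId }),
    Meteor.users.find({ _id: { $in: authorIds } }, { fields: { username: 1 } })
  ];
});
```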

Therefore, you should go over the design of your application at this point. That, coupled with Meteor’s latest fix as suggested above, should keep you sleeping well over the coming nights, until there is an even better fix and support for things like shards.

I had a really bad experience prior to 1.0.4, but now things are better with https://github.com/meteor/meteor/issues/2668. Though it’s still disappointing that any write on the db (even in an unwatched collection) will cause CPU churn because all db updates go into the same oplog. I wrote about our experiences scaling here: https://mixmax.com/blog/scaling-mixmax-monolithic-to-microservices. The eventual solution was just to use a separate database for all data that doesn’t need to be watched by Meteor.
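For anyone who wants to try the same split inside one Meteor app, the usual pattern is to open a second connection with MongoInternals.RemoteCollectionDriver and attach the non-watched collections to it. A minimal sketch, assuming a placeholder connection string and collection name (note that _driver is a semi-internal option):

```js
import { Mongo, MongoInternals } from 'meteor/mongo';

// Server-side: high-churn data that never needs to be observed goes into a
// second database, so its writes never land in the oplog Meteor is tailing.
const driver = new MongoInternals.RemoteCollectionDriver(
  'mongodb://localhost:27017/analytics' // placeholder URL; read it from settings in practice
);

export const AnalyticsEvents = new Mongo.Collection('analyticsEvents', {
  _driver: driver
});
```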

We had issues with this. We actually had to turn oplog tailing off to alleviate the problem. We have a collection with 500k documents that are altered almost constantly, and each of our users owns hundreds, if not thousands, of documents, so doing batch updates is out of the question. We throttled METEOR_OPLOG_TOO_FAR_BEHIND all the way down to 100 documents, which didn’t really help. The first step we’re going to take to solve our own problem is to move this large collection out of the Meteor mongo and put it in its own mongo instance, where we will serve the documents through additional servers.

Hi @msavin and all. Can you please share how you resolved this issue?
We have a Meteor app in production running on multiple instances. Any update by a user on one instance causes high CPU on all the other instances because of the oplog (making them effectively unusable for anywhere from a few seconds to minutes).

Use redisOplog
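For anyone who hasn’t tried it: RedisOplog (the cultofcoders:redis-oplog package) replaces oplog tailing with Redis pub/sub between your app instances. Per its README at the time of writing, enabling it is roughly a matter of adding the package and pointing it at a Redis server in settings.json; the host and port below are placeholders:

```json
{
  "redisOplog": {
    "redis": {
      "port": 6379,
      "host": "127.0.0.1"
    }
  }
}
```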

We ultimately sold that app because we didn’t have any reasonable way to deal with that issue (and other issues).

If I were to start over, I would think of ways to reduce stress on the Meteor servers. For example:

  • using Meteor methods to retrieve data
  • subscribing to documents by _id

Both are way faster and more efficient than using reactive queries. I would probably try RedisOplog once it matures. I’m also hoping for Change Streams to be implemented.
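A minimal sketch of both approaches, with made-up collection and method names (the client part assumes the _id comes from your UI or routing):

```js
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

// Hypothetical collection used only for this example.
export const Reports = new Mongo.Collection('reports');

if (Meteor.isServer) {
  // A method returns the data once; no observer stays open afterwards.
  Meteor.methods({
    'reports.fetch'(reportId) {
      return Reports.findOne(reportId);
    }
  });

  // Publishing a single document by _id keeps the observer trivially cheap
  // for the oplog driver to match against.
  Meteor.publish('reports.single', function (reportId) {
    return Reports.find({ _id: reportId });
  });
}

if (Meteor.isClient) {
  const someReportId = 'abc123'; // placeholder _id

  // Non-reactive fetch via the method...
  Meteor.call('reports.fetch', someReportId, (err, report) => {
    if (!err) console.log('fetched report', report);
  });

  // ...or a reactive subscription scoped to one _id.
  Meteor.subscribe('reports.single', someReportId);
}
```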

Thank you @diaconutheodor and @msavin. The RedisOplog README only has an example for a single document insert. How do you handle bulk writes, for example a data upload from a file? I will look into RedisOplog soon.

@praves77 you handle it the same way; there’s no difference. Read the “How it works” section.
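In other words, just go through the normal Mongo.Collection mutators so RedisOplog (which wraps them on the server) sees every write. A rough sketch of a file upload processed in chunks; the Items collection, the row format, the chunk size, and the pause are all assumptions, and Meteor._sleepForMs is the same undocumented helper mentioned earlier in the thread:

```js
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';

// Items is assumed to be a Mongo.Collection defined elsewhere in the app.
Meteor.methods({
  'items.importFromFile'(rows) {
    check(rows, [Object]); // rows parsed from the uploaded file on the client

    const chunkSize = 500; // made-up number
    for (let i = 0; i < rows.length; i += chunkSize) {
      rows.slice(i, i + chunkSize).forEach(row => {
        Items.insert(row); // goes through the wrapped mutator, so RedisOplog publishes it
      });

      // Optional breather between chunks for very large imports.
      Meteor._sleepForMs(100);
    }
  }
});
```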