CPU spikes due to Oplog updates without subscriptions

Hello everybody,

Long time no see! Hope everyone is doing fine! :slight_smile:

We made lots of changes to our codebase in recent months, most of them related to performance. We removed many subscriptions, (finally!) implemented oplog tailing, moved from DigitalOcean to AWS, and much more.

We still have a setup with (at the moment) 4 identical Meteor app Docker containers behind a Traefik load balancer, plus 2 identical plain Node.js Docker containers running our workers (on a separate AWS server!), which do the heavy lifting.
Our database is not hosted on MongoDB Atlas on M30 instances with the default 3 replica nodes.

We have a use case where we have to import lots of data into our database. This is done on the workers, and most of these updates touch 2 or 3 Mongo collections. We are talking about spikes of ~5k updates on both collections within a matter of seconds. These loads are sometimes "sustained" (with breaks of a couple of seconds in between) for minutes, or sometimes even hours!
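To put these numbers in perspective, here is a rough back-of-envelope sketch of the oplog traffic one burst could generate. The 5-second burst length is an assumption (the post only says "a matter of seconds"), as is the premise that every Meteor instance with oplog tailing enabled reads the full stream of oplog entries for the database, regardless of which collections it subscribes to.

```javascript
// Back-of-envelope estimate of oplog entries produced by one import burst.
// Assumptions (hypothetical): each updated document yields one oplog entry,
// the burst lasts `burstSeconds`, and every Meteor instance tailing the oplog
// reads the full stream of entries for the database.
function estimateOplogLoad({ updatesPerCollection, collections, burstSeconds, meteorInstances }) {
  const entriesPerBurst = updatesPerCollection * collections;
  const entriesPerSecond = entriesPerBurst / burstSeconds;
  return {
    entriesPerBurst,
    entriesPerSecondPerInstance: entriesPerSecond,
    entriesPerSecondAllInstances: entriesPerSecond * meteorInstances,
  };
}

// ~5k updates on each of 2 collections, assumed to land within 5 seconds, read by 4 Meteor instances.
const load = estimateOplogLoad({ updatesPerCollection: 5000, collections: 2, burstSeconds: 5, meteorInstances: 4 });
// load.entriesPerSecondPerInstance is 2000; load.entriesPerSecondAllInstances is 8000
```

Under these assumptions, each instance would have to process about 2,000 entries per second even if none of them ends up feeding a subscription.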

During these imports, we see significant CPU spikes on all 4 meteor instances.
But, here is the catch: We do NOT have any subscriptions on these collections anymore!

My questions now are:

  1. Is this expected behaviour?
  2. Is there anything we can do about this to reduce the load on the meteor instances during these updates?
  3. Am I missing something here?

Please see the screenshot attached!

Thank you all, best


Hi Patrick,

you say your DB is NOT on Atlas, but you mention Atlas specs. I think that is a typo? If so, can you check in Atlas how many connections there were at the time of these updates? I am thinking Meteor could have run out of available connections, and that could be a cause of the processor spikes you see.
Also, wherever you write those large numbers of records, you may want to check the write concern and relax it to reduce the waiting on other servers in the replica set, if strict durability is not particularly important for you.

… something like … don’t wait for acknowledgment.
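A minimal sketch of what a relaxed write concern could look like on the worker side, assuming the official MongoDB Node.js driver, whose `bulkWrite(ops, options)` accepts `ordered` and `writeConcern` options. The helper name and the level names are hypothetical. `w: 0` is the "don't wait for acknowledgment" case; the trade-off is that failed writes are not reported back.

```javascript
// Hypothetical mapping from a readable level name to a MongoDB write concern.
const WRITE_CONCERNS = {
  majority: { w: "majority" }, // wait for a majority of replica set members
  primary: { w: 1 },           // wait for the primary only
  none: { w: 0 },              // fire-and-forget: no acknowledgment, errors go unreported
};

// Build bulkWrite options for the import worker (hypothetical helper).
function importWriteOptions(level = "primary") {
  const writeConcern = WRITE_CONCERNS[level];
  if (!writeConcern) throw new Error(`Unknown write concern level: ${level}`);
  return { ordered: false, writeConcern };
}

// Usage with the driver (not executed here):
// await db.collection("items").bulkWrite(ops, importWriteOptions("none"));
```

A write concern set on the operation overrides the one from the connection string, so the worker can relax it only for the import path.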

Another option is to throttle the writes (batching). Here is an example:

const runBatch = () => {
  for (let i = 0; i < 5000; i++) {
    // `let` gives each iteration its own `i`, so no IIFE is needed to capture it.
    setTimeout(() => {
      // Write one smaller batch here.
    }, i * 1000); // one call per second
  }
};
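An alternative sketch of the same throttling idea: instead of scheduling thousands of timers up front, it awaits each batch and then pauses before the next one, so a slow write naturally delays the following batch. `writeBatch` is a hypothetical async callback (for example, a wrapper around a driver `bulkWrite`), and the batch size and pause are placeholders to tune.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (size < 1) throw new Error("size must be at least 1");
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Write `items` in batches, awaiting each write and pausing `pauseMs` between batches.
// `writeBatch` is a hypothetical async function supplied by the caller.
async function runThrottled(items, { batchSize = 500, pauseMs = 1000, writeBatch }) {
  const batches = chunk(items, batchSize);
  for (let i = 0; i < batches.length; i++) {
    await writeBatch(batches[i]);
    if (i < batches.length - 1) await sleep(pauseMs); // no pause after the last batch
  }
  return batches.length;
}
```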

Could it simply be that your Mongo cluster is busy, so queries take longer to return and the event loop stays busy?

Do you write your updates from your Node.js app to the secondary node of the Mongo cluster?

Hello everyone and thank you for your replies!

Sorry for my late reply!
Yes, this was indeed a typo: we ARE on MongoDB Atlas, on 3x M30 instances on AWS.
We already use an adapted MongoDB connection string with w=majority&readPreference=secondaryPreferred, so we have already optimized everything "there is".
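A small sketch for checking which options a connection string actually sets. For context: `w=majority` makes each write wait for acknowledgment from a majority of replica set members, and `readPreference` only affects reads, so neither option relaxes the writes themselves. The URI below is a placeholder.

```javascript
// Extract the query options from a MongoDB connection string as a plain object.
function connectionOptions(uri) {
  const q = uri.indexOf("?");
  const params = new URLSearchParams(q === -1 ? "" : uri.slice(q + 1));
  return Object.fromEntries(params);
}

// Placeholder URI with the options mentioned above.
const opts = connectionOptions(
  "mongodb+srv://user:pass@cluster0.example.mongodb.net/app?w=majority&readPreference=secondaryPreferred"
);
// opts.w is "majority"; opts.readPreference is "secondaryPreferred"
```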

Also, the big writes already happen in UnorderedBulkOperations, so I don't think there is much more we can do there?! Also, again: these bulk operations are done by a worker Node.js app which runs on a completely separate AWS server.
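For reference, a sketch of the worker-side pattern described here: unordered bulk writes in the format of the official Node.js driver, sent in chunks. The record shape (`{ _id, fields }`) and the chunk size are hypothetical; the collection is passed in, so any object with a `bulkWrite(ops, options)` method works. Chunking spreads the writes out over time but does not reduce the number of oplog entries, since each updated document still produces one.

```javascript
// Build one updateOne-with-upsert operation per imported record.
// The record shape ({ _id, fields }) is a hypothetical import format.
function toBulkOps(records) {
  return records.map((r) => ({
    updateOne: { filter: { _id: r._id }, update: { $set: r.fields }, upsert: true },
  }));
}

// Send the operations as unordered bulk writes of at most `chunkSize` ops each.
// `collection` is anything with the driver's bulkWrite(ops, options) method.
async function importRecords(collection, records, chunkSize = 1000) {
  const ops = toBulkOps(records);
  let modified = 0;
  let upserted = 0;
  for (let i = 0; i < ops.length; i += chunkSize) {
    const result = await collection.bulkWrite(ops.slice(i, i + chunkSize), { ordered: false });
    modified += result.modifiedCount;
    upserted += result.upsertedCount;
  }
  return { modified, upserted };
}
```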

Last but not least: we do restrict our connection pools to "maxPoolSize": 60 PER Meteor instance. So yes, this could be an issue, but I am really not sure it is the case here. The spikes I showed you came RIGHT AFTER the go-live of a new version, and all Meteor instances had restarted just minutes before, so it is highly unlikely that we had already reached a connection limit at that point.
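A quick worked check of the upper bound these pool settings imply. In the MongoDB drivers, `maxPoolSize` applies to each server the client connects to, so the bound below is per replica set member. The worker entry is an assumption: the thread does not state its pool size, so the Node.js driver's default of 100 is used as a placeholder. Oplog tailing and server monitoring use their own connections and are not counted here.

```javascript
// Upper bound on pooled connections that can be open against a single replica set member.
function maxPooledConnectionsPerNode(clients) {
  return clients.reduce((sum, c) => sum + c.instances * c.maxPoolSize, 0);
}

const clients = [
  { name: "meteor", instances: 4, maxPoolSize: 60 },
  { name: "worker", instances: 2, maxPoolSize: 100 }, // hypothetical: driver default, not stated in the thread
];

const perNode = maxPooledConnectionsPerNode(clients); // 4 * 60 + 2 * 100 = 440
```

The Meteor containers alone account for at most 240 pooled connections per member; compare the total against the connection limit of your Atlas tier.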

Any other ideas from anyone?! :slight_smile:

best, Patrick