With a very recent release of ours, we are now encountering a VERY strange Meteor behavior we have never seen before.
Since these changes, the performance of our app is suddenly no longer deterministic!
Meaning: we deploy the branch with our changes to one of the Docker containers on our server, and the response times are fine! Now, if we deploy exactly the same changes to one of our other Docker containers, OR we docker restart the existing container that was fine just moments ago, the response times suddenly tank and we are over twice, almost 3x, slower.
We are completely out of ideas - we make NO changes to our setup, yet the response times (and in turn the CPU load on our server!) are all over the place.
Does anyone have any ideas on this? We can actually validate with our Monti setup that a simple docker restart made everything over 2x slower.
We are running this on our testing server, so there are NO side effects and no random users accessing it … we basically “keep all variables stable” as best we can!
One more thing:
The server is SO busy during these times that even serving a simple 3 KB SVG file can take up to 1.6 seconds. So we had to completely revert these changes on production, because we could not handle our usual load anymore - the CPU spikes on our Node processes just went crazy!
EDIT: You can even see the crazy compute time for our methods in Monti here:
No, the total number of Docker containers is stable.
Even if I spin up only one container and then restart it, the performance seemingly changes completely.
Again, we have no idea what is going on!
The changes in question are almost exclusively client-side. We have already debugged all method calls, and the number of calls did not change - so we are completely out of ideas what causes these CPU spikes slowing everything down!
Side note: both our production server and our staging servers are affected by this.
So we just do not understand what is happening in the background here that causes the CPU issues.
Important: the CPU spikes are always triggered by a user! In other words, if we just keep the server running with no traffic, the CPU is fine - nothing happens purely in the background.
Whenever a user accesses the page, we see these huge spikes, and the loading times are twice as bad - so it is definitely triggered by the page view, whatever it is.
We removed a huge client-side library (antd) → our main .js client bundle is now much smaller, so it should load faster.
Due to this, maybe some methods can now be called in quicker succession (basically all of our methods use this.unblock() - does that even still exist now with Promises? We use Meteor 3.2!).
This, in turn, leads to a higher load on the server: the amount of data remains the same, but the methods are “called quicker”.
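To illustrate the hypothesis, a minimal sketch of the pattern we use everywhere (the method name and heavyAggregation are placeholders, not our real code, and we are assuming this.unblock() still behaves as documented for async methods):

import { Meteor } from 'meteor/meteor'

Meteor.methods({
  async getDashboardStats() { // placeholder name
    // With unblock(), the next method call from the same client does NOT
    // wait for this one to finish - so a faster-loading client can have
    // several of these running concurrently on the server.
    this.unblock()
    return heavyAggregation() // placeholder for the actual DB/CPU work
  },
})

So: smaller bundle → earlier and more overlapping method calls → higher peak CPU, even though the total work per page view is unchanged.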
We have now been on this topic for 5 business days with 3+ developers, and we can’t think of anything else anymore … We are really desperate at this point!
Have you tested rolling back to a version where this is not happening? This can help you identify the change that might have resulted in your current issue.
@rjdavid Of course, this is exactly what we have been doing!
The old version is still “stable”; then we tried to add our code changes file by file.
Then, quite randomly, it seems to get slow again.
The weirdest part remains:
When we reach a branch X that contains Y changes from our “broken branch”, and this branch X works fine on our Docker container, sometimes a simple docker restart is enough to break it. This makes zero sense to me, but I promise you that this is the case!
We just restart the same container, so by definition no changes to the code, the image, the JS bundle, nothing … and suddenly the page load times double or triple and the CPU goes crazy again.
Not really … all we can see, I think, is that “computation” took up to 2 seconds.
We think this is due to the CPU simply being overloaded at that time, making everything slow.
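To find out what that “computation” actually is, a raw CPU profile from inside the container should help; Node's built-in inspector module can capture one with no extra dependencies. A minimal sketch (the helper name, duration, and output path are placeholders):

const inspector = require('node:inspector')
const fs = require('node:fs')

// Profile the running process for `ms` milliseconds, then write the result
// to `outFile` - the .cpuprofile file can be loaded in Chrome DevTools.
function captureCpuProfile(ms, outFile) {
  const session = new inspector.Session()
  session.connect()
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      setTimeout(() => {
        session.post('Profiler.stop', (err, { profile }) => {
          if (!err) fs.writeFileSync(outFile, JSON.stringify(profile))
          session.disconnect()
        })
      }, ms)
    })
  })
}

captureCpuProfile(10000, '/tmp/spike.cpuprofile') // trigger a page view while it runs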
I have a Meteor 3-compatible version of Meteor Down. With it you have control over the methods, calling them one by one until you eventually identify a problem.
It looks like this:
const meteorDown = require('meteor-down') // or however your Meteor 3 compatible fork is imported

meteorDown.init(Meteor => {
  Meteor.call('getUserExtrasMethod', {}, (err, res) => { // your method
    if (err) {
      console.log(err)
    } else {
      Meteor.kill()
    }
  })
})

meteorDown.run({
  concurrency: 20, // your concurrency
  key: '****', // a Meteor Down password that authenticates your user
  // url: "http://192.168.1.72:3000",
  url: 'https://www.your_live_url.com',
  auth: { userIds: ['xxxxxxx'] } // impersonate your users with the user _id
})
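You run this as a plain Node script directly against your deployment (I am assuming the standard meteor-down npm API above - adjust the require for your own fork), and ramp concurrency up until the spike reproduces; calling the methods one by one like this usually narrows it down to a single culprit.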
The thing is: we do not think this is “only method-call related”!
We DID debug the number of calls to our different methods and how long they take.
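(For anyone who wants to double-check this kind of thing without an APM: a minimal sketch of a wrapper - purely illustrative, not our actual instrumentation; our numbers come from Monti:)

import { Meteor } from 'meteor/meteor'

// Wrap method handlers to log a running call count and duration per method.
function instrumentMethods(methods) {
  const counts = {}
  const wrapped = {}
  for (const [name, fn] of Object.entries(methods)) {
    wrapped[name] = async function (...args) {
      counts[name] = (counts[name] || 0) + 1
      const start = Date.now()
      try {
        return await fn.apply(this, args) // keep `this` so unblock() etc. still work
      } finally {
        console.log(`${name} call #${counts[name]} took ${Date.now() - start}ms`)
      }
    }
  }
  return wrapped
}

Meteor.methods(instrumentMethods({
  async myMethod() { /* ... */ }, // placeholder
}))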