How do you debug a 'RangeError: Out of memory' error?

We’ve been stuck in a loop of server restarts, all caused by RangeError: Out of memory. Looking through the logs, they don’t seem too helpful:


2022-03-10 14:57:30+08:00meteor://💻app/packages/mdg_meteor-apm-agent.js:3719
2022-03-10 14:57:30+08:00 return originalRun.call(this, val);
2022-03-10 14:57:30+08:00 ^
2022-03-10 14:57:30+08:00
2022-03-10 14:57:30+08:00RangeError: Out of memory
2022-03-10 14:57:30+08:00 at Fibers.run (packages/mdg:meteor-apm-agent/lib/hijack/async.js:25:22)
2022-03-10 14:57:30+08:00 at Object.Kadira._sendPayload (packages/mdg:meteor-apm-agent/lib/kadira.js:177:6)
2022-03-10 14:57:30+08:00 at process.<anonymous> (packages/mdg:meteor-apm-agent/lib/hijack/error.js:21:12)
2022-03-10 14:57:30+08:00 at process.emit (events.js:326:22)
2022-03-10 14:57:30+08:00 at process.EventEmitter.emit (domain.js:483:12)
2022-03-10 14:57:30+08:00 at process._fatalException (internal/process/execution.js:165:25)

I doubt this is an issue with Kadira itself; more likely the error is just being logged by Kadira. Would that make sense? We had a corresponding alert on Mongo for Disk I/O % utilization going above 90%, which seems related.

I know this is entirely application-based, but I’m feeling a bit clueless about how to debug it. Does anyone have any pointers or know of any specific resources to look at?

Many thanks in advance!

I’ve had a couple of posts here about a similar error. These are really tough to debug (especially if it’s sporadic and non-reproducible locally).

From my experience, the memory leak is probably due to some expensive subscription(s) using a lot of observers. If the Mongo server’s disk usage went up to 90%, that’s a telling sign the problem lies in that area rather than in just some bug in the code.

Maybe Kadira can give you some other insights around that timestamp.

A long time ago I had a very similar problem: a rather large MongoDB database and a query filtering on a field without an index, which forced the server to load the entire collection just to filter it. Adding an index on the appropriate field fixed the issue for me.
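
In case it’s useful, here’s roughly what diagnosing and fixing that looks like from the mongo shell; the collection and field names are just placeholders for whatever query is slow in your app, not anything from this thread:

```js
// Hypothetical collection/field; substitute the query your profiler flags.
// 1. Inspect the plan: a COLLSCAN stage means the whole collection is read.
db.messages.find({ conversationId: "abc123" }).explain("executionStats");

// 2. Add an index on the filtered field so the planner can use an IXSCAN.
db.messages.createIndex({ conversationId: 1 });

// 3. Re-run explain() and check that totalDocsExamined is now close to
//    nReturned instead of the full collection size.
db.messages.find({ conversationId: "abc123" }).explain("executionStats");
```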

There can be a number of possible reasons. This might be worth a try.

Ah yes, thanks for the suggestion.

Someone else (not in this thread) recommended the heapdump package on npm. Is there any reason to use one over the other?

I’ve tried to ensure all queries are indexed. The Atlas profiler doesn’t bring up any non-indexed queries (that I can see). Thanks for the suggestion.
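
(For anyone doing the same audit: alongside the Atlas profiler, the $indexStats aggregation stage shows how often each index actually gets used; the collection name below is just an example.)

```js
// Per-index usage counters since the last server restart.
// An index with a near-zero "accesses.ops" count is either unused, or the
// queries expected to hit it are being planned differently than assumed.
db.messages.aggregate([{ $indexStats: {} }]);
```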

These are really tough to debug (especially if it’s sporadic and non-reproducible locally).

This is the kryptonite of debugging!

From my experience, the memory leak is probably due to some expensive subscription(s) using a lot of observers. If the Mongo server’s disk usage went up to 90%, that’s a telling sign the problem lies in that area rather than in just some bug in the code.

Yup, I think it’s more than just a coincidence; I need to figure out exactly which subs are doing this. I do have an inbox where each document is subscribed to by a user. That’s not much data by itself, but it’s coupled with an infinite-loading mechanism that keeps increasing the number of subscribed documents. I wonder whether that is the culprit here and whether pagination would be a better mechanism.
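
To make that concrete, here’s a rough sketch of the paginated publication I have in mind; the collection, field, and publication names are made up for illustration, not our actual code:

```js
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { check } from 'meteor/check';

// Hypothetical collection standing in for the real inbox data.
const InboxMessages = new Mongo.Collection('inboxMessages');

// Publish one bounded page at a time instead of an ever-growing
// "load more" subscription, so each client keeps a small observer.
Meteor.publish('inbox.page', function (page, pageSize) {
  check(page, Number);
  check(pageSize, Number);

  if (!this.userId) {
    return this.ready();
  }

  return InboxMessages.find(
    { userId: this.userId },
    {
      sort: { createdAt: -1 },
      skip: page * pageSize,
      limit: Math.min(pageSize, 50), // defensive cap on page size
    }
  );
});
```

Skip-based paging has its own cost at large offsets, so a createdAt cursor (“everything older than X”) might be even kinder to Mongo, but either way the subscription stays bounded.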

Maybe Kadira can give you some other insights around that timestamp.

Kadira logs all look normal (had the guys at Galaxy chip in with their 2c as well on this). But I may not be reading things right. I wish Arunoda’s consultancy was still around!

This is specifically tailored for out-of-memory problems. The “Why” part of that page explains everything: GitHub - blueconic/node-oom-heapdump (“Create a V8 heap snapshot right before an ‘Out of Memory’ error occurs, or create a heap snapshot or CPU profile on request”).
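
For node-oom-heapdump itself, its README is the thing to follow. But as a generic illustration of the underlying technique (a heap snapshot you can open in Chrome DevTools), here’s a sketch using Node’s built-in v8 module; the signal and file path below are arbitrary choices for the example, not anything Meteor- or Kadira-specific:

```js
// Generic heap-snapshot hook using Node's built-in v8 module.
// node-oom-heapdump automates taking one right before an OOM crash;
// this manual variant is handy for before/after comparisons.
const v8 = require('v8');
const path = require('path');

process.on('SIGUSR2', () => {
  const file = path.join('/tmp', `heap-${Date.now()}.heapsnapshot`);
  // Synchronous: the process pauses while the snapshot is written.
  v8.writeHeapSnapshot(file);
  console.log(`Heap snapshot written to ${file}`);
});
```

Taking two snapshots a few minutes apart and comparing retained sizes in the DevTools Memory tab is usually more telling than a single snapshot: whatever keeps growing between them is the likely leak.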

An interim update on this:

  • As a “quick fix”, we threw some money at the problem and doubled our Mongo Atlas cluster from M10 to M20, which seems to have stopped the problem.
  • The quick fix, the Mongo charts, and a conversation with their technical team all seemed to indicate that the number of IOPS was the cause of the issue. The IOPS limit was creating a bottleneck which, from my understanding, cascaded into reduced available RAM, slow responses and, ultimately, the Meteor servers resending requests they thought had failed. That leads to a spiral, as the retries increase the load on Mongo even more.

Fixing things going forward:

  1. Reduce the number of subscriptions even more (though they were already low in our app to begin with). We can’t get rid of them completely, as we do have some need for real-time data, but we can use them more judiciously.
  2. A concept about Mongo indexing that I learned along the way, which may help someone else:

When the query criteria and the projection of a query include only the indexed fields, MongoDB returns results directly from the index without scanning any documents or bringing documents into memory.

In our case, while all queries have been indexed, I did not realise that relying on index intersection (https://www.mongodb.com/docs/manual/core/index-intersection/) instead of a single compound index can carry a performance hit (there’s a small sketch of the difference at the end of this post).

This is something I will need to consider and figure out going forward, but I wish I had known it from the start.
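
As an illustration of that point (collection and field names are hypothetical): with two separate single-field indexes, Mongo may have to intersect them or pick one and examine extra documents, whereas a single compound index matching the query shape, combined with a projection limited to indexed fields, lets it answer from the index alone.

```js
// Two single-field indexes: the planner may resort to index intersection,
// or pick one index and examine extra documents for the other filter.
db.orders.createIndex({ userId: 1 });
db.orders.createIndex({ status: 1 });

// One compound index matching the query shape avoids intersection...
db.orders.createIndex({ userId: 1, status: 1 });

// ...and if the projection only returns indexed fields, the query is
// "covered": explain() shows no FETCH stage and totalDocsExamined: 0.
db.orders
  .find({ userId: "u42", status: "unread" }, { _id: 0, userId: 1, status: 1 })
  .explain("executionStats");
```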