Consistently running out of memory causing restart

elie · June 17, 2020, 1:19pm

We’ve recently started noticing major issues with our production app. The app instances are constantly restarting because they run out of memory.
We’re on Meteor 1.9.3. Does anyone have suggestions for debugging the cause of this?
Each instance is on an AWS t3a.medium with 4 GB of memory. It happened even more often on t3a.small, but it’s still happening after the upgrade.

[ip]time="2020-06-17T12:57:06Z" level=info msg="Started HTTP server." address="[::]:3000" usingNamedPipe=false
[ip]time="2020-06-17T12:57:06Z" level=info msg="Engine proxy started." build=2018.06-20-gc0e4bb519 version=1.1.2
[ip](node:1) [DEP0066] DeprecationWarning: OutgoingMessage.prototype._headers is deprecated
[ip]time="2020-06-17T12:57:37Z" level=info msg="Query response cache observed new Vary header in GraphQL response. Future cache operations will separate requests with different values for this header, and will miss previously cached responses to requests with this header." header=Accept-Encoding
[ip]FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
[ip]
[ip]<--- Last few GCs --->
[ip]
[ip][1:0x43fce70]   110086 ms: Scavenge 973.5 (989.8) -> 968.3 (990.0) MB, 7.5 / 0.0 ms  (average mu = 0.206, current mu = 0.128) allocation failure 
[ip][1:0x43fce70]   110569 ms: Mark-sweep 973.9 (990.0) -> 968.7 (994.0) MB, 475.2 / 0.0 ms  (average mu = 0.148, current mu = 0.077) allocation failure scavenge might not succeed
[ip]
[ip]
[ip]<--- JS stacktrace --->
[ip]
[ip]==== JS stack trace =========================================
[ip]
[ip]    0: ExitFrame [pc: 0x1374fd9]
[ip]Security context: 0x1b0f4de808a1 <JSObject>
[ip]    1: /* anonymous */ [0x38b269013fc9] [/built_app/programs/server/packages/ejson.js:~679] [pc=0x3910cd159ebd](this=0x00b5b8a0b881 <Object map = 0x6c31fbe5e11>,0x1d046d48c2f9 <Object map = 0xa06db4f111>)
[ip]    2: /* anonymous */ [0x1d046d4a07d1] [/built_app/programs/server/packages/ejson.js:~730] [pc=0x3910cd31b1a9](this=0x212c93a02261 <JSGlobal Object>,0x2283b431...
[ip]
[ip]
[ip]Writing Node.js report to file: report.20200617.125848.1.0.001.json
[ip]Node.js report completed
[ip] 1: 0x9da7c0 node::Abort() [node]
[ip] 2: 0x9db976 node::OnFatalError(char const*, char const*) [node]
[ip] 3: 0xb39f1e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
[ip] 4: 0xb3a299 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
[ip] 5: 0xce5635  [node]
[ip] 6: 0xce5cc6 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
[ip] 7: 0xcf1b5a v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
[ip] 8: 0xcf2a65 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
[ip] 9: 0xcf5478 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
[ip]10: 0xcbbda7 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType) [node]
[ip]11: 0xff1e0b v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
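
A note on reading the GC lines above: the heap held roughly 970 MB of live data (about 990 MB committed) when the process aborted, which suggests it ran into V8's own heap limit rather than the instance's 4 GB of RAM. Below is a minimal sketch for checking the limit a server process is actually running with, using Node's built-in `v8` module. The function name `heapLimitMb` is made up for illustration and is not part of Meteor or Node.

```javascript
const v8 = require('v8');

// Hypothetical helper: returns V8's configured heap size limit in megabytes.
function heapLimitMb() {
  return v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
}

console.log(`V8 heap limit: ${heapLimitMb().toFixed(0)} MB`);
```

If the reported limit is well below the instance's RAM, Node's `--max-old-space-size=<MB>` flag raises it. How that flag gets passed depends on how the app is deployed, and if there is a real leak a higher limit would only postpone the crash.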

andregoldstein · July 13, 2020, 8:02am

@elie Did you ever resolve this? We are having very similar problems with a production app too. Thanks

elie · July 13, 2020, 11:24am

I upgraded my EC2 instances and I don’t think it’s happening anymore. They’re now on 4 GB of RAM instead of 2 GB (t3a.medium).
I also turned off Apollo caching. I think that may have been the culprit.

andregoldstein · July 13, 2020, 11:31am

Thanks! Were you finding that the majority of the time there was ample memory but it would just spike crazily from time to time? My app carries on happily with lots to spare for the most part… I’m not using Apollo, so I’m not sure whether it’s perhaps a subscription issue.

elie · July 13, 2020, 11:47am

I’ve only encountered CPU spikes, not RAM spikes. What I usually see is a gradual increase in RAM usage over time (and when more users are connected).
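
A gradual climb like this is easier to confirm with numbers logged over time. Here is a rough sketch, assuming plain Node on the server, that logs `process.memoryUsage()` at a fixed interval. The helper names (`toMb`, `memorySnapshot`, `startMemoryLogging`) and the 60-second default are illustrative choices, not anything from Meteor.

```javascript
// Convert bytes to megabytes, rounded to one decimal place.
function toMb(bytes) {
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// Snapshot of the current process memory, in MB.
function memorySnapshot() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  return {
    rss: toMb(rss),
    heapUsed: toMb(heapUsed),
    heapTotal: toMb(heapTotal),
    external: toMb(external),
  };
}

// Log a snapshot every intervalMs; returns the timer so it can be cleared.
function startMemoryLogging(intervalMs = 60000) {
  const timer = setInterval(() => {
    console.log(new Date().toISOString(), JSON.stringify(memorySnapshot()));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}

// Example: call startMemoryLogging() once at server startup.
```

If `heapUsed` keeps rising between restarts while the number of connected users stays flat, that points toward a leak in JavaScript objects. If it tracks the user count instead, per-connection state is the more likely suspect.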