Last week we upgraded from v2.5.8 to v2.8.0 on our production instance of https://orderlion.com.
The very next day our RAM usage exploded, resulting in "JS heap out of memory" errors a day later and forcing the Docker container(s) to restart. (We saw roughly +1 GB of RAM usage per container per hour! - totally crazy!)
We did some analysis via V8 heap dumps and also via Monti and found the following clues:
We discovered a huge number of ClientSessions and even some circular references that could prevent garbage collection of unused ClientSessions. The number of ClientSessions, and therefore the allocated memory, keeps growing substantially even after just a few hours of being live in production.
Also, for some reason, the new mongo driver defaults to a much larger connection pool for the primary (10 → 100), which could bump up the RAM usage even more.
You can see the crazy RAM usage of 247 MB just for the MongoDB client sessions - this is a B2B tool with a maximum of ~150 users online at the same time, so this seems completely out of proportion to me.
Here you can see the jump back from 100 to 10 mongo connections in the pool again, because we shipped a downgrade! We had to roll back our update, as this bug pretty much makes the new version impossible for us to use in production.
Did anyone run into similar problems? I think you guys really need to investigate this! (We also posted the same issue in the MongoDB Jira: https://jira.mongodb.org/browse/NODE-4833)
We still see severe RAM usage, although we THINK it got a bit better. But we still have to restart both our Docker containers on a daily basis now - otherwise we just run out of RAM.
Here you can see the RAM growth from last night (the spike at 6pm yesterday was our release) - it grew from ~35% to 52% in just 14 hours overnight, when there were obviously next to no users online/using the app!
Do you get a deprecation notice for socketTimeoutMS? I remember it was deprecated a long time ago. It could be that you don't even get a deprecation message anymore. I'd suggest checking whether that option still exists in your version of the Mongo driver.
Here’s an example of parameters for the current driver (might have used this since at least Meteor 2.7):
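Something along these lines (a sketch only - Mongo.setConnectionOptions, the option values, and where you call it are assumptions to check against your own setup; the option names themselves come from the 4.x driver's MongoOptions):

```js
// Sketch: pass options to the underlying MongoDB driver from Meteor server code.
// Call this early on startup, before the first collection is used.
import { Mongo } from 'meteor/mongo';

Mongo.setConnectionOptions({
  maxPoolSize: 10,         // the 4.x driver defaults to 100; the old default was 10
  maxIdleTimeMS: 60000,    // close pooled connections idle for more than a minute
  socketTimeoutMS: 30000,  // still accepted by the 4.x driver
  connectTimeoutMS: 10000,
});
```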
Yes, the settings work. I don't know why you think these are deprecated - they are still well documented in the official mongodb node driver docs (MongoOptions).
We are absolutely sure that “something big” changed with the update of the MongoDB driver, as the RAM usage just exploded, as outlined above, with a lot of ClientSessions in RAM which just grows and grows.
Does anyone have tips on how to find cursors which, apparently, are never closed and thus leave the corresponding ClientSession open indefinitely (resulting in our RAM explosion)?
@radekmie might be able to give a clue as to whether anything could have caused a circular reference when instantiating a ClientSession, as indicated in the heap dump above.
We can try maxIdleTimeMS, but the problem I see is: the leak does NOT originate from too many connections or connections staying open/not being closed - the problem is the sessions that are never "freed" and never deleted.
I just analyzed our heap dump again and it seems to me that the ClientSessions reference pretty much everything!
You can find references to the Grapher package, the Accounts package, … you name it.
Maybe this is expected behavior, I don't know - it is very hard for me to follow any logic here. What is definitely true is that these ClientSessions never seem to be cleaned up and thus make RAM grow indefinitely.
The memory impact itself was tiny though - running the above 4 times (in different variations), e.g. 4000 extraneous active sessions, only led to a 25 MB increase in memory.
Looks like there is an issue specifically with count - the same happens with Meteor.users.rawCollection() - so it smells like an issue with the underlying mongo driver.
It’s worth noting I’m running an EXTREMELY customised version of meteor - reproing on the base 2.8.1 would be useful (and should be easy). I also didn’t try running this pre 2.8 to see if it’s pre-existing
I’d bet money on the problem being that count (which is now deprecated) doesn’t close the session
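If someone wants to check that bet, a rough repro sketch could look like this (Meteor.users is just a convenient collection to hammer; the loop count and the heap-snapshot step are assumptions, not the exact test from the post above):

```js
// Repro sketch: run the deprecated cursor count() in a loop, compare heap
// usage, then write a heap snapshot and count ClientSession instances in it
// (e.g. by loading the snapshot in Chrome DevTools).
import v8 from 'v8';
import { Meteor } from 'meteor/meteor';

Meteor.startup(async () => {
  const raw = Meteor.users.rawCollection();

  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < 4000; i++) {
    await raw.find({}).count(); // deprecated; suspected of never ending its session
  }
  const after = process.memoryUsage().heapUsed;

  console.log(`heap grew by ${((after - before) / 1024 / 1024).toFixed(1)} MB`);
  console.log('heap snapshot written to', v8.writeHeapSnapshot());
});
```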
There are two variations of sessions in the Node driver. Every operation that gets sent to the server has a session attached - we represent these sessions with ServerSessions. The driver manages ServerSessions in a ServerSessionPool and will clean up stale server sessions when they expire.

ClientSessions are an abstraction over ServerSessions that allow users of the driver to provide a session for operations. The driver does not manage ClientSessions - users are responsible for ending them using the endSession method.
We allow users to create sessions using the MongoClient's startSession method (source here). This method ensures that when endSession is called, we remove the client session from the active-sessions set.
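For reference, the documented pattern for explicit sessions looks roughly like this (`client`, `collection`, and `query` are placeholders):

```js
// Explicit session: the caller is responsible for ending it.
const session = client.startSession();
try {
  await collection.findOne(query, { session });
} finally {
  await session.endSession(); // removes the ClientSession from the client's active set
}
```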
The reported bug is a buildup of ClientSessions. My suspicion (without knowing how Meteor works) is that Meteor is creating client sessions under the hood but never ending the sessions, resulting in a buildup of sessions in the driver.
Does anybody have any info on this?
What totally confuses me: I checked both the meteor (packages) codebase AND the meteor grapher codebase, and in neither (besides one single test) could I find a call to startSession() … wtf is going on? What am I missing here?
If you don't provide a session, the driver creates one automatically - this is cleaned up any time a cursor is invoked (and run to completion), but not when a count is called. I also wonder about things like distinct or index creation, which also wouldn't use a cursor.
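The same point in code (a sketch against the raw 4.x driver; `coll` stands for any collection obtained from a MongoClient):

```js
// No session passed in: the driver creates an implicit one for the cursor and
// cleans it up once the cursor has been run to completion.
await coll.find({}).toArray();

// Deprecated cursor count(): the cursor is never run to completion here, so -
// per the observation above - its implicit session never gets cleaned up.
await coll.find({}).count();
```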
So it seems to me that, for some reason, this is not handled properly by the mongo driver for some operations? Maybe aggregations (which Grapher uses A LOT) are hit especially hard, and that's why only some Meteor users are affected by this bug?!
In any case, we really need a fix for this asap! Thanks everyone again for all your help and input!
So that is the code that starts the session, but not what ends it (though it probably should be what ends it). Aggregations (at least on a rawCollection) don't trigger this - probably because the result is a cursor.
Aside: I thought grapher didn't use aggregations - but used hypernova for a custom lookup algo?
FYI I checked aggregation, createIndex and distinct - none of them had this issue…
Ok, so technically this is a bug in the mongo driver (rawCollection.find().count() does leak the session) - but since that’s deprecated anyway they probably don’t care.
The issue is that Meteor uses rawCollection().countDocuments({}) - BUT it calls that after creating the cursor - so the fault is with Meteor.
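Roughly the shape of that, as I understand the diagnosis (illustrative only, not Meteor's literal code):

```js
const raw = Meteor.users.rawCollection();

// A raw cursor is created here but never iterated or closed...
const cursor = raw.find({});

// ...and the count is answered by a separate countDocuments() call, so (per the
// diagnosis above) the abandoned cursor is what ends up holding on to a session.
const count = await raw.countDocuments({});

// Avoiding that means not creating the unused cursor in the first place,
// consuming it, or closing it explicitly:
await cursor.close();
```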
Exactly. To be clear, this is certainly leaking sessions. It’s not obvious that this is the root cause of the memory leak (in my tests the memory leak was pretty modest)