Meteor v2.8 - Memory leak in mongo driver?

We can try maxIdleTimeMS, but the problem I see is this: the leak does NOT originate from too many connections, or from connections staying open/not being closed - the problem is the Sessions, which are never “freed” and never deleted.

Thanks for the link!

I just analyzed our heapdump again and it seems to me that the ClientSessions contain pretty much everything! :smiley:
You can find references to the Grapher package, the Accounts package, … you name it.

Maybe this is expected behavior, I don’t know; it is very hard for me to follow any logic here. What is definitely true is that these ClientSessions never seem to be cleaned up, and thus make the RAM grow indefinitely.

I was potentially able to reproduce this this morning:

App.Mongo.db.s.client.s.activeSessions.size
// 6
_.times(1000, () => Meteor.users.find().count())
App.Mongo.db.s.client.s.activeSessions.size
// 1006

The memory impact itself was tiny though: running the above 4 times (in different variations), i.e. creating 4000 extraneous active sessions, only led to a 25MB increase in memory.

Looks like there is an issue specifically with count - the same happens with Meteor.users.rawCollection() - so it smells like an issue with the underlying mongo driver.

It’s worth noting I’m running an EXTREMELY customised version of meteor - reproing on the base 2.8.1 would be useful (and should be easy). I also didn’t try running this pre 2.8 to see if it’s pre-existing
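To repeat the check on a stock install, the before/after comparison above could be wrapped in a small helper. This is only a sketch: `measureSessionGrowth` and `activeSessionCount` are hypothetical names, and `client.s.activeSessions` is the same internal, undocumented driver field read above, so it may differ between driver versions. It also assumes the operation runs synchronously, as the calls in the snippet above do on the server.

```javascript
// Hypothetical helper: reads the driver's internal set of active sessions.
// `client.s.activeSessions` is NOT a public API of the MongoDB driver.
function activeSessionCount(client) {
  const sessions = client && client.s && client.s.activeSessions;
  return sessions ? sessions.size : 0;
}

// Runs `operation` `times` times and returns how many sessions were left behind.
function measureSessionGrowth(client, operation, times = 1000) {
  const before = activeSessionCount(client);
  for (let i = 0; i < times; i += 1) {
    operation();
  }
  return activeSessionCount(client) - before;
}
```

Called with the client object and something like `() => Meteor.users.find().count()`, a result close to `times` would suggest that every call leaves one session behind.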

I’d bet money on the problem being that count (which is now deprecated) doesn’t close the session

App.Mongo.db.s.client.s.activeSessions.size
// 4008
_.times(1000, () => Meteor.users.rawCollection().find())
App.Mongo.db.s.client.s.activeSessions.size
// 5008

Thank you @znewsham , we will have a look!

We also just got a reply from mongo in our post here: https://jira.mongodb.org/browse/NODE-4833

there are two variations of sessions in the Node driver. Every operation that gets sent to the server has a session attached - we represent these sessions with ServerSessions. The driver manages ServerSessions in a ServerSessionPool and will clean up stale server sessions when they expire. ClientSessions are an abstraction over ServerSessions that allow users of the driver to provide a session for operations. The driver does not manage ClientSessions - users are responsible for ending them using the endSession method.

We allow users to create sessions using the MongoClient’s startSession method (source here). This method ensures that when endSession is called, we remove the client session from the activeSessions set.

The reported bug is a buildup of ClientSessions. My suspicion (without knowing how Meteor works) is that Meteor is creating client sessions under the hood but never ending the sessions, resulting in a buildup of sessions in the driver.
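For code that does start sessions explicitly, the responsibility described in this reply might look like the sketch below. `startSession` and `endSession` are the driver methods named above; `runWithSession` and its `work` callback are hypothetical names used only for illustration.

```javascript
// Sketch: always end a ClientSession that was started explicitly.
// `client` is assumed to be a connected MongoClient (or anything exposing
// startSession()); `work` is a hypothetical callback that receives the session.
async function runWithSession(client, work) {
  const session = client.startSession();
  try {
    return await work(session);
  } finally {
    // Without this call the ClientSession is never removed from the active set.
    await session.endSession();
  }
}
```

The `finally` block matters here: the session is ended even when `work` throws. A caller would pass the session on to each operation, for example through the `session` option of `countDocuments`.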

Does anybody have any information on this?

What totally confuses me: I checked both the meteor (packages) codebase AND the meteor grapher codebase: In neither (besides one single test) could I find a call of startSession() … wtf is going on? What am I missing here?

If you don’t provide a session, the driver creates one automatically. This is cleaned up any time a cursor is invoked (and run to completion), but not when a count is called. I also wonder about things like distinct or index creation, which also wouldn’t use a cursor.

Agreed - is this the code in question? node-mongodb-native/execute_operation.ts at main · mongodb/node-mongodb-native · GitHub

So, it seems to me that, for some reason, for some operations this is not handled properly by the mongo driver? Maybe especially aggregations are affected heavily (which are used A LOT by Grapher) and that’s why only some Meteor users are affected by this bug?!

In any case, we really need a fix for this asap! Thanks everyone again for all your help and input!

So that is the code that starts the session, but not what ends it (though it probably should be what ends it). Aggregations (at least on a rawCollection) don’t trigger this - probably because the result is a cursor.

Aside: I thought grapher didn’t use aggregations, but used hypernova for a custom lookup algorithm?

FYI I checked aggregation, createIndex and distinct - none of them had this issue…

Ok, so technically this is a bug in the mongo driver (rawCollection().find().count() does leak the session) - but since that’s deprecated anyway they probably don’t care.

The issue is that meteor uses rawCollection().countDocuments({}) - BUT they use it after creating the cursor - so the fault is with meteor

Do you mean:

  1. Cursor is created using find() - therefore session is created
  2. countDocuments() is used
  3. Cursor left unused - therefore session is not cleaned up automatically

Exactly. To be clear, this is certainly leaking sessions. It’s not obvious that this is the root cause of the memory leak (in my tests the memory leak was pretty modest)

I posted on the github issue too.

No, I don’t have any. But I saw an extensive discussion here and on MongoDB Jira already, so I guess the problem is (most likely) already identified.

Thanks for that! I quickly checked the mongo package sources and I think it won’t be hard to delay cursor creation up until it’s needed. We could try that and see if it’s still a problem.
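A rough idea of what delaying cursor creation could look like, written against a raw driver collection and not taken from Meteor’s mongo package. `LazyCursor` and its method names are hypothetical; `find`, `countDocuments` and `toArray` are the only driver methods assumed.

```javascript
// Sketch only, not Meteor's implementation: the object just remembers the
// query, and a driver cursor is created at the moment documents are needed.
class LazyCursor {
  constructor(rawCollection, selector = {}, options = {}) {
    this.rawCollection = rawCollection;
    this.selector = selector;
    this.options = options;
  }

  // Counting goes straight to countDocuments and never calls find(),
  // so no unconsumed cursor is left behind.
  count() {
    return this.rawCollection.countDocuments(this.selector);
  }

  // Fetching creates a driver cursor and immediately runs it to completion.
  fetch() {
    return this.rawCollection.find(this.selector, this.options).toArray();
  }
}
```

With this shape, constructing the object is free of driver calls, and each path (`count` or `fetch`) touches the driver exactly once.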

However, I’m super curious whether something as simple as .find() without consuming it is also a problem with the driver itself. If it is, and it wasn’t the case in the older versions, IMO it’s something they should fix (or, at least, document properly).

Find without consuming certainly triggers the issue here (in 2.8.1) but I’ve confirmed that on 2.5.7-beta.0 (our production version) it does NOT. So this does look like an API change in mongo - I doubt they’ll be interested in fixing it though since it’s pretty unusual to create a cursor that you don’t then use - but yes, better documentation of this change would be nice

created this PR: Make count NOT create a cursor by znewsham · Pull Request #12326 · meteor/meteor · GitHub

Thanks a lot everyone for all your quick support and even the PR just created hours later - great stuff! I love this community! :heart:
Fingers crossed this also fixes our RAM problems - we are currently already deploying a hotfix ourselves:

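// Hotfix: count via the raw driver collection, without creating a Meteor cursor first.
Mongo.Collection.prototype.countDocuments = function (query, options) {
  const coll = this.rawCollection();
  // wrapAsync turns the callback-style driver call into a synchronous (Fiber) call.
  return Meteor.wrapAsync(coll.countDocuments.bind(coll))(query, options);
};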

We will give you an update if we still run into any issues.

Supposedly fixed in 2.8.2 from the PR of @znewsham :tada: Make count NOT create a cursor by znewsham · Pull Request #12326 · meteor/meteor · GitHub

Update from our side:
After the hotfix deployed yesterday, our RAM problem indeed got a lot better! :slight_smile:
BUT: It seems we are still leaking (-> not closing) Mongo Sessions somewhere!

We implemented a method to log activeSessions like so:

const client = MongoInternals.defaultRemoteCollectionDriver()?.mongo?.client;
const sessionsCount = client?.s?.activeSessions?.size;

This number is still climbing ALL the time and NOT going down!
It has been 5 hours since the last restart of this instance and with peaks of only ~100 concurrent users, we are already at 14k (!!) active sessions - and still climbing (although slowly).
On the 2nd instance (also serving the same users - we have a traefik loadbalancer in front of it), we are already at 17k. So it basically means that this load over the last 5 hours managed to open and not close a total of 31k Mongo Sessions!
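To see how fast the number climbs, and not only its current value, the logging method could keep the previous reading. A minimal sketch, assuming a hypothetical `createSessionMonitor` helper: `readCount` would wrap the `activeSessions.size` lookup above, and `now` is injectable only to make the helper easy to test.

```javascript
// Hypothetical helper: each call to sample() returns the current session count
// and the growth rate (sessions per hour) since the previous sample.
function createSessionMonitor(readCount, now = Date.now) {
  let last = null;
  return function sample() {
    const current = { count: readCount(), at: now() };
    let perHour = null; // unknown until there are two samples
    if (last && current.at > last.at) {
      const hours = (current.at - last.at) / 3600000;
      perHour = (current.count - last.count) / hours;
    }
    last = current;
    return { count: current.count, perHour };
  };
}
```

Going from 0 to 14k sessions in 5 hours, as on the first instance, would come out as 2800 sessions per hour. A rate that stays positive under steady load points at sessions that are opened and never ended.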

Can anyone still confirm this? It seems to me there might be other bugs/changes in the new mongo driver we missed?

We are, obviously, running Meteor v2.8.1 and we patched ALL .find().count() in our codebase with the small patch I posted here.


Update:

I think we found the problem! It is in the very common package meteor-roles … and OF COURSE it is .find().count() again! :grin: look here: meteor-roles/roles_common.js at master · Meteor-Community-Packages/meteor-roles · GitHub

Update 2: And indeed - we found it!!
After an update to Meteor 2.8.2 the problem is solved!
We can confirm that v2.8.2 solves the issue!! Good job everyone!
Now our Sessions stay at a solid 0! Heureka!! :grin:

Great job guys identifying and fixing an important issue! Many thanks to everyone involved.

Hello,
If we update to v2.8.2, do we also have to apply a patch like you did, or is updating to v2.8.2 enough?

From what it seems, updating to 2.8.2 should solve it.
Awesome to see, guys! Also, if you want to upgrade to the latest mongo driver version, there is a release candidate out for 2.9.

No need for our custom patch, updating to 2.8.2 should be sufficient!
