WeKan (https://wekan.github.io, the open source kanban built with Meteor; GitHub: wekan/wekan) uses Meteor 2.8.1 and also has growing RAM and CPU usage. I don't know whether someone can help fix the memory leaks. Build-from-source info is on the wekan/wekan GitHub wiki.
A little update from our side: we have now re-upgraded to 2.8.1 and also limited the connection pools in our settings.json, like so:
{
  "packages": {
    "mongo": {
      "options": {
        "maxPoolSize": 15,
        "socketTimeoutMS": 600000
      }
    }
  }
}
We still see severe RAM usage, although we THINK it got a bit better. We still have to docker restart both our containers daily; otherwise we simply run out of RAM.
Here you can see the RAM growth during last night (the spike at 6pm yesterday was our release): it grew from ~35% to 52% in just 14 hours overnight, when there are next to no users online using the app!
Dear Meteor dev team: please have a look at this. This really is a severe problem that we need to fix together!
We are also experiencing a similar problem since upgrading to Meteor v2.8. I have not managed to do any detailed investigation yet.
Do you get a deprecation notice for socketTimeoutMS? I remember it was consigned to the history of computing a long time ago, so you might not even get a deprecation message anymore. I'd suggest checking whether it still exists in your version of the Mongo driver.
Here's an example of parameters for the current driver (we may have been using these since at least Meteor 2.7):
xxx.mongodb.net/meteor?retryWrites=true&w=majority&useUnifiedTopology=true&heartbeatFrequencyMS=15000
I am just linking the other discussion here since it covers the same subject: Massive RAM usage with only ~150 concurrent users
Yes, the settings work. I don't know why you think these are deprecated; they are still documented in the official MongoDB Node driver docs (MongoOptions).
We are absolutely sure that “something big” changed with the MongoDB driver update, because RAM usage just exploded, as outlined above, with an ever-growing number of ClientSessions held in RAM.
Does anyone have tips on how to find cursors which apparently can't be closed, so that the corresponding ClientSession stays open indefinitely (resulting in our RAM explosion)?
Here is the MR for the MongoDB driver update.
@radekmie might be able to say whether anything there could have caused a circular reference when instantiating a ClientSession, as indicated in the heap dump above.
You are right, those are not deprecated; we just removed them because the defaults suit us.
How about trying maxIdleTimeMS in your connection options: https://www.mongodb.com/docs/manual/reference/connection-string/#mongodb-urioption-urioption.maxIdleTimeMS
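If you would rather keep these options in the connection string than in settings.json, a small helper can merge them into an existing MONGO_URL. This is a sketch under assumptions: `withUriOptions` is a hypothetical name, the host is the placeholder from the example URI above with an assumed `mongodb+srv://` scheme, and maxIdleTimeMS=60000 is only an illustrative value. It relies solely on Node's built-in WHATWG `URL`, so it does not check option names against the driver.

```javascript
// Hypothetical helper: merge MongoDB connection-string options into an
// existing URI, leaving scheme, host and database name untouched.
function withUriOptions(uri, options) {
  const url = new URL(uri);
  for (const [key, value] of Object.entries(options)) {
    // set() replaces an existing value instead of appending a duplicate.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const mongoUrl = withUriOptions(
  "mongodb+srv://xxx.mongodb.net/meteor?retryWrites=true&w=majority",
  {
    maxPoolSize: 15,         // same limit as the settings.json above
    socketTimeoutMS: 600000, // same value as the settings.json above
    maxIdleTimeMS: 60000,    // illustrative value only
  }
);

console.log(mongoUrl);
```

One caveat with this approach: WHATWG URL parsing rejects multi-host URIs such as `mongodb://host1:27017,host2:27017/db`, so the helper only suits single-host or `mongodb+srv://` URIs.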
We can try maxIdleTimeMS, but the problem I see is this: the leak does NOT originate from too many connections or from connections staying open; the problem is the sessions that are never freed and never deleted.
Thanks for the link!
I just analyzed our heap dump again, and it seems to me that the ClientSessions hold references to pretty much everything!
You can find references to the Grapher package, the Accounts package, … you name it.
Maybe this is expected behavior, I don't know; it is very hard for me to follow the logic here. What is definitely true is that these ClientSessions never seem to be cleaned up, so RAM grows indefinitely.
I think I was able to reproduce this this morning:
App.Mongo.db.s.client.s.activeSessions.size
// 6
_.times(1000, () => Meteor.users.find().count())
App.Mongo.db.s.client.s.activeSessions.size
// 1006
The memory impact itself was tiny, though: after running the above 4 times (in different variations), the 4000 extraneous active sessions only led to a 25MB increase in memory.
It looks like the issue is specifically with count; the same happens with Meteor.users.rawCollection(), so it smells like an issue with the underlying Mongo driver.
It's worth noting that I'm running an EXTREMELY customised version of Meteor, so reproducing this on the base 2.8.1 would be useful (and should be easy). I also didn't try running this before 2.8 to see whether it's pre-existing.
I'd bet money on the problem being that count (which is now deprecated) doesn't close the session.
App.Mongo.db.s.client.s.activeSessions.size
// 4008
_.times(1000, () => Meteor.users.rawCollection().find())
App.Mongo.db.s.client.s.activeSessions.size
// 5008
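To repeat the measurement above without copying numbers by hand, the before/after read of the session set can be wrapped in a helper. This is a sketch under assumptions: `db.s.client.s.activeSessions` is an undocumented driver internal that may change between driver versions, `App.Mongo` is the poster's own handle, and `countSessionLeak` is a hypothetical name. The set is passed in through a getter so the helper has no driver dependency, and `op` must be synchronous (as Meteor 2.8's fiber-based collection methods are on the server); the fake set at the bottom only demonstrates the call.

```javascript
// Hypothetical helper: run `op` n times and report how many entries were
// added to an active-session set in the meantime.
// `getActiveSessions` must return a Set-like object with a `size` property,
// e.g. () => App.Mongo.db.s.client.s.activeSessions (undocumented internal).
function countSessionLeak(getActiveSessions, op, n = 1000) {
  const before = getActiveSessions().size;
  for (let i = 0; i < n; i++) {
    op();
  }
  const after = getActiveSessions().size;
  return { before, after, leaked: after - before };
}

// Stand-in for the driver's set so the sketch runs without MongoDB:
// every "operation" adds a session and never removes it.
const fakeSessions = new Set();
const leakyOp = () => fakeSessions.add({});

console.log(countSessionLeak(() => fakeSessions, leakyOp, 1000));
// { before: 0, after: 1000, leaked: 1000 }
```

On a real server the call would look like `countSessionLeak(() => App.Mongo.db.s.client.s.activeSessions, () => Meteor.users.find().count())`, again assuming that internal path exists in your setup.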
Thank you @znewsham, we will have a look!
We also just got a reply from MongoDB on our ticket here: https://jira.mongodb.org/browse/NODE-4833
There are two variations of sessions in the Node driver. Every operation that gets sent to the server has a session attached; we represent these sessions with ServerSessions. The driver manages ServerSessions in a ServerSessionPool and will clean up stale server sessions when they expire. ClientSessions are an abstraction over ServerSessions that allow users of the driver to provide a session for operations. The driver does not manage ClientSessions; users are responsible for ending them using the endSession method.
We allow users to create sessions using the MongoClient's startSession method (source here). This method ensures that when endSession is called, we remove the client session from the ActiveSession set.
The reported bug is a buildup of ClientSessions. My suspicion (without knowing how Meteor works) is that Meteor is creating client sessions under the hood but never ending the sessions, resulting in a buildup of sessions in the driver.
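The practical consequence of that reply is that any code calling `startSession()` itself must pair it with `endSession()`, even when the operation throws. Below is a minimal sketch of that pattern; `runWithSession` is a hypothetical helper, and `FakeClient` only mimics the two methods the helper relies on so the sketch runs without a database. The Node driver also provides `MongoClient.withSession()`, which does this bookkeeping for you and is usually preferable to a hand-rolled helper.

```javascript
// Hypothetical helper: start an explicit session, run `fn` with it and
// always end the session afterwards so it cannot pile up in the client.
async function runWithSession(client, fn) {
  const session = client.startSession();
  try {
    return await fn(session);
  } finally {
    // Runs on success and on error alike.
    await session.endSession();
  }
}

// Minimal stand-in for MongoClient that tracks sessions in a Set, in the
// spirit of the reply above (not the driver's real implementation).
class FakeClient {
  constructor() {
    this.activeSessions = new Set();
  }
  startSession() {
    const session = {
      endSession: async () => {
        this.activeSessions.delete(session);
      },
    };
    this.activeSessions.add(session);
    return session;
  }
}

async function main() {
  const client = new FakeClient();
  const result = await runWithSession(client, async () => 42);
  console.log(result, client.activeSessions.size); // 42 0
  try {
    await runWithSession(client, async () => {
      throw new Error("operation failed");
    });
  } catch (err) {
    console.log(err.message, client.activeSessions.size); // operation failed 0
  }
}

main();
```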
Does anybody have any info on this?
What totally confuses me: I checked both the Meteor packages codebase AND the Grapher codebase, and in neither (apart from one single test) could I find a call to startSession()… wtf is going on? What am I missing here?
If you don't provide a session, the driver creates one automatically. It is cleaned up whenever a cursor is iterated (and runs to completion), but not when count is called. I also wonder about things like distinct or index creation, which wouldn't use a cursor either.
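The behaviour described here can be modelled in a few lines. This is a toy model, not the driver's code: `ToyClient` and `ToyCursor` are hypothetical, and they encode only the rules stated in this thread, namely that creating a cursor acquires an implicit session, that running the cursor to completion ends it, and that a count issued through the cursor never does. It reproduces the pattern of the activeSessions numbers earlier in the thread.

```javascript
// Toy model (hypothetical names) of implicit sessions, following the
// rules described in the thread rather than the driver's source.
class ToyClient {
  constructor(docs) {
    this.docs = docs;
    this.activeSessions = new Set();
  }
  startImplicitSession() {
    const session = { implicit: true };
    this.activeSessions.add(session);
    return session;
  }
  endSession(session) {
    this.activeSessions.delete(session);
  }
  find() {
    return new ToyCursor(this);
  }
}

class ToyCursor {
  constructor(client) {
    this.client = client;
    // The session is acquired as soon as the cursor exists.
    this.session = client.startImplicitSession();
  }
  toArray() {
    const result = [...this.client.docs];
    // Running the cursor to completion releases its implicit session.
    this.client.endSession(this.session);
    return result;
  }
  count() {
    // Counting never exhausts the cursor, so the session stays active.
    return this.client.docs.length;
  }
}

const client = new ToyClient([{ _id: 1 }, { _id: 2 }]);
for (let i = 0; i < 1000; i++) client.find().toArray();
console.log(client.activeSessions.size); // 0
for (let i = 0; i < 1000; i++) client.find().count();
console.log(client.activeSessions.size); // 1000
```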
Agreed. Is this the code in question? execute_operation.ts on the main branch of mongodb/node-mongodb-native on GitHub
So it seems to me that, for some reason, the Mongo driver doesn't handle this properly for some operations? Maybe aggregations in particular are heavily affected (Grapher uses them A LOT), and that's why only some Meteor users are affected by this bug?!
In any case, we really need a fix for this asap! Thanks everyone again for all your help and input!
So that is the code that starts the session, but not what ends it (though it probably should be). Aggregations (at least on a rawCollection) don't trigger this, probably because the result is a cursor.
Aside: I thought Grapher didn't use aggregations, but used Hypernova for a custom lookup algorithm?
FYI, I checked aggregation, createIndex and distinct; none of them had this issue.
OK, so technically this is a bug in the Mongo driver (rawCollection.find().count() does leak the session), but since that's deprecated anyway they probably don't care.
The issue is that Meteor uses rawCollection().countDocuments({}), BUT it calls it after creating the cursor, so the fault is with Meteor.
Do you mean:
- A cursor is created using find()
- therefore a session is created
- countDocuments() is used
- the cursor is left unused
- therefore the session is not cleaned up automatically
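The same sequence can be sketched with a stand-in collection, together with one possible workaround: explicitly closing the cursor that was only created for counting. Everything here is an assumption for illustration. `FakeCollection`, `leakyCount` and `closingCount` are hypothetical, and whether `close()` on a never-iterated cursor actually ends its implicit session in your driver version should be verified (for example with the activeSessions count used earlier in the thread) before relying on it. Not creating the cursor at all is the cleaner fix.

```javascript
// Stand-in for a raw collection (hypothetical, not the driver): find()
// acquires a session for its cursor, close() releases it, and
// countDocuments() leaves nothing behind. The selector is ignored.
class FakeCollection {
  constructor(docs) {
    this.docs = docs;
    this.activeSessions = new Set();
  }
  find() {
    const session = {};
    this.activeSessions.add(session);
    return {
      close: async () => {
        this.activeSessions.delete(session);
      },
    };
  }
  async countDocuments() {
    return this.docs.length;
  }
}

// The pattern described above: a cursor is created, then left unused.
async function leakyCount(collection, selector) {
  const cursor = collection.find(selector); // deliberately never used
  return collection.countDocuments(selector);
}

// One possible workaround: close the cursor that was only created for counting.
async function closingCount(collection, selector) {
  const cursor = collection.find(selector);
  try {
    return await collection.countDocuments(selector);
  } finally {
    await cursor.close();
  }
}

async function main() {
  const coll = new FakeCollection([{ _id: 1 }, { _id: 2 }]);
  await leakyCount(coll, {});
  console.log(coll.activeSessions.size); // 1
  await closingCount(coll, {});
  console.log(coll.activeSessions.size); // 1 (no new session left behind)
}

main();
```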
Exactly. To be clear, this is certainly leaking sessions. It's not obvious that this is the root cause of the memory leak (in my tests the memory growth was pretty modest).
I posted on the github issue too.
No, I don't have any. But I saw an extensive discussion here and on the MongoDB Jira already, so the problem has most likely been identified already.
Thanks for that! I quickly checked the mongo package sources, and I think it won't be hard to delay cursor creation until it's needed. We could try that and see if it's still a problem.
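A rough sketch of what delaying cursor creation could look like. `LazyCursor` is a hypothetical wrapper, not the mongo package's actual cursor class: it stores the selector and options, calls `find()` on the raw collection only when documents are actually requested, and sends `count()` straight to `countDocuments()`. The stand-in collection merely records how often `find()` is called.

```javascript
// Hypothetical wrapper: no driver cursor (and so, per the discussion above,
// no implicit session) exists until documents are actually requested.
class LazyCursor {
  constructor(rawCollection, selector = {}, options = {}) {
    this.rawCollection = rawCollection;
    this.selector = selector;
    this.options = options;
    this.cursor = null;
  }
  getCursor() {
    if (this.cursor === null) {
      this.cursor = this.rawCollection.find(this.selector, this.options);
    }
    return this.cursor;
  }
  // Counting never needs a cursor.
  count() {
    return this.rawCollection.countDocuments(this.selector);
  }
  fetch() {
    return this.getCursor().toArray();
  }
}

// Stand-in raw collection that records how many cursors were created.
const rawCollection = {
  findCalls: 0,
  docs: [{ _id: 1 }, { _id: 2 }, { _id: 3 }],
  find() {
    this.findCalls += 1;
    return { toArray: async () => [...this.docs] };
  },
  async countDocuments() {
    return this.docs.length;
  },
};

async function main() {
  const cursor = new LazyCursor(rawCollection, {});
  console.log(await cursor.count(), rawCollection.findCalls); // 3 0
  console.log((await cursor.fetch()).length, rawCollection.findCalls); // 3 1
}

main();
```

Caching the underlying cursor is a design choice of this sketch; a real implementation would also need to decide what happens when the same Meteor cursor is fetched twice, since a driver cursor can only be iterated once.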
However, I'm super curious whether something as simple as .find() without consuming it is also a problem in the driver itself. If it is, and it wasn't in older versions, IMO it's something they should fix (or at least document properly).