We are still in the process of migrating our medium-sized (?) app to Meteor 3, but we got derailed last month when deploying a build of the Meteor 3 branch to our staging environments (Kubernetes & ECS Fargate).
Right after startup, most containers would go into a loop, spewing stack traces like these several times per second:
MongoTopologyClosedError: Topology is closed
at processWaitQueue (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/sdam/topology.ts:918:42)
at Topology.selectServer (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/sdam/topology.ts:601:5)
at tryOperation (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/operations/execute_operation.ts:190:31)
at executeOperation (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/operations/execute_operation.ts:109:18)
at runNextTicks (node:internal/process/task_queues:65:5)
at listOnTimeout (node:internal/timers:555:9)
at processTimers (node:internal/timers:529:7)
at FindCursor._initialize (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/cursor/find_cursor.ts:72:22)
at FindCursor.cursorInit (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/cursor/abstract_cursor.ts:727:21)
at FindCursor.fetchBatch (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/cursor/abstract_cursor.ts:762:6)
at FindCursor.next (/opt/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/src/cursor/abstract_cursor.ts:425:7)
The containers could still respond to HTTP requests, but DDP communication was unresponsive, so no method calls or subscriptions worked.
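(Worth noting for anyone running this on Kubernetes or ECS: a liveness probe that only checks plain HTTP will happily keep such a container alive. Below is a minimal sketch of a deeper probe that also exercises the MongoDB connection; the /healthz route and the heartbeats collection are placeholder names of ours, not something Meteor provides.)

// server/healthz.js -- hypothetical readiness endpoint (names are ours).
// Plain HTTP still worked in the broken state, so the probe additionally runs a
// trivial MongoDB query; if the driver connection is wedged, the query throws
// (e.g. MongoTopologyClosedError) and we report 503 so the orchestrator can
// recycle the pod/task.
import { WebApp } from 'meteor/webapp';
import { Mongo } from 'meteor/mongo';

const Heartbeats = new Mongo.Collection('heartbeats'); // any cheap collection works

WebApp.connectHandlers.use('/healthz', async (req, res) => {
  try {
    await Heartbeats.findOneAsync({});
    res.writeHead(200);
    res.end('ok');
  } catch (err) {
    res.writeHead(503);
    res.end('mongo not reachable: ' + err.message);
  }
});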
After a lot of investigation we found the root cause and a solution today.
First, some background:
When a Meteor app starts up, one of the first things it creates is the oplog tailer.
Once the async method _startTailing has been kicked off, the rest of the app's startup code executes. In our case that took about 90 seconds, leaving very little idle time for the MongoDB driver to connect to the cluster as part of the _startTailing implementation and its calls to "tail".
So what we found was that the OplogHandle's connection would never be established because of the driver's default 30-second server selection timeout, and Meteor's OplogHandle implementation cannot recover from that.
The stack traces we saw were emitted by a never-ending retry loop inside a Meteor.defer call.
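(As a side note, if you want to verify which timeout the driver actually ended up with, the underlying MongoClient is reachable via MongoInternals. A small sketch; the .mongo.client path goes through mongo package internals, so treat it as unofficial:)

import { Meteor } from 'meteor/meteor';
import { MongoInternals } from 'meteor/mongo';

Meteor.startup(() => {
  // The Node driver's parsed connection options; 30000 is the driver default that bit us.
  const { client } = MongoInternals.defaultRemoteCollectionDriver().mongo;
  console.log('serverSelectionTimeoutMS =', client.options.serverSelectionTimeoutMS);
});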
Anyhow, the simple cure was to set a large enough serverSelectionTimeoutMS MongoDB connection option via Meteor.settings (METEOR_SETTINGS):
{
  "public": {
    "packages": {
      "dynamic-import": {
        "useLocationOrigin": true
      }
    }
  },
  "packages": {
    "mongo": {
      "options": {
        "serverSelectionTimeoutMS": 120000
      }
    }
  }
}
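Since our containers get their settings through the METEOR_SETTINGS environment variable rather than a --settings file, the same fix boils down to the minified equivalent in the deployment spec (only the relevant part shown):

METEOR_SETTINGS='{"packages":{"mongo":{"options":{"serverSelectionTimeoutMS":120000}}}}'

The 120000 ms value is simply "comfortably longer than our ~90 seconds of startup work"; pick whatever fits your own app's startup time.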