So I ran into this login issue again this morning, and also a few days ago. Both times it was on a local server, and both times I had received MongoNetworkErrors before the issue started happening.
Here is the call stack:
I20181001-21:42:26.563(-4)? Exception in setInterval callback: MongoNetworkError: connection 67 to xxxxx.servers.xxxxxxxxx.com:27017 timed out
I20181001-21:42:26.587(-4)? at TLSSocket.<anonymous> (C:\Users\######\AppData\Local\.meteor\packages\npm-mongo\3.0.7\npm\node_modules\mongodb-core\lib\connection\connection.js:258:7)
I20181001-21:42:26.587(-4)? at Object.onceWrapper (events.js:313:30)
I20181001-21:42:26.587(-4)? at emitNone (events.js:106:13)
I20181001-21:42:26.588(-4)? at TLSSocket.emit (events.js:208:7)
I20181001-21:42:26.588(-4)? at TLSSocket.Socket._onTimeout (net.js:410:8)
I20181001-21:42:26.588(-4)? at ontimeout (timers.js:498:11)
I20181001-21:42:26.588(-4)? at tryOnTimeout (timers.js:323:5)
I20181001-21:42:26.588(-4)? at Timer.listOnTimeout (timers.js:290:5)
I20181001-21:42:26.617(-4)? => awaited here:
I20181001-21:42:26.618(-4)? at Function.Promise.await (C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\promise_server.js:56:12)
I20181001-21:42:26.619(-4)? at Promise.asyncApply (packages/mongo/mongo_driver.js:1034:11)
I20181001-21:42:26.620(-4)? at C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\fiber_pool.js:43:40
I20181001-21:42:26.621(-4)? => awaited here:
I20181001-21:42:26.622(-4)? at Promise.await (C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\promise_server.js:60:12)
I20181001-21:42:26.623(-4)? at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1083:12)
I20181001-21:42:26.623(-4)? at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1097:22)
I20181001-21:42:26.637(-4)? at SynchronousCursor.map (packages/mongo/mongo_driver.js:1107:10)
I20181001-21:42:26.637(-4)? at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1131:17)
I20181001-21:42:26.640(-4)? at Cursor.(anonymous function) (packages/mongo/mongo_driver.js:879:44)
I20181001-21:42:26.640(-4)? at Cursor.cursorProto.(anonymous function) (packages\meteorhacks_kadira.js:3352:32)
I20181001-21:42:26.641(-4)? at Cursor.kadira_Cursor_fetch [as fetch] (packages\meteorhacks_kadira.js:3744:32)
I20181001-21:42:26.641(-4)? at Object.<anonymous> (packages\matb33_collection-hooks.js:445:81)
I20181001-21:42:26.642(-4)? at Object.collection.(anonymous function) [as update] (packages\matb33_collection-hooks.js:146:21)
I20181001-21:42:26.643(-4)? at Collection.update (packages/mongo/collection.js:580:31)
I20181001-21:42:26.643(-4)? at Collection.Mongo.Collection.(anonymous function) [as update] (packages/aldeed:collection2/collection2.js:213:19)
I20181001-21:42:26.644(-4)? at AccountsServer.Ap._expireTokens (packages/accounts-base/accounts_server.js:1129:14)
I20181001-21:42:26.645(-4)? at packages/accounts-base/accounts_server.js:1218:14
I20181001-21:42:26.645(-4)? at Meteor.EnvironmentVariable.EVp.withValue (packages\meteor.js:1189:12)
I20181001-21:42:26.646(-4)? at packages\meteor.js:505:25
I20181001-21:42:26.646(-4)? at runWithEnvironment (packages\meteor.js:1241:24)
Do either of you have a readPreference=primary on your oplog URL? I added that and this problem went away. I tested it pretty rigorously in a dev environment, and just tested with a graceful primary stepdown in production. I thought readPreference=primary was the implicit default, but it’s possible Meteor does something different when the read preference is explicitly defined.
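For reference, here is a minimal sketch of the kind of startup check I mean, assuming the standard MONGO_OPLOG_URL environment variable; the warning text and the placeholder credentials/host are purely illustrative:

// Sketch only: warn at startup if the oplog connection string does not pin reads to the primary.
// MONGO_OPLOG_URL is the standard Meteor env var; the URL shown in the warning is a placeholder.
import { Meteor } from 'meteor/meteor';

Meteor.startup(() => {
  const oplogUrl = process.env.MONGO_OPLOG_URL || '';
  if (oplogUrl && !/readPreference=primary/i.test(oplogUrl)) {
    console.warn(
      'Oplog URL has no explicit readPreference=primary, e.g.\n' +
      'mongodb://<oplog-user>:<password>@<host>:27017/local?authSource=admin&readPreference=primary'
    );
  }
});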
I’m not using the oplog URL; I use redis-oplog. I haven’t seen this problem in a while, but I’m pretty sure I experienced it while using redis-oplog, so I’m not sure whether it’s related to the oplog URL.
Is anyone still experiencing this? I just had it happen again today in production on one of our Galaxy containers. I’m getting the following error over and over for every query on that container. It looks like it’s happening for all our polling queries. We don’t use the oplog; we use redis-oplog and AtlasDB, plus a few polling queries:
9vrwg - Exception while polling query ... MongoError: connection 97 to <server_address> timed out
9vrwg - 2019-02-27 11:15:43-08:00 at Object.Future.wait (/app/bundle/programs/server/node_modules/fibers/future.js:449:15)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend._nextObject (packages/mongo/mongo_driver.js:1053:47)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend.forEach (packages/mongo/mongo_driver.js:1087:22)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend.getRawObjects (packages/mongo/mongo_driver.js:1136:12)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend._pollMongo (packages/mongo/polling_observe_driver.js:153:44)
9vrwg - 2019-02-27 11:15:43-08:00 at Object.task (packages/mongo/polling_observe_driver.js:90:12)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend._run (packages/meteor.js:811:18)
9vrwg - 2019-02-27 11:15:43-08:00 at packages/meteor.js:789:14
9vrwg - 2019-02-27 11:15:43-08:00 - - - - -
9vrwg - 2019-02-27 11:15:43-08:00 at Function.MongoError.create (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/error.js:29:11)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:186:20)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.g (events.js:260:16)
9vrwg - 2019-02-27 11:15:43-08:00 at emitNone (events.js:67:13)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.emit (events.js:166:7)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.Socket._onTimeout (net.js:336:8)
9vrwg - 2019-02-27 11:15:43-08:00 at _runOnTimeout (timers.js:537:11)
9vrwg - 2019-02-27 11:15:43-08:00 at _makeTimerTimeout (timers.js:528:3)
9vrwg - 2019-02-27 11:15:43-08:00 at Timer.unrefTimeout (timers.js:597:5)
We just had this happen to one user twice in production: once last week and once this week. Last week we weren’t using the oplog, but this week we are, so it probably isn’t oplog-related.
I am using MontiAPM and there are no errors or any indications whatsoever.
Are there any updates on why this is happening? I’m getting this behavior for a couple of methods in our production environment. It seems you must have an active subscription to the collection for it to happen.
I’ve spent the entire past week trying to figure this out, without any successful results. As everyone pointed out, it’s just the “updated” message from DDP that does not get received. My subscription reloads properly, but the method callback never fires.
We use Meteor.callPromise all over the place. I changed one of the problematic methods to a raw Meteor.call, but nothing changed.
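One way to check whether the result itself is arriving (as opposed to the DDP “updated” message the normal callback also waits for) is Meteor.apply’s documented onResultReceived option. This is only a diagnostic sketch; 'problematicMethod' and its arguments are placeholders for one of the affected methods:

// Diagnostic sketch. onResultReceived fires as soon as the DDP "result" message arrives,
// while the regular callback also waits for the "updated" message, so comparing the two
// shows which message is missing.
import { Meteor } from 'meteor/meteor';

Meteor.apply('problematicMethod', [/* args */], {
  onResultReceived: (err, res) => console.log('result message received:', err, res),
}, (err, res) => console.log('regular callback (after updated):', err, res));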
Some points:
I could not reproduce this in either the development or staging environments.
It takes roughly 1 hour after a deploy for the error to show up.
Running on AWS (Atlas)
Using default oplog tailing for all subscriptions
Using the following MONGO_DB_URL: mongodb+srv://<user>:<password>@<address>.mongodb.net/<db>?retryWrites=true&w=majority
Using the following MONGO_OPLOG_URL: mongodb+srv://<oplog-user>:<password>@<address>.mongodb.net/local?retryWrites=true&w=majority
This is a difficult issue to trace since many factors are at play in production. I have a deployment that has been running for 11 months now with pub/sub and constant use and not a single restart, so I’m wondering what the difference is with those apps.
Are you sure both of your apps are hitting the same error path?
Is it specific to a Meteor version/update?
Is it specific to a deployment environment?
How would someone test and isolate this systematically?
Why do the method calls seem to stop working after considerable uptime? Is it RAM or CPU exhaustion? All threads getting blocked? A certain code path being triggered? A third-party provider getting stuck? Does the server heartbeat still work? It is really hard to tell.
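To at least rule out resource exhaustion and a blocked event loop, a simple server-side heartbeat could be left running and its log compared against the moment methods stop responding. This is just a sketch using Node/Meteor built-ins, not anything from the affected apps; the 10-second interval is arbitrary:

// Sketch: log memory usage and event-loop lag every 10 seconds so the output can be
// correlated with the time the methods start hanging. Uses only built-in APIs.
import { Meteor } from 'meteor/meteor';

Meteor.startup(() => {
  let last = Date.now();
  Meteor.setInterval(() => {
    const now = Date.now();
    const lagMs = now - last - 10000; // how late the timer fired = rough event-loop lag
    last = now;
    const { rss, heapUsed } = process.memoryUsage();
    console.log(`[heartbeat] lag=${lagMs}ms rss=${Math.round(rss / 1e6)}MB heapUsed=${Math.round(heapUsed / 1e6)}MB`);
  }, 10000);
});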
I think the problem statement in this thread is vague. It is hard to tell if these reports are even caused by the same underlying problem, and it seems no one is able to reproduce it systematically.
I think the best way to isolate this is to stress-test one of those apps, let it hang when the error hits, and then try to narrow down and isolate the underlying cause.
We solved our issue by converting to RedisOplog. My working theory is that there is a bug in Meteor’s oplog polling that leads to dirty containers under heavy load, or something else going on in the DB. It kept happening to us randomly… sometimes quickly and sometimes after a few days of running with no issues. If anyone gets stuck on this I recommend converting to RedisOplog; it wasn’t too painful to get going.
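For anyone attempting the same conversion: the package is cultofcoders:redis-oplog (added with meteor add cultofcoders:redis-oplog), and the Meteor settings it reads are roughly of the shape below. The host and port here are placeholders for a local Redis; the package README is the authoritative reference for the current options.

{
  "redisOplog": {
    "redis": {
      "host": "127.0.0.1",
      "port": 6379
    }
  }
}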
But how did you rule out CPU exhaustion under heavy load? Also, why do you think this bug doesn’t show up systematically for all apps using the oplog under heavy load? What is the difference here?
We had the problem sometimes with no CPU load. We tried scaling up the containers and it still happened, maybe even worse. It’s some kind of bug in the mongo-oplog code I think, but we only ever saw it in our prod environment on AWS.