Meteor “login” method never invokes callback, blocks all future RPC calls

@zcompton @znewsham would you guys mind sharing a list of packages used? Curious to see overlap amongst us all.

Heres a paste bin of our Versions file:

https://pastebin.com/5T6sSdgN

We have a lot of older packages that we are dependent on.

@typ

https://pastebin.com/pBHnb819

I’ve had this error too in production. Running Meteor 1.5.1 on Mongo Atlas running on Galaxy. We also use redis-oplog so oplog is disabled. Below is our versions file:

https://pastebin.com/WdYtaTiz

So I ran into this login issue again this morning and a few days ago. Both times on a local server and I had received MongoNetworkErrors before the issue started happening.

Here is the call stack:

I20181001-21:42:26.563(-4)? Exception in setInterval callback: MongoNetworkError: connection 67 to xxxxx.servers.xxxxxxxxx.com:27017 timed out
I20181001-21:42:26.587(-4)?     at TLSSocket.<anonymous> (C:\Users\######\AppData\Local\.meteor\packages\npm-mongo\3.0.7\npm\node_modules\mongodb-core\lib\connection\connection.js:258:7)
I20181001-21:42:26.587(-4)?     at Object.onceWrapper (events.js:313:30)
I20181001-21:42:26.587(-4)?     at emitNone (events.js:106:13)
I20181001-21:42:26.588(-4)?     at TLSSocket.emit (events.js:208:7)
I20181001-21:42:26.588(-4)?     at TLSSocket.Socket._onTimeout (net.js:410:8)
I20181001-21:42:26.588(-4)?     at ontimeout (timers.js:498:11)
I20181001-21:42:26.588(-4)?     at tryOnTimeout (timers.js:323:5)
I20181001-21:42:26.588(-4)?     at Timer.listOnTimeout (timers.js:290:5)
I20181001-21:42:26.617(-4)?  => awaited here:
I20181001-21:42:26.618(-4)?     at Function.Promise.await (C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\promise_server.js:56:12)
I20181001-21:42:26.619(-4)?     at Promise.asyncApply (packages/mongo/mongo_driver.js:1034:11)
I20181001-21:42:26.620(-4)?     at C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\fiber_pool.js:43:40
I20181001-21:42:26.621(-4)?  => awaited here:
I20181001-21:42:26.622(-4)?     at Promise.await (C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\promise_server.js:60:12)
I20181001-21:42:26.623(-4)?     at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1083:12)
I20181001-21:42:26.623(-4)?     at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1097:22)
I20181001-21:42:26.637(-4)?     at SynchronousCursor.map (packages/mongo/mongo_driver.js:1107:10)
I20181001-21:42:26.637(-4)?     at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1131:17)
I20181001-21:42:26.640(-4)?     at Cursor.(anonymous function) (packages/mongo/mongo_driver.js:879:44)
I20181001-21:42:26.640(-4)?     at Cursor.cursorProto.(anonymous function) (packages\meteorhacks_kadira.js:3352:32)
I20181001-21:42:26.641(-4)?     at Cursor.kadira_Cursor_fetch [as fetch] (packages\meteorhacks_kadira.js:3744:32)
I20181001-21:42:26.641(-4)?     at Object.<anonymous> (packages\matb33_collection-hooks.js:445:81)
I20181001-21:42:26.642(-4)?     at Object.collection.(anonymous function) [as update] (packages\matb33_collection-hooks.js:146:21)
I20181001-21:42:26.643(-4)?     at Collection.update (packages/mongo/collection.js:580:31)
I20181001-21:42:26.643(-4)?     at Collection.Mongo.Collection.(anonymous function) [as update] (packages/aldeed:collection2/collection2.js:213:19)
I20181001-21:42:26.644(-4)?     at AccountsServer.Ap._expireTokens (packages/accounts-base/accounts_server.js:1129:14)
I20181001-21:42:26.645(-4)?     at packages/accounts-base/accounts_server.js:1218:14
I20181001-21:42:26.645(-4)?     at Meteor.EnvironmentVariable.EVp.withValue (packages\meteor.js:1189:12)
I20181001-21:42:26.646(-4)?     at packages\meteor.js:505:25
I20181001-21:42:26.646(-4)?     at runWithEnvironment (packages\meteor.js:1241:24)

I too am having issues - for me this is the problem:

@evolross @zcompton

do either of you have a readPreference=primray on your oplog URL? I added that and this problem went away - I tested it pretty rigourously in a dev environment, and just tested with a graceful primary stepdown in production. I thought readPreference=primary was the implicit default, but its possible meteor does something different when the read preference is explicitly defined

I’m not using the oplog URL. I use redis-oplog. I haven’t seen this problem in a while. But I’m pretty sure I experienced the problem while using redis-oplog so I’m not sure if it’s oplog URL related.

We experience the problem even with using redis-oplog, so it’s not url related.

We do not specify any read preference in our MongoURLs which means it should default to primary.

Is anyone still experiencing this? I just had it happen again today in production on one of our Galaxy containers. Have the following error over and over for each query on that container. Looks like it’s for all our polling queries. We don’t use the oplog. We use redis-oplog and AtlasDB and a few polling queries:

9vrwg - Exception while polling query ... MongoError: connection 97 to <server_address> timed out
9vrwg - 2019-02-27 11:15:43-08:00 at Object.Future.wait (/app/bundle/programs/server/node_modules/fibers/future.js:449:15)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend._nextObject (packages/mongo/mongo_driver.js:1053:47)
9vrwg -2019-02-27 11:15:43-08:00 at [object Object]._.extend.forEach (packages/mongo/mongo_driver.js:1087:22)
9vrwg -2019-02-27 11:15:43-08:00 at [object Object]._.extend.getRawObjects (packages/mongo/mongo_driver.js:1136:12)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend._pollMongo (packages/mongo/polling_observe_driver.js:153:44)
9vrwg - 2019-02-27 11:15:43-08:00 at Object.task (packages/mongo/polling_observe_driver.js:90:12)
9vrwg - 2019-02-27 11:15:43-08:00 at [object Object]._.extend._run (packages/meteor.js:811:18)
9vrwg - 2019-02-27 11:15:43-08:00 at packages/meteor.js:789:14
9vrwg - 2019-02-27 11:15:43-08:00 - - - - -
9vrwg - 2019-02-27 11:15:43-08:00 at Function.MongoError.create (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/error.js:29:11)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:186:20)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.g (events.js:260:16)
9vrwg - 2019-02-27 11:15:43-08:00 at emitNone (events.js:67:13)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.emit (events.js:166:7)
9vrwg - 2019-02-27 11:15:43-08:00 at TLSSocket.Socket._onTimeout (net.js:336:8)
9vrwg - 2019-02-27 11:15:43-08:00 at _runOnTimeout (timers.js:537:11)
9vrwg - 2019-02-27 11:15:43-08:00 at _makeTimerTimeout (timers.js:528:3)
9vrwg - 2019-02-27 11:15:43-08:00 at Timer.unrefTimeout (timers.js:597:5)

Does anyone have any leads on this?

We just had this happen to one user twice in production; once last week and once this week. Last week we weren’t using Oplog but this week we are so it probably isn’t related to Oplog.

I am using MontiAPM and there are no errors or any indications whatsoever.

Are there any updates on why is this happening? I’m getting this behavior for a couple of methods in production env. It seems you must have an active subscription to the collection to make it happen.

I’ve spent the entire past week trying to figure this out but I did not have any successful results. As everyone pointed out, it’s just the “updated” message from DDP that does not get received. My subscription reloads properly, but the cb never fires.

We use Meteor.callPromise all over the place. I changed one of the problematic methods to the raw Meteor.call but nothing happened.

Some points:

  • I could not reproduce this in development neither staging env.
  • Takes roughly 1 hour after deploy to see the error.
  • Running on AWS (Atlas)
  • Using default oplog tailing for all subscriptions
  • Using the following MONGO_DB_URL: mongodb+srv://<user>:<password>@<address>.mongodb.net/<db>?retryWrites=true&w=majority
  • Using the following MONGO_OPLOG_URL: mongodb+srv://<oplog-user>:<password>@<address>.mongodb.net/local?retryWrites=true&w=majority
  • Meteor version: 1.9
  • Mongo version: 3.6.17

I switched to redis oplog, and the problems went away after this - not sure if it is directly related.

has anyone solved this issue? We’re having it intermittently in production

We have the same problem, any help would be highly appreciated

I wonder if this issue is related in any way to:

  • number of registered users
  • the same issue in this thread: Slow mongo query on createUser
  • if this happens on login with email, with username with a certain social provider. Should we do a polling?

This still happens in the respective apps of both @andregoldstein and @waldgeist. We have found no resolution quite yet.

  • In both cases the error occurs exclusively in production and only after several days of server uptime.
  • It doesn’t seem to be related to login in either of these apps.
  • There is some indication in one of the apps that having switched to redis oplog solved the problem, but it’s too early to tell for sure.

This is a difficult issue to trace since many factors are at play during production. I have a deployment running for 11 months now with pub/sub and constant use and not a single restart. So, I’m wondering what is the difference with those apps.

  1. Are you sure both of your apps are hitting the same error path?
  2. Is it specific to a Meteor version/update?
  3. Is it specific to a deployment environment?
  4. How would someone test and isolate this systematically?
  5. Why the method call seem to stop working after considerable uptime? Is it RAM, CPU exhaustion? all threads getting blocked? Certain code path being trigger? A third party provider getting stuck? Does the server heartbeat work? It is really hard to tell.

I think the problem statement in this thread is vague. It is hard to tell if they are even caused by the same underlying problem. And it seems no one is able to reproduce it systemically.

I think the best way to isolate this is to stress it one of those apps and let it hang when this error hit and then trying to narrow down and isolate the underlying cause.

1 Like

BTW I think the same issue is discussed on GitHub: Meteor logged in but Meteor.loggingIn() stuck on true / initial login method works but subsequent methods never complete · Issue #11323 · meteor/meteor · GitHub