Meteor “login” method never invokes callback, blocks all future RPC calls

The issue

I’m encountering an issue that pops up semi-frequently on some of our production Meteor app servers where method invocations queue up but never run, despite successful ping/pong DDP messages (and subscriptions) sending.

What I know so far

It’s extremely difficult to debug since it’s a minified production bundle, but here’s what I’ve found:

Opening Chrome console and calling methods works as expected (I receive onResultReceived callback and normal asynchronous callback). For example:

Meteor.apply('someMethod', [], {
    onResultReceived: function() {
    console.log('results received');
  }
}, function(e, r) {
  console.log(e);
  console.log(r);
});

Properly logs:
results received
null
someresult

If I check my DDP connection’s outstanding method blocks (Meteor.connection._outstandingMethodBlocks), it properly returns an empty array.

Where it breaks

The issue begins though, once I try to call a login method. To narrow down the issue, I call the Meteor login method by using apply, like so:

var loginArgs = [{
  user: {email: 'user@site.com'},
  password: Accounts._hashPassword('secretpassword')
}];
var options = {};
options.onResultReceived = function() { console.log(arguments); };
Meteor.apply('login', loginArgs, options, function (err, result) {
  console.log('--')
  console.log(err);
  console.log('--')
  console.log(result);
});

Once this is run, the onResultReceived fires and returns properly with my userId, token and tokenExpires. The websocket message is visible in my devtools network tab. But for some reason, the callback is never fired.

If I check Meteor.connection._outstandingMethodBlocks at this point, there’s still an object in it from my login method call, but the method block methods array is empty.

If I inspect Meteor.connection._methodInvokers, I can still see the MethodInvoker there with the following properties:
enter image description here

This completely breaks the client-side app and prevents any future Meteor methods from being invoked, despite the fact that I can still receive downstream messages from websockets (e.g. subscription data).

Conclusions so far

The only conclusion I can draw so far is that the invoker._dataVisible is never set to true, which means that the callback is never invoked. See the “_maybeInvokeCallback” function check for _dataIsVisible boolean.

Additional notes

I can force the callback to eventually run by:

  1. CallingMeteor.connection._stream.reconnect({_force: true});
    or
  2. Copying the websocket message from my chrome network tab that has the “msg: result” and pasing it into Meteor.connection._livedata_connected(msg) as the msg parameter

My question

What would prevent calling the RPC login from ever invoking a callback? How can I further get to the bottom of this?

2 Likes

Update: We can re-create the issue locally if we do a mongo db restore like: mongorestore --drop --host 127.0.0.1 --port 3001 -d meteor ./dump/app-database while the Meteor app is running.

This PR may help: https://github.com/meteor/meteor/pull/9739

Would you be able to test a beta version of Meteor 1.6.2 with that PR included? If it’s too much trouble to update your app, we could also publish a special release version based on whatever version of Meteor you’re currently using.

4 Likes

For sure, we’re in the midst of updating to 1.1.6. Once we get that live we’ll try it out and monitor for a few days. The issue usually happens ~2 times a week so I’ll update here (and on the PR) once we’ve had it live for 1-2 weeks. Thanks Ben!

@benjamn what’s the best way to test this? Would it be adding mongo package at the version Glasser specified:

meteor add mongo@1.4.6-oplogrestart.0

Or is there another way?

I can confirm that this issue is still active running Meteor 1.7.0.3.

The issue has really come from nowhere. We aren’t sure what is actually causing it but following @typ “Where it breaks” section reigns true. The login call goes out and a login token is received but the “update” for the ddp connection is never received so callbacks never fire. After resetting the socket connection everything runs fine again.

This only happens when trying to login and it doesn’t happen on all of our nodes either. Its extremely odd, because the issue seems to jump from node to node after restarting them. It’s nearly impossible to tell whats really going on when we don’t see any errors client or server side.

We are more than happy to try to provide insight into this issue, but we aren’t sure where to begin to provide useful information for this issue.

1 Like

Funny timing here. We have been experiencing this still (but rarely), however one of our QA engineers just ran into an issue with an end-to-end test which can reproduce the issue in our app 100% of the time.

The test attempt to log the user out (simply calls an event handler which calls Meteor.logout but the logout callback is never called + user is still logged in). If I open up the console and check Meteor.connection._outstandingMethodBlocks, it’s got a bunch of backed up methods.

Here’s where things get weird…we switched off of Meteor Oplog a few weeks back and now use Redis Oplog (https://github.com/cult-of-coders/redis-oplog) which leads me to believe the issue isn’t oplog related. We’ve tried this with mongo@1.4.6 and the issue still happens.

@benjamn given we have a 100% reproducible scenario, albeit private repo, do you have any tips on what information we can surface in order to debug this and find root cause?

2 Likes

Just ran into this again but this time on a local enviroment. @benjamn, any guidance on this would be much appreciated. We want to help get this fix as this has already effected our production environment and as @typ said he has a reproducible scenario.

I have a similar (though not identical) problem - mine is for sure related to a primary failover in production. Some clients and specific users of those clients (e.g., logging into a browser exhibiting the problem as user A may work, but user B wont). We see this pretty much exclusively after a primary failover, but intermittently across apps and servers.

Where my problem differs is that I dont get ANY response from either login or logout calls (I can monitor the DDP connection), but are able to receive data from subscriptions and other method calls.

@znewsham @zcompton
What version of Meteor are you on? We were working to update to 1.7 but got held up by a dependency issue with meteor-mocha. Now that the issue is resolved we plan to update in the next few days. I’m wondering if this happens for 1.7 or later.

@znewsham @zcompton I’m also curious what overlapping packages/hosting providers we all have. I figured this would be much more widespread if it happened to everyone using Meteor.

Our hosting provider for Mongo is Atlas on AWS.

@typ we’re on 1.6 and also blocked due to package dependencies. We’re using AWS for mongo and Meteor - my guess is that primary failovers don’t happen often, and thats why we dont see it more

@typ 1.7.0.3 here and hosted on Azure

@zcompton @znewsham would you guys mind sharing a list of packages used? Curious to see overlap amongst us all.

Heres a paste bin of our Versions file:

https://pastebin.com/5T6sSdgN

We have a lot of older packages that we are dependent on.

@typ

https://pastebin.com/pBHnb819

I’ve had this error too in production. Running Meteor 1.5.1 on Mongo Atlas running on Galaxy. We also use redis-oplog so oplog is disabled. Below is our versions file:

https://pastebin.com/WdYtaTiz

So I ran into this login issue again this morning and a few days ago. Both times on a local server and I had received MongoNetworkErrors before the issue started happening.

Here is the call stack:

I20181001-21:42:26.563(-4)? Exception in setInterval callback: MongoNetworkError: connection 67 to xxxxx.servers.xxxxxxxxx.com:27017 timed out
I20181001-21:42:26.587(-4)?     at TLSSocket.<anonymous> (C:\Users\######\AppData\Local\.meteor\packages\npm-mongo\3.0.7\npm\node_modules\mongodb-core\lib\connection\connection.js:258:7)
I20181001-21:42:26.587(-4)?     at Object.onceWrapper (events.js:313:30)
I20181001-21:42:26.587(-4)?     at emitNone (events.js:106:13)
I20181001-21:42:26.588(-4)?     at TLSSocket.emit (events.js:208:7)
I20181001-21:42:26.588(-4)?     at TLSSocket.Socket._onTimeout (net.js:410:8)
I20181001-21:42:26.588(-4)?     at ontimeout (timers.js:498:11)
I20181001-21:42:26.588(-4)?     at tryOnTimeout (timers.js:323:5)
I20181001-21:42:26.588(-4)?     at Timer.listOnTimeout (timers.js:290:5)
I20181001-21:42:26.617(-4)?  => awaited here:
I20181001-21:42:26.618(-4)?     at Function.Promise.await (C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\promise_server.js:56:12)
I20181001-21:42:26.619(-4)?     at Promise.asyncApply (packages/mongo/mongo_driver.js:1034:11)
I20181001-21:42:26.620(-4)?     at C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\fiber_pool.js:43:40
I20181001-21:42:26.621(-4)?  => awaited here:
I20181001-21:42:26.622(-4)?     at Promise.await (C:\Users\######\AppData\Local\.meteor\packages\promise\0.11.1\npm\node_modules\meteor-promise\promise_server.js:60:12)
I20181001-21:42:26.623(-4)?     at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1083:12)
I20181001-21:42:26.623(-4)?     at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1097:22)
I20181001-21:42:26.637(-4)?     at SynchronousCursor.map (packages/mongo/mongo_driver.js:1107:10)
I20181001-21:42:26.637(-4)?     at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1131:17)
I20181001-21:42:26.640(-4)?     at Cursor.(anonymous function) (packages/mongo/mongo_driver.js:879:44)
I20181001-21:42:26.640(-4)?     at Cursor.cursorProto.(anonymous function) (packages\meteorhacks_kadira.js:3352:32)
I20181001-21:42:26.641(-4)?     at Cursor.kadira_Cursor_fetch [as fetch] (packages\meteorhacks_kadira.js:3744:32)
I20181001-21:42:26.641(-4)?     at Object.<anonymous> (packages\matb33_collection-hooks.js:445:81)
I20181001-21:42:26.642(-4)?     at Object.collection.(anonymous function) [as update] (packages\matb33_collection-hooks.js:146:21)
I20181001-21:42:26.643(-4)?     at Collection.update (packages/mongo/collection.js:580:31)
I20181001-21:42:26.643(-4)?     at Collection.Mongo.Collection.(anonymous function) [as update] (packages/aldeed:collection2/collection2.js:213:19)
I20181001-21:42:26.644(-4)?     at AccountsServer.Ap._expireTokens (packages/accounts-base/accounts_server.js:1129:14)
I20181001-21:42:26.645(-4)?     at packages/accounts-base/accounts_server.js:1218:14
I20181001-21:42:26.645(-4)?     at Meteor.EnvironmentVariable.EVp.withValue (packages\meteor.js:1189:12)
I20181001-21:42:26.646(-4)?     at packages\meteor.js:505:25
I20181001-21:42:26.646(-4)?     at runWithEnvironment (packages\meteor.js:1241:24)

I too am having issues - for me this is the problem:

@evolross @zcompton

do either of you have a readPreference=primray on your oplog URL? I added that and this problem went away - I tested it pretty rigourously in a dev environment, and just tested with a graceful primary stepdown in production. I thought readPreference=primary was the implicit default, but its possible meteor does something different when the read preference is explicitly defined