Methods "hanging", not returning

Hey guys.

I’m trying to get to the bottom of an ongoing issue we have with one of our apps.

At seemingly irregular intervals (sometimes every few weeks, other times up to a couple of times a day), a Meteor method will not complete.

To the end-user, this normally appears after pressing a “submit” button that triggers an insert or update, for example: a spinner displays, but the method just seemingly “hangs”, i.e. no error is returned and no return value either.

If the page is manually refreshed, it appears that the operation has in fact succeeded, but end users are having to refresh the page manually each time.
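For context, the client-side pattern is roughly this (a simplified sketch; `tasks.setDone` and the spinner helpers are placeholder names, not my actual code):

```js
import { Meteor } from 'meteor/meteor';

// Simplified sketch of the failing pattern (placeholder names, not the real app code).
// The spinner is shown before the call and only hidden in the callback, so if the
// callback never fires (no error, no result), the UI spins forever.
function onSubmit(taskId) {
  showSpinner(); // hypothetical UI helper

  Meteor.call('tasks.setDone', taskId, true, (error, result) => {
    hideSpinner(); // never reached when the method "hangs"
    if (error) {
      console.error('tasks.setDone failed:', error.reason);
    } else {
      console.log('tasks.setDone ok:', result);
    }
  });
}
```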

I should say that restarting my containers (I’m currently using Scalingo PaaS) immediately solves the problem, and yet there are no metrics indicating any memory or CPU spikes or other issues; in fact, levels are fairly comfortable.

Has anyone experienced anything similar, or could anyone help determine whether this lack of response is most likely at the code, server or DB level?

Many thanks in advance

Assuming you know which single method is affected, can you give us an outline of what it does, technically, and how it does it? Are there any peculiarities: outgoing API calls, or anything else out of the ordinary beyond everyday bread-and-butter MongoDB CRUD? Could it be, for example, that it in fact returns a Promise which sometimes, due to a bug perhaps, neither resolves nor rejects?
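Just to illustrate what I mean by that last point, a contrived sketch (the `Tasks` collection and method name are made up):

```js
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';
import { Tasks } from '/imports/api/tasks'; // hypothetical collection

Meteor.methods({
  'tasks.setDone'(taskId, done) {
    check(taskId, String);
    check(done, Boolean);

    // BUG: if the update matches no document, neither resolve nor reject
    // is ever called, the Promise never settles, and the client callback
    // never fires: no error, no result, just a "hanging" method.
    return new Promise((resolve, reject) => {
      Tasks.update(taskId, { $set: { done } }, (err, numAffected) => {
        if (err) return reject(err);
        if (numAffected > 0) return resolve(numAffected);
        // numAffected === 0: forgotten code path, the Promise stays pending
      });
    });
  },
});
```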

Hi @peterfkruger ,

Thanks for your reply. You raise some good questions, and I think I could have been clearer. This is happening with seemingly all methods that involve database operations, so not just one particular method.

Also, there are no third-party APIs involved: just, as you say, regular Mongo CRUD stuff.

Is there anything in the logs about errors thrown either on the server or on the clients? Do you use any error logger service on the client in the first place?

No, sadly there are no related errors in the app logs or in the client-side console.

I urge you to introduce error logging on your app’s client side. Right now you simply can’t know what’s going on on the client in terms of stability, and clients are often thwarted by the strangest errors on various devices and browsers. I’d also highly recommend reviewing the use of React error boundaries in your app (assuming you use React); if they’re not set up correctly, sometimes even a trivial error can disable your entire application.
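As a minimal starting point, something along these lines already tells you far more than nothing (only a sketch: `clientErrors.report` is a made-up name for a server method you’d have to write yourself):

```js
// Minimal client-side error reporting sketch. Assumes React and a hypothetical
// server method 'clientErrors.report'; adapt the names to your app.
import React from 'react';
import { Meteor } from 'meteor/meteor';

function report(kind, message, stack) {
  Meteor.call('clientErrors.report', {
    kind,
    message: String(message),
    stack: String(stack || ''),
    url: window.location.href,
    userAgent: navigator.userAgent,
  });
}

// Uncaught synchronous errors anywhere on the page
window.addEventListener('error', (event) =>
  report('error', event.message, event.error && event.error.stack));

// Promises that reject without a .catch()
window.addEventListener('unhandledrejection', (event) =>
  report('unhandledrejection', event.reason));

// A simple error boundary so one broken component can't blank the whole app
export class ReportingErrorBoundary extends React.Component {
  state = { hasError: false };

  static getDerivedStateFromError() {
    return { hasError: true };
  }

  componentDidCatch(error, info) {
    report('react', error.message, info.componentStack);
  }

  render() {
    return this.state.hasError ? <p>Something went wrong.</p> : this.props.children;
  }
}
```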

This suggests that the error is on the client side; when you restart your containers, your clients get restarted too. My guess is that your client apps keep running for a while after the restart until more and more of them run into the undetected error and stop working again.

EDIT: sorry, I misread your last answer. Are you saying that you have client-side error logging and nothing is logged there either?

Thanks again for a really helpful answer. I’m not using any particular client-side logging; would you be able to advise on any solution in particular? So far, when the problem has arisen, I have been informed and have been able to replicate it on my own connection, with no errors appearing in the browser console.

It’s also triggering the problem for all users; my understanding is that if it were a client-side issue, it would only affect the user(s) who triggered that particular bug/issue?

Thanks for your continued, helpful advice

EDIT: I’ve now set up an instance of this app on a different provider (NodeChef) to see if the configuration makes a difference. This setup also includes Meteor APM; I’m not sure if that will reveal any potential problems.

That’s good; if I’m not mistaken, Meteor APM also comes with a built-in error-logging facility. Let’s hope that this setup turns up errors either on the server or on the client.

Other than Meteor APM’s own error logging, there are plenty of frameworks available; just google “javascript error logging”. Some are free, others are paid but have a free tier, and others are paid only. Unfortunately, I can’t advise you on which one is good.

I have my own ideas about error flow which go far beyond merely catching and reporting all errors, and also comprise collecting relevant context data about the respective error situation, whereas all the frameworks I’m aware of only ever deal with catching, transporting and displaying the errors themselves.
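To give a rough idea of what I mean by context data (everything here is made up for illustration; it is not a framework I can point you to):

```js
// Sketch of "context data": report not just the error itself,
// but the situation it occurred in. All names here are made up.
import { Meteor } from 'meteor/meteor';

function reportWithContext(error, context) {
  Meteor.call('clientErrors.report', {
    message: error.message,
    stack: error.stack,
    // context that turns a bare stack trace into something actionable:
    userId: Meteor.userId(),
    route: window.location.pathname,
    connectionStatus: Meteor.status().status, // 'connected', 'waiting', ...
    ...context, // caller-supplied details
  });
}

// Example: wrap method calls so a failure carries the method name and arguments
function trackedCall(name, ...args) {
  return new Promise((resolve, reject) => {
    Meteor.call(name, ...args, (err, res) => {
      if (err) {
        reportWithContext(err, { method: name, args });
        return reject(err);
      }
      resolve(res);
    });
  });
}
```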

We don’t know that yet; it may just as well be that at some point one of the methods returns some data, either correctly or in error, which then causes an error on the client, even a very trivial one, that knocks out all clients one by one as they get there.

Could it be that a piece of data which all or most of your clients load sooner or later via one of your methods got corrupted somehow, and this returned corrupt data is knocking the clients out? Then, when you restart your containers, it takes a while for each client to load and process that piece of data again.

Thanks Peter, hopefully the APM will turn up something. You’ve given me some really good ideas for tracking this down further, so I’ll try to debug further!

I’m not aware of much shared local state between multiple users that would be reset on page refresh and could be causing this issue, hence why I was wondering whether it could be a database CPU/RAM issue.

I’ll keep digging though, thanks once again!

Meteor APM does appear to be tracking client errors and exceptions, which is really helpful.

The only noticeable one at the moment is

Error: Clock discrepancy detected. Attempting re-sync

I’ll admit to having seen this a few times in the console, but I haven’t been sure whether it’s really an issue or not.

I’ve seen that message quite often too and I don’t think that’s an issue.

Hi Andre, I haven’t had this, but a friend of mine once described a similar issue to me; he sent me the links below.

I believe in the end he never found the root cause, but fixed (or rather, avoided) it by using Meteor.apply instead of Meteor.call.
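For reference, the difference looks roughly like this. Meteor.apply takes the arguments as an array plus an options object; `noRetry` is shown only as an example of the extra control you get, I don’t know which options, if any, he actually used:

```js
import { Meteor } from 'meteor/meteor';

const taskId = 'someTaskId'; // placeholder

// Plain call: arguments passed directly.
Meteor.call('tasks.setDone', taskId, true, (err, res) => { /* ... */ });

// apply: same method, arguments as an array plus an options object.
// noRetry (client only) stops the call from being re-sent after a reconnect;
// shown here purely as an example, not necessarily what was used.
Meteor.apply('tasks.setDone', [taskId, true], { noRetry: true }, (err, res) => { /* ... */ });
```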

Thanks for the info! I’ll take a look and see if I can find any common ground. Really appreciate the post!
