Changes not being sent from server to client

I have a super-odd situation where my app (Meteor 1.7.0.5) will behave properly up until the first ping/pong and then things get strange. I have a meteor method called updateRecord which sends off a document update which I can see is correctly applied to the remote database. The UI re-renders locally with the optimistic update but then gets rolled back after the DDP updated message comes back. The difference between when this works (for a while, after a total app restart) and when it does not is that the DDP changed message does not get sent in the non-working case.

The same can be observed if I manually make changes to the database using the meteor mongo CLI. On a fresh app restart I see DDP changed messages when I update documents. Again, after the first ping/pong sequence these manual changes no longer get propagated to the client.
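To make the before/after difference concrete, the captured DDP frames (for example, from the browser's WebSocket inspector) can be summarised. This is only a rough sketch: summarizeDdpFrames and its result fields are hypothetical names, not part of Meteor, and it assumes the frames have already been parsed into plain objects carrying the standard DDP msg field (method, result, updated, changed, ping, pong).

```javascript
// Hypothetical helper, not a Meteor API: summarise parsed DDP frames.
// Assumes each frame is a plain object such as { msg: 'changed', ... }.
function summarizeDdpFrames(frames) {
  // Split the capture at the first heartbeat frame (ping or pong).
  const firstBeat = frames.findIndex((f) => f.msg === 'ping' || f.msg === 'pong');
  const split = firstBeat === -1 ? frames.length : firstBeat;
  const countChanged = (list) => list.filter((f) => f.msg === 'changed').length;

  // Track method calls and whether any 'changed' frame arrived before
  // their 'updated' frame. This is a heuristic: the 'changed' frame is
  // not checked against the document the method actually wrote.
  const pending = new Map(); // method id -> saw a 'changed' frame
  const methodsWithoutChanged = [];
  for (const frame of frames) {
    if (frame.msg === 'method') {
      pending.set(frame.id, false);
    } else if (frame.msg === 'changed') {
      for (const id of [...pending.keys()]) pending.set(id, true);
    } else if (frame.msg === 'updated') {
      for (const id of frame.methods || []) {
        if (pending.get(id) === false) methodsWithoutChanged.push(id);
        pending.delete(id);
      }
    }
  }

  return {
    changedBeforeHeartbeat: countChanged(frames.slice(0, split)),
    changedAfterHeartbeat: countChanged(frames.slice(split)),
    methodsWithoutChanged,
  };
}
```

If the behaviour is as described, a capture spanning the first ping/pong should show changed frames only before the heartbeat, and the later updateRecord calls should show up in methodsWithoutChanged.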

Hoping someone can direct me out of this 3-day hole!

What’s the state of the database after this - has that been rolled back as well?

Is this maybe a simple schema action or collection hook which you’ve triggered?

@robfallows the state of the remote (server) data is that it contains my change. The state of minimongo is that it does not have the change - understandable since the changed DDP message never came.

Hmm. Can we see the contents of .meteor/packages?

Sure thing!

# Meteor packages used by this project, one per line.
#
# 'meteor add' and 'meteor remove' will edit this file for you,
# but you can also edit it by hand.
meteor-base@1.4.0             # Packages every Meteor app needs to have
mongo@1.5.0                   # The database Meteor supports right now
blaze-html-templates@1.0.4 # Compile .html files into Meteor Blaze views
reactive-var@1.0.11            # Reactive variable for tracker
tracker@1.2.0                 # Meteor's client-side reactive programming library

standard-minifier-css@1.4.1   # CSS minifier run for production mode
es5-shim@4.8.0                # ECMAScript 5 compatibility for older browsers.
ecmascript@0.11.1              # Enable ECMAScript2015+ syntax in app code
shell-server@0.3.1            # Server-side component of the `meteor shell` command


alanning:roles
accounts-password@1.5.1
accounts-ui@1.3.0
lepozepo:accounting
mizzao:bootboxjs
mrt:flash-messages-plus
raix:handlebar-helpers
mrt:mask

numeral:numeral
risul:moment-timezone
eskan:chosen
mizzao:user-status
codechimera:meteor-bootstrap-sweetalert
audit-argument-checks@1.0.7
check@1.3.1
session@1.1.7
jquery@1.11.10
logging@1.1.20
reload@1.2.0
random@1.1.0
ejson@1.1.0
spacebars
dburles:collection-helpers
reywood:publish-composite
react-template-helper
react-meteor-data
gadicc:blaze-react-component
kurounin:pagination-blaze
kurounin:pagination
tmeasday:publish-counts
dynamic-import@0.4.2
fongandrew:find-and-modify
underscore@1.0.10

You mentioned “remote database” - is this in production and if it is, what version of MongoDB?

That was a bit oddly termed, I’ll give you that :slight_smile: No, this is the locally running remote database in dev mode so currently 3.6.4. I’ve had this issue reported in production too though.

I note my mongo package is a little out of date. I could try updating it but I don’t generally like resolving these whiffy things with speculative library changes and no definitive root cause.

You have the correct versions of packages for 1.7.0.5 (at least as far as Meteor core is concerned). I definitely wouldn’t be speculatively updating mongo, for example.

However, I am wondering if this was built originally as a 1.7.0.5 project, or if it’s as a result of updating from an older version. There have been anecdotal reports of issues with the database itself on recent updates.

Oh, indeed, this is an ancient project that’s been upgraded many many times. As I pointed out in the beginning, I’ve been able to pinpoint this to the period after the first timeout (ping/pong) sequence. And it’s reproducible with the meteor mongo CLI too.

This is acting like there’s an issue with oplog tailing. There were some significant changes in Meteor 1.7 and 1.7.0.1 wrt MongoDB. If you’re still using the same pre-MongoDB-3.6.4 database, it’s possible there have been changes which aren’t accommodated in the latest driver and/or oplog tailing.

Have you tried creating a brand new 1.7.0.5 project, copying the source over (i.e. not .meteor/local in particular) and using mongodump and mongorestore to ensure your new database is populated with your original data?

I haven’t, but will. Doesn’t explain why the issue would be seen in production too though does it?

@robfallows, I deleted my .meteor/local and performed a meteor reset for good measure. I deleted the package-lock.json and the node_modules dir. I then restored the database. The issue still exists. Oh, a meteor upgrade gifted me with 1.8 as a bonus - which I didn’t think was released yet but whatever.

Incidentally, why would oplog tailing be an issue for local development? There’s no replica set for local development and there’s no oplog without a replica set, right?

The local development database includes a replica set specifically to allow oplog tailing.

Got it. Thanks @robfallows. Well, things are still far from awesome this morning. Same issue on 1.8. I’m going to try copying the source as you originally suggested but I don’t like the implications that has for my git history one little bit. Following that I’m going spelunking in meteor internals in the hope of getting a better understanding of what might be happening post-timeout and perhaps even propose a fix if I get that far.

You mentioned anecdotal reports of issues - do you have any links to those?

There are some in this forum. However, I don’t think any I’ve read match what you’re seeing. A couple of examples:

After going down another rabbit hole, I was reminded of this.

Any chance you’ve hit that?

I don’t think so. Doesn’t feel like the same issue. I can see my changes in the oplog itself so it seems that something is broken with oplog observation. Trying to figure out the best way of debugging this. I wish meteor’s internal libs weren’t so buried in the filesystem.
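For anyone wanting to repeat the "is my change in the oplog" check, here is a minimal sketch of filtering oplog entries down to one document. The helper name oplogEntriesForDocument is hypothetical, and it assumes the entries have already been read from local.oplog.rs into plain objects, that _id values are strings, and that the namespace looks like meteor.records (the records collection name is made up for illustration).

```javascript
// Hypothetical helper: pick out the oplog entries that touch one document.
// Assumes entries are plain objects shaped like MongoDB oplog documents:
// { op: 'i' | 'u' | 'd', ns: 'db.collection', o: {...}, o2: {...} }.
function oplogEntriesForDocument(entries, ns, docId) {
  return entries.filter((entry) => {
    if (entry.ns !== ns) return false;
    // Updates carry the target _id in o2; inserts and deletes carry it in o.
    const target = entry.op === 'u' ? entry.o2 : entry.o;
    return target != null && target._id === docId;
  });
}
```

If the update shows up here but no changed message reaches the client, the write itself is reaching the oplog and the problem is more likely on the observing side.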

This issue has 24 hours then it becomes the final push I need to migrate to GraphQL anyway…