Publication contained document that had been deleted from database [SOLVED]

Today we experienced something weird. We have a publication on a certain collection. It is a dynamic publication: users call the publication function with a parameter. That parameter is the same for a group of users, so at any given time a group of logged-in users will probably share the same publication cursor.

After one user deleted a document from the collection, the document was not removed from the publication and remained visible to all users. Checking the database remotely, however, showed that the document really had been deleted from the database.

I checked the application several times in the next two hours, logging in and out as different users, but the deleted document remained visible, also for new users logging in.

Then after three hours, the document suddenly disappeared from the publication.

In the first place, I don't understand why the publication did not update immediately. How is the publication triggered to rerun the query, and what could have caused it not to do so? Any ideas?

Note:
I tried the same workflow on a copy of the application running on another Galaxy container, pointing at another database instance, and things worked well there. The publication updated immediately after removal of the document.

Please paste in a sample of your publication.

Meteor.publish('regulationitems', function (configurationId: string) {
  check(configurationId, String);
  if (hasConfigurationAccess(this.userId, configurationId)) {
    return RegulationItems.find({configurationId});
  } else {
    return null;
  }
});

Are you including the configurationId in your client side mongo query as well or just doing a find({}) and relying on minimongo clearing the cache? The latter approach I would not recommend, because you cannot be certain about timing.

Could you explain more exactly what you mean by 'timing' in this context?

By timing I mean when exactly that document will be removed from client side minimongo. At least that has been my impression.
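To make the difference between the two client-side approaches concrete, here is a minimal sketch in plain TypeScript. It does not use Meteor or minimongo: the `clientCache` array, the `RegulationItem` shape, and both helper functions are hypothetical stand-ins, and the assumption is simply that the cache can briefly hold documents left over from another subscription.

```typescript
// Hypothetical shape of a document held in the client-side cache.
interface RegulationItem {
  _id: string;
  configurationId: string;
}

// Stand-in for the client cache (assumption): it briefly holds a leftover
// document for configuration "A" next to the current documents for "B".
const clientCache: RegulationItem[] = [
  { _id: "1", configurationId: "A" },
  { _id: "2", configurationId: "B" },
  { _id: "3", configurationId: "B" },
];

// Unscoped read, comparable to find({}): returns whatever is cached right now.
function findAll(cache: RegulationItem[]): RegulationItem[] {
  return cache.slice();
}

// Scoped read, comparable to find({ configurationId }): the result does not
// depend on when leftover documents are cleared from the cache.
function findForConfiguration(
  cache: RegulationItem[],
  configurationId: string
): RegulationItem[] {
  return cache.filter((item) => item.configurationId === configurationId);
}

const unscopedIds = findAll(clientCache).map((item) => item._id);
const scopedIds = findForConfiguration(clientCache, "B").map((item) => item._id);
```

With the scoped read, the leftover document for configuration "A" never reaches the UI, whenever the cache happens to be cleaned up.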

I saw the following:

  • Some user removed a document from the collection.

  • I checked the database: the document is indeed gone, although I still see it in the interface. I check the minimongo collection (through Chrome DevTools): the document is still there.

  • I log out of the application and check the content of the minimongo collection in Chrome DevTools: the collection is empty.

  • I log in to the application again and see the collection in minimongo being filled, and it includes the document that was deleted from the database.

  • I repeat the procedure one hour later: the document is still there.

  • Two hours later I am watching the application and suddenly see the document disappear. I check minimongo, and indeed the document is gone.

  • I log out and log in again. The document is still gone.

So there seemed to be a two-hour delay in updating minimongo after removal of the document, which is very strange.

After this I tested with removing other documents from the same collection and minimongo updates instantly.

Thinking about this a little more, I think I misled you with my previous comment. The situation I was thinking about was when a user has a publication for some document and then that publication gets terminated; then, as I understand it, it is not guaranteed that the document will immediately disappear from client-side minimongo. But that's not the case here. Here the document is deleted on the server, so the publication should update right away. So no idea what is going on.

thanks anyway for thinking along!

That suggests that the document is still being published from the server. Maybe the publication never finishes. An example is seen here in the docs:

Meteor.publish('secretData', function () {
  if (this.userId === 'superuser') {
    return SecretData.find();
  } else {
    // Declare that no data is being published. If you leave this line out,
    // Meteor will never consider the subscription ready because it thinks
    // you're using the `added/changed/removed` interface where you have to
    // explicitly call `this.ready`.
    return [];
  }
});

See the return value.

Maybe you might be able to reproduce and test with an empty array as return value.

I have never encountered this issue myself, but from the description this sounds like a potential issue called replication lag (if you are using Mongo in a replica set).

That is, your document was deleted in the primary, but the subscription was still serving from one of the secondaries that was updated belatedly.

More info here: https://docs.mongodb.com/manual/tutorial/troubleshoot-replica-sets/
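As a rough illustration of how replication lag could be quantified, the sketch below computes, for each secondary, how far it trails the primary. It assumes input shaped like the `members` array reported by `rs.status()` (the fields `name`, `stateStr`, and `optimeDate`); the function name and the sample values are made up for illustration and are not measurements from this incident.

```typescript
// Subset of the per-member fields reported by `rs.status()`.
interface ReplicaMember {
  name: string;
  stateStr: string; // e.g. "PRIMARY" or "SECONDARY"
  optimeDate: Date; // time of the last oplog entry applied by this member
}

// Returns, per secondary, how many seconds it trails the primary.
function replicationLagSeconds(
  members: ReplicaMember[]
): { [name: string]: number } {
  let primary: ReplicaMember | undefined;
  for (const m of members) {
    if (m.stateStr === "PRIMARY") {
      primary = m;
    }
  }
  if (!primary) {
    throw new Error("no primary in member list");
  }
  const lag: { [name: string]: number } = {};
  for (const m of members) {
    if (m.stateStr === "SECONDARY") {
      lag[m.name] =
        (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  }
  return lag;
}

// Made-up sample values, for illustration only.
const sampleMembers: ReplicaMember[] = [
  { name: "node-a:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-01-01T12:00:00Z") },
  { name: "node-b:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T11:59:58Z") },
  { name: "node-c:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T10:00:00Z") },
];
const sampleLag = replicationLagSeconds(sampleMembers);
```

In this made-up sample, one secondary trails by a couple of seconds and the other by two hours; reads served from the second one would still return documents deleted on the primary during that window.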


Is the deleted document an embedded document, or a separate entry in the collection?
If it's an embedded document, the clients would see that it was updated, but might merge it with existing data?
I know that Meteor / minimongo only does merging on top-level fields of the document.
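A small sketch of what merging only on top-level fields means in practice. This is plain TypeScript and not Meteor's actual implementation: `applyTopLevelChange` and the sample document are hypothetical, and the only assumption is that a change carries whole top-level fields, which replace the cached values without any deep merge.

```typescript
type Doc = { [field: string]: unknown };

// Shallow merge: every top-level field in `changes` replaces the cached value
// as a whole; nested objects are not merged key by key.
function applyTopLevelChange(cached: Doc, changes: Doc): Doc {
  const result: Doc = {};
  for (const key in cached) {
    if (Object.prototype.hasOwnProperty.call(cached, key)) {
      result[key] = cached[key];
    }
  }
  for (const key in changes) {
    if (Object.prototype.hasOwnProperty.call(changes, key)) {
      result[key] = changes[key];
    }
  }
  return result;
}

// Hypothetical cached document with an embedded `details` object.
const cachedItem: Doc = {
  _id: "r1",
  title: "Rule",
  details: { author: "ann", pages: 3 },
};

// The change carries the whole `details` field, so `details.author` is dropped.
const updatedItem = applyTopLevelChange(cachedItem, { details: { pages: 4 } });
```

Under this model, an embedded document is only ever replaced together with its parent field, so it cannot linger on the client once the new value of that field arrives.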


Are you using default Meteor collection functions when deleting the document?

@rjdavid

Are you using default Meteor collection functions when deleting the document?

Yes, Mongo.Collection.remove is the function that I use. I must say, however, that my collections are all wrapped in MongoObservable.Collection from MeteorRxjs, but I don't think that has anything to do with the issue at hand.

@coagmano

Is the deleted document an embedded document, or a separate entry in the collection?

A separate entry. As I wrote, the problem I encountered seemed to be a one-time problem. I cannot reproduce it. When I remove another item in the same collection in the same way, the subscription updates as expected.

@illustreets

from the description this sounds like a potential issue called replication lag

This is the direction that I am thinking of: some temporary problem in the communication between the Meteor server and MongoDB. But doesn't Meteor do some kind of 'polling' to keep subscriptions updated? So how could the delay last for two hours?
We do indeed use a database server that uses replica sets (MongoDB Atlas).

Can anyone point me to technical documentation about how subscriptions keep themselves updated? Or should I consult the source code for that?

I would bet my money on a glitch in replication between one Mongo instance and another Mongo instance. I find it hard to imagine how a situation like this can occur in a Meteor publication.

A brilliant explanation of how publications work under the hood by one of Qualia's founders, Lucas: Optimizing subscriptions - one of the best I've read on this subject.

Then, this one is a must-read: https://guide.meteor.com/data-loading.html

Last but not least, an old article but still very relevant, which also talks about polling: https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908

EDIT 1: Maybe give this troubleshooting guide a try: https://bluemedora.com/troubleshoot-mongodb-replication-lag/

EDIT 2: Perhaps a delayed secondary replica has been promoted to primary? By configuration that should not have happened, though. Any changes still held back by the delay would be missing (i.e. if a 2-hour delay was set on the replica that has now been promoted, 2 hours of CRUD actions were not applied on the delayed node). Maybe it's worth writing to Atlas to ask what happened during that window.


@illustreets

thanks for your elaborate reply.

I will follow your suggestion that Mongo must be responsible and Meteor can't be blamed.

No problem. Haha, it's not about the blame :) it's just that it does not sound like a place for this kind of error.

Ok, this is quite the simplification, but at its core Meteor's publishing system doesn't do much magic besides watching the Oplog and "propagating" the changes it finds there. I can't see how you'd end up with non-existent documents being published to the frontend if they don't appear in the Oplog ...

... as far as the framework itself goes. However, there's also the matter of caching. You may have your own caching layer. Galaxy might have its own system for that, to improve publishing performance (I really don't know if they do). One of these caches, if it exists, could have held stale data if not configured properly.
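To illustrate the Oplog-watching idea in the same simplified spirit, here is a toy model in plain TypeScript. It is not Meteor's implementation: the `OplogEntry` shape keeps only a few fields of a real oplog entry (`op`, `ns`, `o`) and handles only inserts ("i") and deletes ("d"), and the function name and the `app.regulationitems` namespace are hypothetical.

```typescript
// Reduced model of an oplog entry (assumption: only inserts and deletes).
interface OplogEntry {
  op: "i" | "d"; // "i" = insert, "d" = delete
  ns: string; // "database.collection"
  o: { _id: string }; // the affected document (reduced to its _id here)
}

// Applies the entries for one namespace to the set of published document ids.
function applyOplog(
  publishedIds: string[],
  entries: OplogEntry[],
  ns: string
): string[] {
  let ids = publishedIds.slice();
  for (const entry of entries) {
    if (entry.ns !== ns) {
      continue; // entry belongs to another collection
    }
    if (entry.op === "i" && ids.indexOf(entry.o._id) === -1) {
      ids.push(entry.o._id);
    }
    if (entry.op === "d") {
      ids = ids.filter((id) => id !== entry.o._id);
    }
  }
  return ids;
}

const before = ["a", "b"];

// The delete of "b" reaches the observer: "b" leaves the published set.
const afterDelete = applyOplog(
  before,
  [
    { op: "d", ns: "app.regulationitems", o: { _id: "b" } },
    { op: "i", ns: "app.regulationitems", o: { _id: "c" } },
  ],
  "app.regulationitems"
);

// No entries reach the observer: the published set stays as it was.
const afterNothing = applyOplog(before, [], "app.regulationitems");
```

In this toy model a deleted document stays published only for as long as its delete entry has not reached the observer, which fits the suspicion that the delay arose between MongoDB and the server rather than inside the publication logic.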