Issue with OplogExcludeCollections

So we had some weird behaviour we couldn’t explain, and once I tracked it down I figured I should share. We’re running Meteor 2.16.

The issue:
Certain methods never received a response over DDP and just hung. The most noticeable case was logging out: you’d end up logged out on the server but not on the client.

The culprit?
We have an auditlog collection that is written to via hooks. I had thought it a good idea to exclude big, noisy collections like this from oplog parsing via OplogExcludeCollections, but it seems this is exactly what broke our method calls. I removed the auditlog from the list and everything works again.
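For anyone wanting to check their own setup, here is a minimal sketch of the kind of configuration involved. It assumes the option lives under `packages.mongo.oplogExcludeCollections` in the settings file, which should be verified against the docs for your Meteor version. The helper `findExcludedWrites` and the `messages` collection name are hypothetical and only illustrate the check: flag any excluded collection that is also written to during a method call.

```typescript
// Assumed shape of the relevant part of settings.json (verify for your Meteor version).
interface MongoPackageSettings {
  oplogExcludeCollections?: string[];
}

interface Settings {
  packages?: { mongo?: MongoPackageSettings };
}

// Illustrative settings: the audit log is excluded from oplog tailing.
const settings: Settings = {
  packages: {
    mongo: {
      oplogExcludeCollections: ["auditlog"],
    },
  },
};

// Hypothetical helper (not part of Meteor): returns the collections that are
// both excluded from the oplog and written to from inside methods or hooks.
function findExcludedWrites(s: Settings, writtenInMethods: string[]): string[] {
  const excluded = new Set(s.packages?.mongo?.oplogExcludeCollections ?? []);
  return writtenInMethods.filter((name) => excluded.has(name));
}

// Collections your methods (or their hooks) write to; names are made up here.
const risky = findExcludedWrites(settings, ["auditlog", "messages"]);
console.log(risky);
```

Any name this returns matches the pattern that bit us: a write inside a method to a collection the oplog tailing never sees.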

I think we should at least smack a big warning label on this functionality. Is this the expected behaviour? I asked Copilot what it thought of the code here and whether it could see exactly why this happens, and it had plenty of ideas. I’ll leave it to the creators to decide whether this is expected behaviour or not.

I didn’t open an issue but let me know if I should.

I had a look through the PR and it seems this issue was already highlighted here by @colkadome. We can save other users some time by putting a big warning on this functionality.


I’ll continue on here with my own little monologue :sweat_smile:

So last night we deployed a new version to production and got caught by this new ‘feature’ again. This time it’s even more straightforward. We have a collection for metrics with no publications. It only gets incremented. We have a method which saves a message in the outgoing queue, and it typically takes around 100 ms. Directly in this code (not via hooks or anything special) it also increments the metrics for messages sent.

After launch I noticed some really long method response times (50-60 seconds!) and saw they were all related to this method. In my own tests it didn’t take this long, more like 1 second, but still 10x the normal response time.

See the screenshots:

image

So I removed the oplogExclude settings entirely and everything went back to normal. I don’t think there’s anything particularly special about our setup. I think this feature really needs some more testing and/or a big warning label. I’m not sure if something else changed in 3.x that makes it work, or if it’s about the underlying MongoDB version (we’re still on 6) and its oplog format. Either way, use with caution!
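If you want to check whether you are affected, one option is to time the suspect call with and without the exclusion in place. Below is a minimal sketch of a timing wrapper. `timed` is a hypothetical helper, not a Meteor API, and `fakeMethod` is a stand-in for whatever method call you want to measure (the label `saveMessage` is made up).

```typescript
// Hypothetical helper: run an async function and report how long it took.
async function timed<T>(
  label: string,
  fn: () => Promise<T>
): Promise<{ result: T; ms: number }> {
  const start = Date.now();
  const result = await fn();
  const ms = Date.now() - start;
  console.log(`${label} took ${ms} ms`);
  return { result, ms };
}

// Stand-in for the real method call; resolves after roughly 50 ms.
const fakeMethod = (): Promise<string> =>
  new Promise<string>((resolve) => setTimeout(() => resolve("ok"), 50));

// Replace fakeMethod with your own async method call to compare timings.
const measurement = timed("saveMessage", fakeMethod);
```

Comparing the logged duration before and after removing a collection from the exclude list should make a slowdown like the one above easy to spot.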
