Thanks for the images! It looks like your problem is a little more complicated than ours.
We just had to look into the Atlas profiler, and it was easy to see where the problem was coming from. It seems like you already ran lots of tests and the cause of the issue is still not apparent. I’m sorry that I don’t know how to help you any further!
As a side note, we plan to stop using publish-composite because of some performance and compatibility issues we encountered, which you mentioned seem related to your issues as well.
Best of luck with that! Let me know if you think I can help with anything else.
No, we’re still on 2.8.2. To solve the problem, we simply replaced the logic that used a count() over the entire collection with something equivalent (that code was pretty outdated anyway). We plan to replace all remaining count()s with either countDocuments() or estimatedDocumentCount() when we update to 2.9, which is when these new functions were released, since count() is now deprecated.
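For anyone landing here later, here’s roughly what that replacement can look like. This is only a sketch with a made-up Orders collection, and it assumes Meteor 2.9+ where countDocuments() and estimatedDocumentCount() are available directly on collections:

```js
import { Mongo } from 'meteor/mongo';

// Hypothetical collection, purely for illustration.
const Orders = new Mongo.Collection('orders');

// Before (deprecated, and effectively a scan on large collections):
// const total = Orders.find({ status: 'open' }).count();

// After (Meteor 2.9+):
async function openOrderCount() {
  // Exact count; stays cheap as long as { status: 1 } is indexed.
  return Orders.countDocuments({ status: 'open' });
}

async function totalOrderEstimate() {
  // Metadata-based estimate of the whole collection; no filter allowed, but very fast.
  return Orders.estimatedDocumentCount();
}
```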
Unfortunately, we only noticed the problem in production. It was a pretty hectic morning, as you can probably guess. Since the problem is directly linked to the number of documents in the collection being counted, and since we have way more data in production than locally, the problem was totally invisible when running local tests.
I just confirmed this with the developer on our team who fixed it, and he mentioned that I was not completely correct above (sorry for that); I had actually confused things, most probably because of the mix-up with the API changes.
Here are the solutions that worked for us, depending on the size of the collection (a rough sketch of the external-counter approach follows below):
Most counts: countDocuments() plus a proper index for the query.
Huge collections: caching the counter, or keeping an external counter that is updated when adding/removing docs.
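For the second case, this is a rough sketch of what an external counter can look like. The collection names and helpers are made up, and it assumes the async collection methods from Meteor 2.8+; in a real app you’d keep the two writes in the same code path and accept that the counter can drift slightly:

```js
import { Mongo } from 'meteor/mongo';

// Hypothetical collections, purely for illustration.
const Events = new Mongo.Collection('events');
const Counters = new Mongo.Collection('counters');

// Update the counter in the same code paths that add/remove documents,
// so reading the count never touches the huge collection.
export async function addEvent(doc) {
  await Events.insertAsync(doc);
  await Counters.upsertAsync({ _id: 'events' }, { $inc: { count: 1 } });
}

export async function removeEvent(eventId) {
  const removed = await Events.removeAsync({ _id: eventId });
  if (removed > 0) {
    await Counters.upsertAsync({ _id: 'events' }, { $inc: { count: -removed } });
  }
}

export async function eventCount() {
  const counter = await Counters.findOneAsync('events');
  return counter?.count ?? 0;
}
```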
We all share that same problem. If a dependency is updated, it’s up to us to read the changelogs of those updated dependencies.
You can always access them through rawCollection().
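For example (a sketch with a made-up collection), on a version where the collection wrappers don’t expose those methods yet, the native driver ones are reachable like this:

```js
import { Mongo } from 'meteor/mongo';

// Hypothetical collection, purely for illustration.
const Orders = new Mongo.Collection('orders');

async function counts() {
  // rawCollection() returns the underlying Node.js driver collection,
  // which already has the newer count methods.
  const raw = Orders.rawCollection();
  const exact = await raw.countDocuments({ status: 'open' });
  const estimate = await raw.estimatedDocumentCount();
  return { exact, estimate };
}
```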
I’m not worried about the count issues as we don’t use it, but am I right in thinking from your investigations, Radoslaw, that there’s only a performance issue if you’re using publish-composite? So something about how that package works doesn’t play nice with the latest Node driver? We thought we had a nice successor to publish-composite in the reactive-aggregation package, but unfortunately we’ve seen some odd behaviour there too, so now we’re wondering if we should look at grapher or its successor Nova.
Edit: But looking at grapher, it seems to sit on top of publish-composite anyway, so it’s unlikely it would help.
And do you have any best guess at the cause? Mongo driver or something in Meteor? The only PR that stands out for me is the one you contributed to add the async API… but I’m guessing that’s probably the first thing you scrutinised? Are you using the async API or still sync? Is it easy to try a version of 2.8.2 with the previous mongo package?
I spent quite a lot of time trying different versions back then, and nothing really stood out. I went through the profiler, and it wasn’t one thing – it looked like multiple things just took longer. I’m planning on getting back to it in the coming months, but that’s about it for now. I also thought it may be related to the Node.js version (or even V8), but my experiments were inconclusive (i.e., it wasn’t always worse with the newer version).
Thanks for that. I guess we’ll have to just try it out ourselves and see if it affects us or not
Are there plans for anyone from Meteor to look into this? I’m wondering if you guys have a test app with some performance tests that compare pub-sub / method speeds across different versions?
I looked into it (I’m not a 100% Meteor guy, though), and I did not reproduce this problem in a few other applications (both small and decently sized). It wasn’t a very in-depth investigation, but it seems to be related to some publication patterns or packages.
Hey, so we were actually trying to figure this out this morning, and we found it’s not actually the fault of that package. It’s more to do with how the aggregation’s unwind and tabular interplay. If you do a lookup and then unwind without specifying preserveNullAndEmptyArrays, and some docs have nothing to join, you end up with fewer docs than tabular expects. Its behaviour is then to wait until it has all the docs before updating (which will never happen) rather than listening for when the publication is ready. So just adding preserveNullAndEmptyArrays: true inside our unwinds has fixed it.
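In case it helps anyone else, here’s a minimal sketch of the shape of the fix; the collection and field names are made up, not our actual pipeline:

```js
// Hypothetical $lookup + $unwind stage pair; preserveNullAndEmptyArrays is the actual fix.
const pipeline = [
  {
    $lookup: {
      from: 'authors',
      localField: 'authorId',
      foreignField: '_id',
      as: 'author',
    },
  },
  {
    $unwind: {
      path: '$author',
      // Without this, documents with nothing to join are dropped here,
      // so tabular keeps waiting for documents that will never arrive.
      preserveNullAndEmptyArrays: true,
    },
  },
];
```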
I thought I’d mention here that we upgraded from 2.7.3 → 2.10.0 a week ago and haven’t seen any performance hit at all with extensive publishComposite use. After reading this I had expected a hit. Our DB is running the latest 5.x if that’s of interest.
More than a year later, I’ve finally found the issue. And it’s so weird I just have to revive this old thread!
A brief reminder: we’re on 2.5.8, and every single version after that caused our CI to use significantly more CPU and made most of the tests fail (or time out). Locally, the app performed normally, even slightly faster.
I took yet another stab at it last week, and the problem is that… the app is now too fast. You see, because of the MongoDB driver upgrade and some deep Meteor changes (we jumped to 2.16 in the end, but 2.6 shows similar results), our methods got faster. Because of that, the webhooks sent after certain actions were not “already sent” by the time the client expected them. Overall, it’s just a couple of percent faster, but that was enough for the client to see “it wasn’t sent yet” and retry… again and again. The webhooks were then sent milliseconds later, so they didn’t really pile up – it was just a constant stream of retries.
We’ll be rolling out this new version to production next week, but so far we see more or less the same CPU usage, slightly higher RAM usage (~4%), and visibly faster execution (E2E is faster by 5%).
That’s our success story, and it’s what allowed us to prepare the app for Meteor 3.