We’ve been doing an analysis of our MongoDB Disk IOPS usage on Atlas, and many queries on the users collection seem to have a very high examined/returned ratio (sometimes in the thousands). I believe it’s due to the _selectorForFastCaseInsensitiveLookup function which constructs an unusual regex to try to speed up case-insensitive lookups by email.
The result is that about 1/3 of the total disk IOPS from our slow queries seem to come from the users collection alone. Has anyone else run into this and found a way to improve the performance of these queries?
Yep. It’s particularly bad for large databases and common name prefixes. It’s brutal. It would be much better if the added index was case insensitive or the emails were forced to lowercase. We’re planning on patching this function to just use the case sensitive lookup since we clean all the emails before they’re inserted
I do believe there was already an issue raised about this, I don’t recall what was the resolution for certain, but I think there is an option to go to lowercase for emails.
Another option is to use a case insensitive index. parse server does this, though I have never used it. This would allow us to preserve the original case for when displaying the email or if sending the email to a service that is case sensitive, and wouldn’t require a migration.
I suggested this a year and half ago, and there was a concern that there could be edge cases (maybe for certain languages or characters in emails) that Mongo might treat differently than Meteor currently does. I don’t think anyone ever looked into this, and I’m not sure if it would be a problem if these edge cases do exist.
Seems like we are sacrificing actual projects with actual developers reporting performance issues in order to handle that edge cases that we don’t even know if these cases exist
No, because collations are configured in the options (i.e., second argument), not the query. You could, however, override the _findUserByQuery method to force the collation and _selectorForFastCaseInsensitiveLookup to simply return { [fieldName]: string }. (I assume the appropriate index is already created beforehand.)
EDIT: It may be the case that Meteor Collections API are still not handling collations (even on the server). If so, it’ll require using the driver directly (i.e., rawCollection).