Database performance of case-insensitive user lookups

brianlukoff · September 8, 2022, 10:52pm

We’ve been doing an analysis of our MongoDB Disk IOPS usage on Atlas, and many queries on the users collection seem to have a very high examined/returned ratio (sometimes in the thousands). I believe it’s due to the _selectorForFastCaseInsensitiveLookup function which constructs an unusual regex to try to speed up case-insensitive lookups by email.

The result is that about 1/3 of the total disk IOPS from our slow queries seem to come from the users collection alone. Has anyone else run into this and found a way to improve the performance of these queries?

znewsham · September 9, 2022, 12:05am

Yep. It’s particularly bad for large databases and common name prefixes. It’s brutal. It would be much better if the added index was case insensitive or the emails were forced to lowercase. We’re planning on patching this function to just use the case sensitive lookup since we clean all the emails before they’re inserted

wildhart · September 9, 2022, 6:06am

Take a look at this PR: Use lower case emails fix by ritwik1233 · Pull Request #11419 · meteor/meteor · GitHub

storyteller · September 9, 2022, 6:10am

I do believe there was already an issue raised about this, I don’t recall what was the resolution for certain, but I think there is an option to go to lowercase for emails.

rjdavid · September 9, 2022, 6:50am

To remove these performance bottlenecks:

emails must be converted to lowercase (the standard says otherwise but the upsides are worth breaking this)
usernames must have a secondary key for the lowercase versions

Therefore, all matching will be simple equalities.

I suggest to do this as a breaking change. Too many projects are becoming a victim of this problematic implementation.

Existings users can be given two escape routes:

How to migrate DB to support new implementation (and maybe a way to check which users might be affected)
Or, Meteor settings to use the old implementation

paulishca · September 9, 2022, 11:49am

I was assigned to run the tests on this. Will search for the ticket and get back.

paulishca · September 9, 2022, 11:52am

Ok this was (one of) the conversation: [PERFORMANCE] - Make email a low case string and amend/simplify searches in the Users collection · Issue #11382 · meteor/meteor · GitHub

zodern · September 9, 2022, 4:06pm

Another option is to use a case insensitive index. parse server does this, though I have never used it. This would allow us to preserve the original case for when displaying the email or if sending the email to a service that is case sensitive, and wouldn’t require a migration.

rjdavid · September 11, 2022, 10:26am

Seems like @zodern’s solution here is the lowest hanging fruit. Any reason why even this was never implemented?

zodern · September 11, 2022, 5:32pm

I suggested this a year and half ago, and there was a concern that there could be edge cases (maybe for certain languages or characters in emails) that Mongo might treat differently than Meteor currently does. I don’t think anyone ever looked into this, and I’m not sure if it would be a problem if these edge cases do exist.

rjdavid · September 12, 2022, 11:09am

Seems like we are sacrificing actual projects with actual developers reporting performance issues in order to handle that edge cases that we don’t even know if these cases exist

jasongrishkoff · October 3, 2022, 11:15am

Is there a way for us to use a case-insensitive index and override the default _selectorForFastCaseInsensitiveLookup function @zodern?

radekmie · October 4, 2022, 11:49am

No, because collations are configured in the options (i.e., second argument), not the query. You could, however, override the _findUserByQuery method to force the collation and _selectorForFastCaseInsensitiveLookup to simply return { [fieldName]: string }. (I assume the appropriate index is already created beforehand.)

EDIT: It may be the case that Meteor Collections API are still not handling collations (even on the server). If so, it’ll require using the driver directly (i.e., rawCollection).