Database performance of case-insensitive user lookups

We’ve been doing an analysis of our MongoDB Disk IOPS usage on Atlas, and many queries on the users collection seem to have a very high examined/returned ratio (sometimes in the thousands). I believe it’s due to the _selectorForFastCaseInsensitiveLookup function which constructs an unusual regex to try to speed up case-insensitive lookups by email.

The result is that about 1/3 of the total disk IOPS from our slow queries seem to come from the users collection alone. Has anyone else run into this and found a way to improve the performance of these queries?

3 Likes

Yep. It’s particularly bad for large databases and common name prefixes. It’s brutal. It would be much better if the added index was case insensitive or the emails were forced to lowercase. We’re planning on patching this function to just use the case sensitive lookup since we clean all the emails before they’re inserted

2 Likes

Take a look at this PR: Use lower case emails fix by ritwik1233 · Pull Request #11419 · meteor/meteor · GitHub

I do believe there was already an issue raised about this, I don’t recall what was the resolution for certain, but I think there is an option to go to lowercase for emails.

To remove these performance bottlenecks:

  • emails must be converted to lowercase (the standard says otherwise but the upsides are worth breaking this)

  • usernames must have a secondary key for the lowercase versions

Therefore, all matching will be simple equalities.

I suggest to do this as a breaking change. Too many projects are becoming a victim of this problematic implementation.

Existings users can be given two escape routes:

  1. How to migrate DB to support new implementation (and maybe a way to check which users might be affected)
  2. Or, Meteor settings to use the old implementation
2 Likes

I was assigned to run the tests on this. Will search for the ticket and get back.

Ok this was (one of) the conversation: [PERFORMANCE] - Make email a low case string and amend/simplify searches in the Users collection · Issue #11382 · meteor/meteor · GitHub

1 Like

Another option is to use a case insensitive index. parse server does this, though I have never used it. This would allow us to preserve the original case for when displaying the email or if sending the email to a service that is case sensitive, and wouldn’t require a migration.

6 Likes

Seems like @zodern’s solution here is the lowest hanging fruit. Any reason why even this was never implemented?

1 Like

I suggested this a year and half ago, and there was a concern that there could be edge cases (maybe for certain languages or characters in emails) that Mongo might treat differently than Meteor currently does. I don’t think anyone ever looked into this, and I’m not sure if it would be a problem if these edge cases do exist.

2 Likes

Seems like we are sacrificing actual projects with actual developers reporting performance issues in order to handle that edge cases that we don’t even know if these cases exist

4 Likes

Is there a way for us to use a case-insensitive index and override the default _selectorForFastCaseInsensitiveLookup function @zodern?

No, because collations are configured in the options (i.e., second argument), not the query. You could, however, override the _findUserByQuery method to force the collation and _selectorForFastCaseInsensitiveLookup to simply return { [fieldName]: string }. (I assume the appropriate index is already created beforehand.)

EDIT: It may be the case that Meteor Collections API are still not handling collations (even on the server). If so, it’ll require using the driver directly (i.e., rawCollection).