Let’s say I have a support/ticketing app like zendesk or helpscout.
I have a collection of tickets. I have a collection of customers.
My tickets have a customerId field that references the _id of a customer.
I turn atlas search on for my tickets collection and I instantly get full text search for ticket title, ticket descriptions and other fields stored directly on the ticket. However, I don’t get to search tickets by customer name because all I have is the customer _id.
always run a search over the customers collection as well, use any found customer _ids inside a subsequent atlas search query of the tickets table? i.e.
- Search “Ben Fa” inside of customers collection
- This returns 3 records for ben fallow, Ben faracon, and ben fables.
- I take those three _ids and include them in my next query of the tickets table… using the _ids
Or, do I denormalize and store a field with customer name on it on the ticket? Or maybe an array of strings with this sort of duplicated data (assuming I have other fields on the ticket the reference other collections by _id)?
If I denormalize, how do I keep the collection up to date if a customer changes their name? Maybe just a cron job that runs every X time period to ensure all tickets have updated data? Do I update every ticket whenever a customer name gets changed?
I’ve personally gone down the ‘denormalise’ route. It’s not perfect - like you say, what happens if the field in question changes?
In my specific case, the scenario of leaving the field in question not in sync is okay (and possibly even preferred). In your use case, my instinct here would be to:
- Denormalise for the performance boost you will get out of this - I am running the assumption that the search function will be utilised much more frequently compared to users changing the name.
- Rather than a cron job to keep the data in sync, make the update a part of the change name method so that when a name is updated it is updated in its primary collections as well as on every relevant document on the tickets collection.
Or maybe an array of strings with this sort of duplicated data (assuming I have other fields on the ticket the reference other collections by _id)?
This may be a spanner in the works depending on how many there are…?
I’d say it strongly depends on your use case. If it’s customer name, denormalization sounds good - it’s small (I guess less than 30-ish characters) and rarely changes. Whether it is synced with a CRON or on-change should be decided on the desired data freshness.
It’s not a golden bullet, though - if it’d had to be more customer-related data (e.g., contact data, address, other personal information), then the overhead of storing multiple copies of this data may be too large. The same goes for synchronization of write-intensive collection.