Hi there, I’m having a little trouble finding the best way of updating roughly 50k records in a collection, setting the value of one field based on another.
The exact operation would be to take a reference field which looks something like “abc-001” and to set the referenceNumber field to 1.
When I thought of looping through the calculation would have been:
Hi there thanks for your answer, I’m hopeful that I won’t have any race conditions as the operation itself is quite simple and I’ll have the luxury of doing this without any active users.
Just the looping seemed to take a long time and in an ideal world I’d have been able to make some sort of aggregation work
I understand that in each iteration you need to read, process and write. Your looping is sync while reading and writing are async. While you could make everything async, looping let’s say takes 0.05 ms per iteration while writing/reading takes 1 ms per iteration. This means that while you are in the 1000th looping iteration, you are still writing/reading something like the 100th iteration and you have many reads and writes still opened from the “past”. Ok, my numbers are irrelevant but I think the idea is clear.
If you give 10 ms per iteration you end up with something like 8 minutes to complete the whole thing.
Alternatively you may try to build a stream or aggregation using a tool such as the one in Compass or in MongoDB Atlas and then run that in your code. One way or another, it takes x milliseconds to alter 1 record.
Thanks for the explanation, I didn’t quite realise the implications of sync/async in this case. I think perhaps I need to try and make my method that processes this a little more efficient.
I’d imagine it would be more efficient to run this straight on the mongo layer as opposed to using a method inside the app itself? Thanks again
I think my errors are a result of running a version of mongo < 4.2
Not sure on the best way to go about if if not. Part of me thinks setting the values as a CSV outside of the app and then importing as done may be faster but will have to get Excel or similar to play ball
Thanks, that answer is really helpful. Confirms I need to take a different approach with < 4.2. I may try and test out $addFields although it warns it replaces the collection, not sure if this will mean the _id will change on documents as that would be a deal breaker
I learned a very important lesson here that I can’t believe I have been blissfully ignorant of until now.
Using fetch will bring all the documents into memory and is almost certainly the main reason that this and other heavy duty methods of mine have failed in the past.
Using cursor.forEach is a lot more useful in those cases