Help setting a field across 50k records based on other field's value

Hi there, I’m having a little trouble finding the best way of updating roughly 50k records in a collection, setting the value of one field based on another.

The exact operation would be to take a reference field which looks something like “abc-001” and to set the referenceNumber field to 1.

When I thought of looping through the calculation would have been:

const referenceNumber = Number(reference.substring(4))

Other than looping through one by one, I’ve seen various aggregate suggestions but I keep seeming to get errors on my version of Meteor/Mongo (4.0.3).

I’ve even thought about trying to do the calculation in something like Excel and reimporting :frowning:

Interested to see what your approaches might be?

Thanks!

I think Mongo Compass has just released a feature perfect for your case. MongoDB Releases “Focus Mode” in Compass GUI | MongoDB Blog

You can also throttle down your looping to make sure things are being calculated as expected and you don’t find yourself in a race condition. Example:

const count = your_record_count

const doSomething = () => {
  for (let i = 0; i < count; i++) {
    (j => {
      setTimeout(() => {
        // console.log('what is my j', j)

        bound(() => {
          const { _id, userId, ..... } = SomeCollection.findOne({ ...conditions... }, { skip: i, fields: { .... } }) || {}
          
          // do your calculations
          // do your DB updates

          }
        })
      }, j * 200) // your loop throttle in ms.
    })(i)
  }
}

doSomething()

Hi there thanks for your answer, I’m hopeful that I won’t have any race conditions as the operation itself is quite simple and I’ll have the luxury of doing this without any active users.

Just the looping seemed to take a long time and in an ideal world I’d have been able to make some sort of aggregation work

I understand that in each iteration you need to read, process and write. Your looping is sync while reading and writing are async. While you could make everything async, looping let’s say takes 0.05 ms per iteration while writing/reading takes 1 ms per iteration. This means that while you are in the 1000th looping iteration, you are still writing/reading something like the 100th iteration and you have many reads and writes still opened from the “past”. Ok, my numbers are irrelevant but I think the idea is clear.

If you give 10 ms per iteration you end up with something like 8 minutes to complete the whole thing.

Alternatively you may try to build a stream or aggregation using a tool such as the one in Compass or in MongoDB Atlas and then run that in your code. One way or another, it takes x milliseconds to alter 1 record.

1 Like

Thanks for the explanation, I didn’t quite realise the implications of sync/async in this case. I think perhaps I need to try and make my method that processes this a little more efficient.

I’d imagine it would be more efficient to run this straight on the mongo layer as opposed to using a method inside the app itself? Thanks again

I think you could try to make use of these:

Thanks, I’ll take another look at them. I kept running into errors along the lines of

The dollar ($) prefixed field ... is not valid for storage

did you wrap everything into an array? (specific for aggregations vs updates)

To be honest I don’t think so, I tried to use an AI suggestion that was as follows:

db.invoices.updateMany(
   {},
   { 
     $set: { 
       referenceNumber: { $toInt: { $substr: [ "$reference", 4, -1 ] } } 
     } 
   }
)

perhaps use the rawCollection and something like this: Update MongoDB field using value of another field - Stack Overflow

1 Like

I think my errors are a result of running a version of mongo < 4.2 :frowning:

Not sure on the best way to go about if if not. Part of me thinks setting the values as a CSV outside of the app and then importing as done may be faster but will have to get Excel or similar to play ball

Thanks, that answer is really helpful. Confirms I need to take a different approach with < 4.2. I may try and test out $addFields although it warns it replaces the collection, not sure if this will mean the _id will change on documents as that would be a deal breaker

I learned a very important lesson here that I can’t believe I have been blissfully ignorant of until now.

Using fetch will bring all the documents into memory and is almost certainly the main reason that this and other heavy duty methods of mine have failed in the past.

Using cursor.forEach is a lot more useful in those cases

3 Likes