I think it depends on what you want to do with your docs. Maybe you can describe the use case some more and the bottlenecks you've encountered with your initial approach?
The collection items are reservation data. I have to call multiple external APIs for every item (create an invoice, send the data to the government, etc). I haven't reached any bottlenecks yet, I'm just curious what's the most efficient way to iterate over the items in a large collection without hurting the server too much.
AFAIK the aggregation framework can "reduce" the collection data, e.g. when I want to sum some fields.
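Right, for pure data reduction an aggregation pipeline is usually the better tool. As a rough sketch (the `total` and `status` field names are made up for illustration), summing an amount per reservation status could look like:

```javascript
// Hypothetical pipeline: sum a `total` field per reservation `status`,
// skipping cancelled reservations (field names are made up).
const pipeline = [
  { $match: { status: { $ne: 'cancelled' } } },
  { $group: { _id: '$status', totalAmount: { $sum: '$total' } } },
]

// In Meteor this could run on the underlying driver collection, e.g.
// Reservations.rawCollection().aggregate(pipeline).toArray()
```

That pushes the summing work into MongoDB instead of iterating documents in app code, but it doesn't help with your use case of calling external APIs per item.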
I ended up with this helper function:
import _ from 'lodash'

/**
 * Query a collection in batches
 * @param {Collection} Collection meteor+grapher collection
 * @param {Object} args args passed to createQuery
 * @param {Number} batchSize number of items per batch
 * @yields {Object[]} documents
 */
export default function* chunkQueryGenerator(
  Collection,
  args,
  batchSize = 100
) {
  // keep special arguments like $filters and $options
  const specialArgs = _.pickBy(args, (v, key) => key.startsWith('$'))
  // fetch only the _ids matching the query
  const idsOptions = { ...specialArgs, _id: 1 }
  const ids = Collection.createQuery(idsOptions)
    .fetch()
    .map(i => i._id)
  // split the array of ids into fixed-size chunks
  const chunks = _.chunk(ids, batchSize)
  for (const chunk of chunks) {
    const $filters = {
      $and: [{ ...(args.$filters || {}) }, { _id: { $in: chunk } }],
    }
    const chunkArgs = { ...args, $filters }
    yield Collection.createQuery(chunkArgs).fetch()
  }
}
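For anyone unfamiliar with the lodash helpers used above, here is a plain-JS sketch of the two key steps (`_.pickBy` extracting the `$`-prefixed args, `_.chunk` splitting the ids), with made-up values:

```javascript
// Made-up args: one regular field plus two `$`-prefixed special args
const args = {
  status: 'confirmed',
  $filters: { year: 2020 },
  $options: { sort: { createdAt: 1 } },
}

// _.pickBy(args, (v, key) => key.startsWith('$')) keeps only the special args
const specialArgs = Object.fromEntries(
  Object.entries(args).filter(([key]) => key.startsWith('$'))
)
// specialArgs → { $filters: { year: 2020 }, $options: { sort: { createdAt: 1 } } }

// _.chunk(ids, 2) splits the id list into fixed-size batches
const ids = ['a', 'b', 'c', 'd', 'e']
const chunks = []
for (let i = 0; i < ids.length; i += 2) chunks.push(ids.slice(i, i + 2))
// chunks → [['a', 'b'], ['c', 'd'], ['e']]
```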
When I want to do something with a large collection of items, I can simply:
const itemsGenerator = chunkQueryGenerator(Items, { foo: 'bar' })
for (const items of itemsGenerator) {
  // items.length is at most 100
  doSomethingWith(items)
}
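Since the generator hands over one batch at a time, it also gives a natural place to throttle the external API calls. A sketch of that (the `processItem` helper and the 500 ms pause are hypothetical, not part of the helper above):

```javascript
// Minimal sleep helper for pausing between batches
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Sketch: process each batch, then back off before fetching the next one.
// `processItem` stands in for the per-item work (invoices, government API, …).
async function processAll() {
  for (const items of chunkQueryGenerator(Items, { foo: 'bar' })) {
    await Promise.all(items.map(item => processItem(item)))
    await sleep(500) // arbitrary pause to spare the server and the APIs
  }
}
```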
The ids approach is a little unconventional, but it's probably your only option if the query you run on the docs would change the order in which they'd be returned from the parent query.
Apart from that, which approach is faster will depend on the size of your collection, your db indexes, how much of a factor network latency is, etc. If you have an environment that looks something like what you expect prod to look like, you could just run some tests…
But also it’s often a useful approach to just put good monitoring and logging in place and keep an eye on things over time. Early optimisation can be a real time killer.
Hi! We have a platform and use Amazon SQS when we want to process each item of a collection without hurting the server. Let me know if you are interested and I can tell you our approach!
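For context, SQS accepts messages in batches of up to 10, so the producer side is mostly about shaping the payload. A rough sketch of building `SendMessageBatch` entries from reservation ids (the ids and message shape are made up; the actual send would go through the AWS SDK, and a worker would consume messages off the queue at its own pace):

```javascript
// Split reservation ids into SQS-sized batches (max 10 entries per call)
const buildBatches = ids => {
  const batches = []
  for (let i = 0; i < ids.length; i += 10) {
    batches.push(
      ids.slice(i, i + 10).map(id => ({
        Id: id, // batch-entry id, must be unique within one request
        MessageBody: JSON.stringify({ reservationId: id }),
      }))
    )
  }
  return batches
}

// Each batch would then be sent with the SDK's SendMessageBatch command.
```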