Hey folks, hoping someone can point me toward best practices here. I’ve done a bit of poking through the forum’s history, but none of the existing threads seem to address my particular problem: generating sitemaps dynamically for a large collection (500,000 documents).
I’m using this package to generate my sitemaps. It works well, though they do suggest switching to static sitemaps for larger collections.
My strategy is to chunk the sitemaps into sets of 1,000 URLs to keep each database query small. I know sitemaps can be much larger than that, but pulling 10,000 documents in a single request is pretty hefty.
Problem: The main issue with this chunking strategy is that when skip = 200,000, the MongoDB query appears to “examine” 201,000 keys to arrive at the batch I need, which leads to query times approaching 1000 ms.
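For what it’s worth, this is roughly how I’ve been confirming the keys-examined number in the mongo shell (db.items is a stand-in for my actual collection name; fields match the example below):

db.items.find({edited: true}, {slug: 1, date: 1})
  .sort({slug: 1})
  .skip(200000)
  .limit(1000)
  .explain("executionStats")
// executionStats.totalKeysExamined comes back around 201,000,
// since skip still walks the index from the beginning of the range.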
Example code below:
Index: Collection._ensureIndex({edited: 1, slug: 1}) // ← working as intended
// Register 500 sitemap routes, one per 1,000-document chunk.
// Using `let` so each handler closes over its own value of i
// (with `var`, every handler would see i === 500 by the time a request arrives).
for (let i = 0; i < 500; i++) {
  sitemaps.add('/sitemap-' + i + '.xml', function(req) {
    var urls = []
    var items = Collection.find({edited: true}, {
      sort: {slug: 1},
      limit: 1000,
      skip: i * 1000,
      fields: {slug: 1, date: 1}
    }).fetch()
    _.each(items, function(item) {
      urls.push({
        page: 'https://www.example.com/item/' + item.slug,
        lastmod: item.date,
        priority: 0.6
      })
    })
    return urls
  })
}
Any tips on how to make this more efficient? Should I switch from dynamic on-request results to something static/cached?
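To frame that last question, the kind of caching I have in mind is something like the sketch below. It’s only an illustration: sitemapCache and rebuildSitemapChunk are hypothetical names I made up, and this still runs the skip query once per chunk, just not on every request.

// Hypothetical sketch: build each chunk's URL list once and serve the cached array.
var sitemapCache = {}  // chunk index -> array of url objects

function rebuildSitemapChunk(i) {
  var urls = []
  Collection.find({edited: true}, {
    sort: {slug: 1},
    limit: 1000,
    skip: i * 1000,
    fields: {slug: 1, date: 1}
  }).forEach(function(item) {
    urls.push({
      page: 'https://www.example.com/item/' + item.slug,
      lastmod: item.date,
      priority: 0.6
    })
  })
  sitemapCache[i] = urls
}

for (let i = 0; i < 500; i++) {
  sitemaps.add('/sitemap-' + i + '.xml', function(req) {
    // Serve from cache; fall back to a live query the first time this chunk is hit.
    if (!sitemapCache[i]) rebuildSitemapChunk(i)
    return sitemapCache[i]
  })
}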