Hi all! I’m hoping that someone might be able to point me in the right direction on how best to essentially JOIN
two collections into a new collection.
I’m looking to do this in the fastest possible way, whilst keeping resource utilisation in check, and am happy to entertain using a different stack to achieve this if required.
What I have so far:
In MongoDB (compose.io):
| Collection | Foreign Key | Documents | Size on Disk |
|------------|-------------|-----------|--------------|
| Invoices   | -           | 118,882   | 556 MB       |
| LineItems  | `invoiceId` | 1,384,724 | 1.47 GB      |
These two collections are automatically re-created every few hours by importing two CSV files into MongoDB with:

```shell
mongoimport ... --drop --headerline --type csv --file FILENAME.csv
```
A Meteor micro-service (server only) does the processing:

```coffee
Invoices          = new Mongo.Collection 'Invoices'
LineItems         = new Mongo.Collection 'LineItems'
ProcessedInvoices = new Mongo.Collection 'ProcessedInvoices'

process = ->
  startTime = new Date()

  # Ensure indexes every time (could have dropped/inserted DB since last run)
  LineItems._ensureIndex invoiceId: 1

  # Add LineItems to each Invoice and save it as a ProcessedInvoice
  Invoices.find().forEach (invoice) ->
    invoice.lineItems = LineItems.find(invoiceId: invoice._id).fetch()
    ProcessedInvoices.insert invoice

  # Finish
  console.log 'time', (new Date()) - startTime

process()
```
Result:

```js
{
  _id: "invoice-1",
  someProp: "",
  lineItems: [
    {
      _id: "line-item-1",
      invoiceId: "invoice-1",
      someProp: ""
    },
    ...
  ]
}
```
This works but is PAINFULLY slow! Read: it takes hours!
Questions I have:
- Could I optimise this by coding it differently?
- Am I using the wrong stack (meteor/mongodb)?
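On the first question, for context: the loop above issues one `LineItems` query per invoice, i.e. roughly 119k round trips to the database. One restructuring I've been wondering about is fetching everything once and grouping in memory instead. A rough sketch in plain JavaScript (the `joinInvoices` helper is hypothetical, and it assumes the full line-item set fits in RAM, which at ~1.5 GB on disk is borderline here):

```javascript
// Hypothetical sketch: join in memory instead of one query per invoice.
// Trades memory for round trips: two full scans instead of ~119k queries.
function joinInvoices(invoices, lineItems) {
  // Group line items by their foreign key in a single pass.
  const byInvoiceId = new Map();
  for (const item of lineItems) {
    const group = byInvoiceId.get(item.invoiceId);
    if (group) group.push(item);
    else byInvoiceId.set(item.invoiceId, [item]);
  }
  // Attach each invoice's group (or an empty array) to the invoice.
  return invoices.map((invoice) => ({
    ...invoice,
    lineItems: byInvoiceId.get(invoice._id) || [],
  }));
}
```

The resulting documents would then be written back with a bulk insert rather than one `insert` call per invoice.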
Bonus-point questions:
- Can I make this run reactively when there are changes in the `Invoices` and `LineItems` collections? Note: these two collections will only ever be modified by the automated `mongoimport ... --drop --headerline --type csv --file FILENAME.csv` described above.
- What should I be looking for in the compose.io Monitoring section to see if I'm over-utilising the database?
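PS: in case it helps frame the "coding it differently / different stack" questions, I've also been wondering whether the whole join could be pushed into MongoDB itself as an aggregation. A hedged sketch of what I think the pipeline would look like (`$lookup` needs MongoDB 3.2+, which I'd have to confirm is available on compose.io):

```javascript
// Hypothetical: the same join expressed as a server-side aggregation pipeline.
// $lookup (MongoDB 3.2+) performs the left join; $out writes the joined
// documents to the output collection, so no per-invoice app round trips.
const pipeline = [
  {
    $lookup: {
      from: "LineItems",         // collection to join against
      localField: "_id",         // Invoices._id ...
      foreignField: "invoiceId", // ... matched to LineItems.invoiceId
      as: "lineItems",           // embedded array, same shape as my result above
    },
  },
  { $out: "ProcessedInvoices" },
];

// Would be run as (untested):
//   db.Invoices.aggregate(pipeline, { allowDiskUse: true })
```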