How to solve $group mongo with document > 16MB?

theara · January 25, 2019, 12:30am

I would like to use $push in a $group stage over millions of documents.
Ex:

data = [
  {item: 'A', date, qty, price, amount..},
  {item: 'A', date, qty, price, amount..},
  {item: 'B', date, qty, price, amount..},
  ...........
]
----------------
Inventories.aggregate([
    {
        $group: {
            _id: "$item", 
            data: {
                $push: {
                    date: "$date",
                    qty:"$qty",
                    price:"$price",
                    amount:"$amount",
                }
            }
        }
    }
])

I got the error Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.
And then I tried

.......([....], { allowDiskUse: true })

I got the error BSONObj size: 19046166 (0x1229F16) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 293825087070.

Could anyone help me?

raragao · January 25, 2019, 11:49pm

I think this is a limit of MongoDB. See this manual:

theara · January 25, 2019, 11:56pm

Thanks. Is there any way to solve it?

pmcochrane · January 28, 2019, 12:34am

I’d think you would have to rework your query to get it to work. To my knowledge, the 16 MB limit applies to each MongoDB document as a whole, nested properties included, so nesting a large object inside another property will not give it its own 16 MB allowance. Try to write your query so that each resulting document stays smaller. This may overcome your problem.

I would seriously be rethinking what you are querying for though. Do you really need that much data returned in each row? Can’t you filter out at least some of the properties being returned?

robfallows · January 28, 2019, 12:36pm

That aggregation query is taking all the documents in your collection and creating one document for each distinct item. The problem you are having is that the $push operator adds another entry to an array of values for each qualifying input document. That means that the aggregated document can become very big. If each array element is 16 bytes, you only need about 1M documents with item: 'A' in your collection to exceed the 16MB document size in your pipeline.
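
As a rough check of that arithmetic, here is a minimal sketch. It assumes the 16 MiB maximum BSON document size; the per-element byte counts are illustrative assumptions, not measurements of your data, and `maxArrayElements` is a hypothetical helper name.

```javascript
// Rough estimate of how many $push-ed array elements fit in one document.
// 16 MiB is MongoDB's maximum BSON document size; the byte counts passed in
// below are illustrative assumptions, not measured sizes.
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

function maxArrayElements(bytesPerElement) {
  return Math.floor(MAX_BSON_BYTES / bytesPerElement);
}

console.log(maxArrayElements(16)); // 1048576
console.log(maxArrayElements(64)); // 262144
```

A sub-document holding date, qty, price and amount is likely to be larger than 16 bytes once field names are counted, so the limit would be reached with fewer documents than the 1M figure suggests.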

Do you really need to use $group?

theara · January 29, 2019, 12:25pm

I would like to group by item A, push all of A's transactions into an array, and then view a report like this:

robfallows · January 29, 2019, 1:35pm

You’re also grouping by date. You need to add that to the group criteria.

theara · January 29, 2019, 2:41pm

Could you share your code?

robfallows · January 30, 2019, 2:13pm

Assuming your date is a JavaScript Date object, my pipeline would look something like this (simplified from your sample data):

[
  {
    $project: {
      short_date: {
        $dateToString: {
          date: "$date",
          format: "%Y-%m-%d",
          timezone: "+00:00"
        }
      },
      item: 1,
      qty: 1,
      price: 1
    }
  },
  {
    $group: {
      _id: {
        $concat: [
          "$short_date",
          "/",
          "$item"
        ]
      },
      date: { $first: "$short_date" },
      item: { $first: "$item" },
      qty: {
        $sum: "$qty"
      },
      price: {
        $sum: "$price"
      },
      amount: {
        $sum: { $multiply: ["$qty", "$price"] }
      },
    }
  },
  {
    $sort: { _id: 1 }
  }
]

Notes: I’ve used a string for the group _id so that you can use it successfully with Meteor’s pub/sub (using tunguska:reactive-aggregate for example) and minimongo. As the _id also includes the short_date and item fields, you don’t really need them in the group document, but it saves parsing them.
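
To make the shape of the grouped documents concrete, here is a plain JavaScript sketch that imitates what the $project, $group and $sort stages compute. It is only an illustration, not how MongoDB runs the pipeline; the sample rows are made up and `summarize` is a hypothetical name.

```javascript
// Plain-JS imitation of the $project + $group + $sort stages, for illustration only.
// Field names follow the thread; the sample rows are invented.
function summarize(rows) {
  const groups = new Map();
  for (const { item, date, qty, price } of rows) {
    // Same "%Y-%m-%d" format in UTC as $dateToString with timezone "+00:00".
    const shortDate = date.toISOString().slice(0, 10);
    const id = `${shortDate}/${item}`;
    if (!groups.has(id)) {
      groups.set(id, { _id: id, date: shortDate, item, qty: 0, price: 0, amount: 0 });
    }
    const g = groups.get(id);
    g.qty += qty;
    g.price += price;
    g.amount += qty * price;
  }
  // Counterpart of $sort: { _id: 1 } for string ids.
  return [...groups.values()].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const rows = [
  { item: "A", date: new Date("2019-01-25T08:00:00Z"), qty: 2, price: 5 },
  { item: "A", date: new Date("2019-01-25T15:30:00Z"), qty: 1, price: 5 },
  { item: "B", date: new Date("2019-01-25T09:00:00Z"), qty: 4, price: 3 },
];
console.log(summarize(rows));
```

Each output document is one small summary row per date and item, so its size no longer grows with the number of transactions.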

theara · January 31, 2019, 1:37am

So that means your example produces a view like this:

But I would like a view like this for the user (and sometimes I have 3 or 4 group levels):

Please advise.

ralpheiligan · January 31, 2019, 6:45am

You can use this MongoDB tool to experiment with your query, see the results immediately, and then use the query in your code.
https://nosqlbooster.com/

robfallows · January 31, 2019, 10:33am

Just change the sort to $sort: { item: 1, date: 1 }. You may be able to take advantage of indexes by moving the $sort stage to the start of the pipeline.
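
Applied to the pipeline from my earlier reply, the change would look roughly like this. `buildReportPipeline` is a hypothetical helper name used only for this sketch, and the sort is left at the end, where it operates on the grouped fields.

```javascript
// Hypothetical helper: returns the earlier pipeline with the final sort
// changed to item first, then date.
function buildReportPipeline() {
  return [
    {
      $project: {
        short_date: {
          $dateToString: { date: "$date", format: "%Y-%m-%d", timezone: "+00:00" },
        },
        item: 1,
        qty: 1,
        price: 1,
      },
    },
    {
      $group: {
        _id: { $concat: ["$short_date", "/", "$item"] },
        date: { $first: "$short_date" },
        item: { $first: "$item" },
        qty: { $sum: "$qty" },
        price: { $sum: "$price" },
        amount: { $sum: { $multiply: ["$qty", "$price"] } },
      },
    },
    // Sort on the grouped fields so each item's rows are contiguous and in date order.
    { $sort: { item: 1, date: 1 } },
  ];
}

console.log(JSON.stringify(buildReportPipeline()[2]));
```

It would be passed to the same call as in the original question, for example Inventories.aggregate(buildReportPipeline(), { allowDiskUse: true }), assuming your collection exposes aggregate the way your first post shows.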

theara · February 1, 2019, 12:03am

@robfallows, does that mean I must arrange this with a JS loop, checking whether each item name is the same as the previous one?

let data = ....
let tmpItem = ''
let result = []
data.forEach(it => {
  // Check item name
  if(it.item === tmpItem){

  }
})

robfallows · February 1, 2019, 9:44am

Basically, yes. The data will be ordered correctly. However, it’s up to you how to present the data.
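
One way that loop could look, as a minimal sketch: it assumes the rows arrive already sorted by item and then date, uses the field names from the aggregation output in this thread, and runs on made-up sample values. `groupSortedRows` is a hypothetical name.

```javascript
// Turn rows sorted by item, then date, into one report section per item.
// Row shape ({ item, date, qty, amount }) follows the aggregation output in
// this thread; the sample values are invented.
function groupSortedRows(rows) {
  const sections = [];
  let current = null;
  for (const row of rows) {
    // A change of item name starts a new section; this relies on the sort order.
    if (current === null || current.item !== row.item) {
      current = { item: row.item, rows: [], totalQty: 0, totalAmount: 0 };
      sections.push(current);
    }
    current.rows.push(row);
    current.totalQty += row.qty;
    current.totalAmount += row.amount;
  }
  return sections;
}

const sample = [
  { item: "A", date: "2019-01-01", qty: 2, amount: 20 },
  { item: "A", date: "2019-01-02", qty: 1, amount: 10 },
  { item: "B", date: "2019-01-01", qty: 5, amount: 25 },
];
console.log(JSON.stringify(groupSortedRows(sample), null, 2));
```

For 3 or 4 group levels, the same comparison against the previous row could be repeated per level, provided the sort covers every level in order.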