Mongo collection size, why so big?

I’m having a strange issue here where I have a collection with 2,200 documents in it that’s over 400mb. Further, I’m trying to render the whole collection client-side, which is obviously problematic. The thing I can’t understand is why it’s so large, as it’s storing very little text.
I know why it *used* to be large: I was storing entire web page contents (as stringified HTML) inside each document. Back then the collection totalled 600mb, which I thought was a lot, but there was a lot of text content. Now, though, after removing all of that, the collection only shrank by 200mb. I'm not really sure where to start, any ideas on why it's still so large?

Looking at db.stats() tells me the average size per document is just under 200kb (which is huge), but when I check individual documents with Object.bsonsize(), the largest one is only 4.2kb, which makes much more sense.
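For what it's worth, that ~200kb "average" looks like on-disk file size divided by document count, not an average of actual BSON sizes. A quick back-of-the-envelope check in Python, using the numbers from this thread:

```python
# Numbers from the post: 2,200 documents in a collection using ~420 MB on disk,
# with the largest individual document at 4.2 KB of BSON.
doc_count = 2200
disk_usage_kb = 420 * 1024          # ~420 MB of data files, expressed in KB
largest_doc_kb = 4.2

per_doc_disk_kb = disk_usage_kb / doc_count
print(f"disk usage / doc count: {per_doc_disk_kb:.1f} KB")  # ~195 KB, i.e. "just under 200kb"
print(f"largest BSON document:  {largest_doc_kb} KB")

# The gap between the two figures is disk space that is allocated but not
# holding current document data (preallocated files, old extents, padding).
```

So the two measurements aren't contradictory; they're measuring different things.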

Preallocation. Mongo sets aside disk space in empty containers, so when the time comes to write something to disk, it doesn't have to shuffle bits out of the way first. It does so with a doubling algorithm, doubling the size of each preallocated data file until it reaches 2GB; every file after that is 2GB. Once space is preallocated, it isn't released unless you specifically tell Mongo to release it. So observable MongoDB disk usage tends to go up automatically, but never down.
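The doubling policy can be sketched like this. This is a rough Python model of the old MMAPv1 data-file sizing, not MongoDB's actual code; the 64MB starting size and 2GB cap are the documented defaults:

```python
def prealloc_file_sizes(data_mb):
    """Return the data-file sizes (in MB) that MMAPv1-style doubling would
    allocate to hold `data_mb` megabytes: 64, 128, 256, ... capped at 2048."""
    sizes = []
    allocated = 0
    next_size = 64  # first data file is 64 MB by default
    while allocated < data_mb:
        sizes.append(next_size)
        allocated += next_size
        next_size = min(next_size * 2, 2048)  # double until the 2 GB cap
    return sizes

# Even ~300 MB of data claims 64 + 128 + 256 = 448 MB of files on disk,
# and those files stay that size after the data inside them is deleted.
print(prealloc_file_sizes(300))  # [64, 128, 256]
```

That's roughly why a collection holding a few megabytes of BSON can still occupy hundreds of megabytes on disk.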

And to compact your Mongo installation (the paths in the repair example below are from a Homebrew-style install; adjust them for Ubuntu):

// compact the database from within the Mongo shell
db.runCommand( { compact : 'mycollectionname' } )

# repair the database from the command line (run while mongod is stopped)
mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair --nojournal

# or dump and re-import from the command line
mongodump -d databasename
echo 'db.dropDatabase()' | mongo databasename
mongorestore dump/databasename

Awesome answer! The compact command instantly took it down from 420mb to just over 2mb.
