Extract (and visualize) Mongo data model



I made it because I need it, but it can be useful to others too: an npm module (CLI) that extracts the schema from a Mongo database into .json or .html (open the output html in a browser and it will render an ER model).

Example output (screenshot of the resulting html rendered in a browser):



Wow, excellent. Appreciated!

I tried:

  1. I have two dbs:

show dbs


extract-mongo-schema -d "mongodb://localhost/meteor" -o schema.html -f html-diagram
=> success

extract-mongo-schema -d "mongodb://localhost/test" -o schema.html -f html-diagram
=> fail. "mongodb://localhost/test" is the connection path actually in use

---------- log:
Extract schema from Mongo database (including foreign keys)

TypeError: Cannot read property 'function' of undefined
at getDocSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:41:31)
at getDocSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:59:5)
at C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:111:4
at Array.map (native)
at C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:110:8
at Array.map (native)
at getSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:100:18)
at printSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:125:16)
at C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\node_modules\wait.for\waitfor.js:15:31

  2. Some table names are not displayed in the html. (The body is displayed though; viewing the source, everything is ok.)


@jwkim thank you for the review. How can I reproduce that issue with the "test" database?


By the way, I think it is fixed now (including “some table name not displayed”)


@perak how does it know which collection to link to? Let's say I have a field "randomId". How does it work?

Good job by the way this is a very nice tool!


Hi @diaconutheodor, thanks. How does it work? Very simple: it gets the value of randomId and searches all collections, trying to find that value in their _id fields.
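The idea above can be sketched with in-memory collections (this is an illustrative sketch, not the actual extract-mongo-schema code; the function name `findReferencedCollection` is hypothetical):

```javascript
// Hypothetical sketch: given a field value, scan every collection for a
// document whose _id equals that value. If one is found, the field is
// treated as a foreign key pointing at that collection.
function findReferencedCollection(collections, value) {
  for (const [name, docs] of Object.entries(collections)) {
    if (docs.some((doc) => doc._id === value)) {
      return name; // this collection owns the id
    }
  }
  return null; // no collection owns this id
}

// Tiny in-memory example: tasks.ownerId holds a users._id value.
const collections = {
  users: [{ _id: "u1", name: "Ann" }],
  tasks: [{ _id: "t1", ownerId: "u1" }],
};

console.log(findReferencedCollection(collections, "u1")); // "users"
console.log(findReferencedCollection(collections, "zzz")); // null
```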


Great man! thanks for sharing!

get value of randomId and searches all collections trying to find that value in their _id fields.

So it doesn’t work on an empty DB?


No way - mongo doesn't have a schema


That is so smart :slight_smile:. Must it end with "Id", or does it apply to all strings? Or does it try to identify whether the field looks like an id? Does it do this for all documents, or just for the first one it finds with that pattern, and learn from it?


If the string "looks like" an id (matches a regex pattern). You can see the source code, btw.
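For illustration, a "looks like an id" check could be written like this. The exact regex is in the extract-mongo-schema source; the two patterns below are my assumption, covering Mongo ObjectIds (24 hex chars) and Meteor-style random ids (17 alphanumeric chars):

```javascript
// Assumed heuristic (not the library's actual regex): a string "looks
// like" an id if it matches one of two common Mongo id shapes.
function looksLikeId(value) {
  if (typeof value !== "string") return false;
  const objectIdLike = /^[0-9a-f]{24}$/i;   // e.g. "59f2a9d4c1a4f82b3c9d1e7f"
  const meteorIdLike = /^[0-9a-zA-Z]{17}$/; // e.g. "ybR8vWqvkXfkTqWQz"
  return objectIdLike.test(value) || meteorIdLike.test(value);
}

console.log(looksLikeId("59f2a9d4c1a4f82b3c9d1e7f")); // true
console.log(looksLikeId("not-an-id")); // false
```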


The script reads 100 documents from each collection and builds statistics for the fields (the same field can be a string or a number or whatever within the same collection). The most frequent type is chosen. If a field value "looks like" an id (and is not already marked as a foreign key), the program searches all collections for that id (for each of the 100 documents where the field looks like an id, until it is confirmed as a "foreign key").


Sorry I’m late.

  1. mongodb's log:

2017-10-27T07:10:18.530+0900 I NETWORK [conn1] received client metadata from conn1: { driver: { name: "nodejs", version: "2.2.33" }, os: { type: "Windows_NT", name: "win32", architecture: "x64", version: "6.1.7601" }, platform: "Node.js v6.9.1, LE, mongodb-core: 2.1.17" }
2017-10-27T07:10:18.648+0900 I COMMAND [conn1] command test.tasks command: find { find: "tasks", filter: {}, limit: 100 } planSummary: COLLSCAN keysExamined:0 docsExamined:100 cursorExhausted:1 numYields:2 nreturned:100 reslen:30647 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_query 105ms

The number of documents in the tasks collection is not that large, but it is over 100.
Presumably this is relevant to the cause.
I looked into the error lines of the code (but, as expected, this gave no specific clue).

  2. and googled with some keywords (cursorExhausted:1, docsExamined:100, …).
    I don't know any further, but I guess this can be a clue:

Q: …But we are facing cursor timeout kind of problems.
A: Instead of using a cursor over the entire collection you can try paging through the collection by the _id.
So each time query for 100 documents (order by _id) and keep the last _id you encounter.
Then on each consecutive query use a condition to fetch documents where _id > last _id from previous fetch.
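The paging idea quoted above can be simulated on an in-memory sorted array (a sketch of the pattern, not extract-mongo-schema's code; with the real Node driver each page would be roughly `find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(100)`):

```javascript
// Sketch of paging by _id: fetch a page, remember the last _id seen,
// then ask only for documents with _id greater than it, until exhausted.
function pageThrough(docs, pageSize) {
  const sorted = [...docs].sort((a, b) => (a._id < b._id ? -1 : 1));
  const pages = [];
  let lastId = null;
  for (;;) {
    // simulates: find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)
    const page = sorted
      .filter((d) => lastId === null || d._id > lastId)
      .slice(0, pageSize);
    if (page.length === 0) break;
    pages.push(page);
    lastId = page[page.length - 1]._id; // cursor position for the next query
  }
  return pages;
}

const docs = Array.from({ length: 7 }, (_, i) => ({ _id: `id${i}` }));
const pages = pageThrough(docs, 3);
console.log(pages.length); // 3 pages: 3 + 3 + 1 documents
```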


Reinstalled; yes, it is fixed.


This is fantastic, and really REALLY useful for documentation and collaboration. Thanks!