Extract (and visualize) Mongo data model



I made it because I need it, but it can be useful to others too: an npm module (CLI) that extracts the schema from a Mongo database into .json or .html (open the output html in a browser and it will render an ER model).

Example output (screenshot of the resulting html rendered in a browser):



Wow, excellent. Appreciated!

I tried:

  1. I have two dbs:

show dbs


extract-mongo-schema -d "mongodb://localhost/meteor" -o schema.html -f html-diagram
=> success

extract-mongo-schema -d "mongodb://localhost/test" -o schema.html -f html-diagram
=> fail. "mongodb://localhost/test" is the connection path actually in use

---------- log:
Extract schema from Mongo database (including foreign keys)

TypeError: Cannot read property 'function' of undefined
at getDocSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:41:31)
at getDocSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:59:5)
at C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:111:4
at Array.map (native)
at C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:110:8
at Array.map (native)
at getSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:100:18)
at printSchema (C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\extract-mongo-schema.js:125:16)
at C:\Users\Administrator\AppData\Roaming\npm\node_modules\extract-mongo-schema\node_modules\wait.for\waitfor.js:15:31

  2. Some table names are not displayed in the html. (The body is displayed though; viewing the source, everything is ok.)


@jwkim thank you for the review. How can I reproduce that issue with the "test" database?


By the way, I think it is fixed now (including “some table name not displayed”)


@perak how does it know which collection to link to? Let's say I have a field "randomId". How does it work?

Good job by the way this is a very nice tool!


Hi @diaconutheodor, thanks. How does it work? Very simple: it gets the value of randomId and searches all collections, trying to find that value in their _id fields.
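The idea above can be sketched with in-memory collections (this is an illustrative sketch, not the actual extract-mongo-schema code; the function name `findReferencedCollection` is hypothetical):

```javascript
// Hypothetical sketch: given a field value, scan every collection for a
// document whose _id equals that value. If one is found, the field is
// treated as a foreign key pointing at that collection.
function findReferencedCollection(collections, value) {
  for (const [name, docs] of Object.entries(collections)) {
    if (docs.some((doc) => doc._id === value)) {
      return name; // this collection owns the id
    }
  }
  return null; // no collection owns this id
}

// Tiny in-memory example: tasks.ownerId holds a users._id value.
const collections = {
  users: [{ _id: "u1", name: "Ann" }],
  tasks: [{ _id: "t1", ownerId: "u1" }],
};

console.log(findReferencedCollection(collections, "u1")); // "users"
console.log(findReferencedCollection(collections, "zzz")); // null
```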


Great man! thanks for sharing!

get value of randomId and searches all collections trying to find that value in their _id fields.

So it doesn’t work on an empty DB?


No way - mongo doesn't have a schema


That is so smart :slight_smile:. Must it end with "Id", or does it apply to all strings? Or does it try to identify whether the field looks like an id? Does it do this for all documents, or just for the first one it finds with that pattern, and learn from it?


If the string "looks like" an id (matches a regex pattern). You can see the source code, btw.
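For illustration, a "looks like an id" check could be written like this. The exact regex is in the extract-mongo-schema source; the two patterns below are my assumption, covering Mongo ObjectIds (24 hex chars) and Meteor-style random ids (17 alphanumeric chars):

```javascript
// Assumed heuristic (not the library's actual regex): a string "looks
// like" an id if it matches one of two common Mongo id shapes.
function looksLikeId(value) {
  if (typeof value !== "string") return false;
  const objectIdLike = /^[0-9a-f]{24}$/i;   // e.g. "59f2a9d4c1a4f82b3c9d1e7f"
  const meteorIdLike = /^[0-9a-zA-Z]{17}$/; // e.g. "ybR8vWqvkXfkTqWQz"
  return objectIdLike.test(value) || meteorIdLike.test(value);
}

console.log(looksLikeId("59f2a9d4c1a4f82b3c9d1e7f")); // true
console.log(looksLikeId("not-an-id")); // false
```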


The script reads 100 documents from each collection and builds statistics for the fields (the same field can be a string or a number or whatever within the same collection). The most frequent type is chosen. If a field value "looks like" an id (and is not already marked as a foreign key), the program searches all collections for that id (for each of the 100 documents where the field looks like an id, until it is confirmed as a "foreign key").


Sorry I’m late.

  1. mongodb's log:

2017-10-27T07:10:18.530+0900 I NETWORK [conn1] received client metadata from conn1: { driver: { name: "nodejs", version: "2.2.33" }, os: { type: "Windows_NT", name: "win32", architecture: "x64", version: "6.1.7601" }, platform: "Node.js v6.9.1, LE, mongodb-core: 2.1.17" }
2017-10-27T07:10:18.648+0900 I COMMAND [conn1] command test.tasks command: find { find: "tasks", filter: {}, limit: 100 } planSummary: COLLSCAN keysExamined:0 docsExamined:100 cursorExhausted:1 numYields:2 nreturned:100 reslen:30647 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 3 } } } protocol:op_query 105ms

The number of documents in the tasks collection is not that large, but it is over 100.
Presumably this is relevant to the cause.
I looked into the error lines of the code (but, as expected, this gave no specific clue).

  2. and googled with some keywords (cursorExhausted:1, docsExamined:100, …).
    I don't know any further, but I guess this can be a clue:

Q: …But we are facing cursor timeout kind of problems.
A: Instead of using a cursor over the entire collection you can try paging through the collection by the _id.
So each time query for 100 documents (order by _id) and keep the last _id you encounter.
Then on each consecutive query use a condition to fetch documents where _id > last _id from previous fetch.
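The paging idea quoted above can be simulated on an in-memory sorted array (a sketch of the pattern, not extract-mongo-schema's code; with the real Node driver each page would be roughly `find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(100)`):

```javascript
// Sketch of paging by _id: fetch a page, remember the last _id seen,
// then ask only for documents with _id greater than it, until exhausted.
function pageThrough(docs, pageSize) {
  const sorted = [...docs].sort((a, b) => (a._id < b._id ? -1 : 1));
  const pages = [];
  let lastId = null;
  for (;;) {
    // simulates: find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(pageSize)
    const page = sorted
      .filter((d) => lastId === null || d._id > lastId)
      .slice(0, pageSize);
    if (page.length === 0) break;
    pages.push(page);
    lastId = page[page.length - 1]._id; // cursor position for the next query
  }
  return pages;
}

const docs = Array.from({ length: 7 }, (_, i) => ({ _id: `id${i}` }));
const pages = pageThrough(docs, 3);
console.log(pages.length); // 3 pages: 3 + 3 + 1 documents
```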


Reinstalled; yes, it is fixed.


This is fantastic, and really REALLY useful for documentation and collaboration. Thanks!