Coming from an ER/SQL background, I have found denormalisation very hard to put into practice (i.e. a mental block).
I’m using jeanfredrik:denormalize to automatically sync up some fields, but it would be great if it worked.
Anyone out there using it successfully?
Aside: I thought an automated index sync for Mongo, like this, would be core, or at least a very popular tool, so we're not all writing the same col1.update, col2.update, col3.update code every time… yet there seem to be no tools for syncing denormalised values across collections. Even Discover Meteor just did it ad hoc (probably to show you how it works under the hood).
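For illustration, here is a minimal sketch of the kind of generic sync helper the aside is wishing for. It is plain JavaScript with Maps standing in for Mongo collections; `cacheRules`, `updateSource` and the field names are hypothetical, not the API of jeanfredrik:denormalize or any other package. The idea is to declare once which fields are cached where, and let a single function fan each change out instead of hand-writing col1.update, col2.update, col3.update.

```javascript
// In-memory stand-ins for Mongo collections: each is a Map of _id -> document.
const db = {
  users: new Map(),
  members: new Map(),
  comments: new Map(),
};

// Hypothetical declarative rules: docs in `target` reference a `source` doc via `refField`
// and keep a copy of its `fields` under `cacheField`.
const cacheRules = [
  { source: "users", target: "members", refField: "userId", cacheField: "user", fields: ["name"] },
  { source: "users", target: "comments", refField: "authorId", cacheField: "author", fields: ["name"] },
];

// Copy only the listed fields from a document.
function pick(doc, fields) {
  const out = {};
  for (const f of fields) if (f in doc) out[f] = doc[f];
  return out;
}

// Update a source document, then refresh every cached copy of it in dependent collections.
function updateSource(collection, id, changes) {
  const doc = { ...db[collection].get(id), ...changes };
  db[collection].set(id, doc);
  for (const rule of cacheRules) {
    if (rule.source !== collection) continue;
    // Skip rules whose cached fields were not touched by this update.
    if (!rule.fields.some((f) => f in changes)) continue;
    for (const [tid, tdoc] of db[rule.target]) {
      if (tdoc[rule.refField] === id) {
        db[rule.target].set(tid, { ...tdoc, [rule.cacheField]: pick(doc, rule.fields) });
      }
    }
  }
}

// Usage
db.users.set("u1", { _id: "u1", name: "Ada" });
db.members.set("m1", { _id: "m1", userId: "u1", oid: "o1", user: { name: "Ada" } });
db.comments.set("c1", { _id: "c1", authorId: "u1", author: { name: "Ada" } });
updateSource("users", "u1", { name: "Ada Lovelace" });
console.log(db.members.get("m1").user.name, db.comments.get("c1").author.name);
// → Ada Lovelace Ada Lovelace
```

Against real Mongo collections, the inner loop would collapse into one multi-document update per rule, e.g. `Members.update({ userId: id }, { $set: { 'user.name': newName } }, { multi: true })`.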
To get a list of member names of an organisation, I’d have to get all members then use $in to get all users, and expose the name field.
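A minimal sketch of that two-step lookup, with plain arrays standing in for the collections (the sample data is invented). In Meteor the second step would roughly be `Meteor.users.find({ _id: { $in: userIds } }, { fields: { name: 1 } })`.

```javascript
// Sample data: members link an organisation (oid) to a user (userId).
const members = [
  { _id: "m1", oid: "o1", userId: "u1" },
  { _id: "m2", oid: "o1", userId: "u2" },
  { _id: "m3", oid: "o2", userId: "u3" },
];
const users = [
  { _id: "u1", name: "Ada" },
  { _id: "u2", name: "Grace" },
  { _id: "u3", name: "Linus" },
];

function memberNames(orgId) {
  // Step 1: all membership documents of the organisation, reduced to their user ids.
  const userIds = new Set(members.filter((m) => m.oid === orgId).map((m) => m.userId));
  // Step 2: the equivalent of an $in query, exposing only the name field.
  return users.filter((u) => userIds.has(u._id)).map((u) => u.name);
}

console.log(memberNames("o1"));
// → [ 'Ada', 'Grace' ]
```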
I want to denormalize names (or, more generally, a label for any composite object whose identity is in another document) so I can simplify my queries in publish and on the client.
Is this a good reason to denormalize? If not, how would you handle this case? It seems a common enough scenario where in SQL, I’d just do a join.
There is no general rule here. Nevertheless, most of the time I have found that “simplifying my queries” is not a good reason to denormalize. I think denormalizing usually adds more complexity to your code than implementing joins with $in or a join package does.
Since I've been playing with NoSQL, I usually end up referencing from both ends.
So for this use case, the organisation would hold the userIds and each document in the users collection would hold an orgId.
But for security reasons I would clone users into a separate collection without the auth fields etc.
I would use these for referencing (this can be done in 2 ways: really cloning it, or overriding publish hooks to publish it under a different collection name).
And it is not to simplify my queries, but because I want to have all subscriptions ready at the “same” time when I only know orgId, without waiting for the results of one query before starting the second.
But I am not a data scientist.
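A rough sketch of the latency argument, with `setTimeout` standing in for subscription round trips (all function names and data are hypothetical). Because each user carries an `orgId`, both fetches need only the org id and can start together; with references on the organisation side only, the user fetch has to wait for the organisation to arrive.

```javascript
// Resolve `value` after `ms` milliseconds, standing in for a subscription round trip.
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical data with references on both ends: org.userIds and user.orgId.
const orgs = [{ _id: "o1", name: "Acme", userIds: ["u1", "u2"] }];
const users = [
  { _id: "u1", name: "Ada", orgId: "o1" },
  { _id: "u2", name: "Grace", orgId: "o1" },
];

const fetchOrg = (orgId) => delay(50, orgs.find((o) => o._id === orgId));
const fetchUsersByIds = (ids) => delay(50, users.filter((u) => ids.includes(u._id)));
const fetchUsersByOrg = (orgId) => delay(50, users.filter((u) => u.orgId === orgId));

// Forward reference only: the user query cannot start until the org arrives (~2 round trips).
async function loadSequential(orgId) {
  const org = await fetchOrg(orgId);
  const members = await fetchUsersByIds(org.userIds);
  return { org, members };
}

// Back-reference on users: both queries need only orgId, so they run concurrently (~1 round trip).
async function loadParallel(orgId) {
  const [org, members] = await Promise.all([fetchOrg(orgId), fetchUsersByOrg(orgId)]);
  return { org, members };
}

const start = Date.now();
loadParallel("o1").then(({ members }) => {
  console.log(members.map((u) => u.name), `${Date.now() - start} ms`);
});
```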
I don’t know why you’d clone users; you can selectively publish fields to the client, which is how I avoid publishing any password-related fields. No need to clone.
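In Meteor, selective publishing is done with a field projection, e.g. returning `Meteor.users.find({ orgId }, { fields: { name: 1 } })` from a `Meteor.publish` handler. The sketch below only mimics, in plain JavaScript, what an inclusion projection does to one document; the `orgId` field and the sample user are assumptions, and it handles top-level keys only (Mongo also accepts dotted paths).

```javascript
// Mimic an inclusion projection such as { name: 1, orgId: 1 }:
// only whitelisted top-level fields (plus _id) are kept.
function project(doc, fields) {
  const out = { _id: doc._id };
  for (const [key, include] of Object.entries(fields)) {
    if (include && key in doc) out[key] = doc[key];
  }
  return out;
}

// Sample user with auth data that must never reach the client.
const user = {
  _id: "u1",
  name: "Ada",
  orgId: "o1",
  services: { password: { bcrypt: "hash-placeholder" } },
};

console.log(project(user, { name: 1, orgId: 1 }));
// → { _id: 'u1', name: 'Ada', orgId: 'o1' }
```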
The thing is, this is a many-to-many relationship: one org to many members, and many members to one user. Members is the composite entity here, so I have to keep 4 references, and always updating all the _ids and doing $in is tedious. I’d much rather have a nested list of names, because names are the only field you commonly need from one collection in another.
@Steve, by join do you mean https://atmospherejs.com/perak/joins? The usage actually looks exactly the same as the denormalization helper, but I’d prefer the latter, as I won’t have to perform this join everywhere; it's much cleaner to denormalize once and refer to it everywhere.
The problem with Meteor's client-side DB is that, most of the time, if your DB is a little on the big side, you have to send only a subset of it to the client. Then comes the problem of having the right data at the right time on the client. I find denormalization is often the only way to make it work with acceptable performance.
Yes, well it’s all good that @Steve and @shock give nice advice against denormalization and @vjau gives opinion for denormalization but will someone just answer my OP about what I’m doing wrong with the library please!
I don’t like cross-posting, but here’s the code, in case you didn’t realise I had posted code:
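```javascript
// Cache the 'name' field of the organisation referenced by each member's `oid`
// into that member document's `organisations` field.
Db.Members.cacheDoc('organisations', Db.Organisations, ['name'], { referenceField: 'oid' });

Db.Organisations = new SimpleSchema({
  name: { type: String }
});

Db.Members = new SimpleSchema({
  oid: { type: String }
});
```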
I’m trying to get the organisation name to appear in the members collection.
Performing joins on the client requires delicate tuning of data loading but, in my experience, it doesn’t cause performance issues.
Performing joins on the server, for example with the packages above, is easier but theoretically causes performance issues (I say theoretically because I have never tried it in a real situation).
I tried denormalize too, but I have a problem with multiple levels.
For example, I have Col1->Col2->Col3, and if I update a doc in Col1, it updates the doc in Col2 but not the one in Col3.
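One way to get the cascade is to make propagation recursive: writing the Col2 document through the same sync path then triggers Col2's own dependants. The sketch below shows this with in-memory Maps; the `rules` table, the `save` function and the field names are hypothetical, this is not how jeanfredrik:denormalize works internally, and it assumes the dependency graph has no cycles (a cycle would recurse forever).

```javascript
// Three in-memory collections: col3 docs point at col2, col2 docs point at col1.
const db = {
  col1: new Map([["a", { _id: "a", name: "Alpha" }]]),
  col2: new Map([["b", { _id: "b", col1Id: "a", cache: { name: "Alpha" } }]]),
  col3: new Map([["c", { _id: "c", col2Id: "b", cache: { col1Name: "Alpha" } }]]),
};

// Hypothetical rules: derive(parentDoc) returns the value stored in the child's `cache` field.
const rules = [
  { parent: "col1", child: "col2", refField: "col1Id", derive: (p) => ({ name: p.name }) },
  { parent: "col2", child: "col3", refField: "col2Id", derive: (p) => ({ col1Name: p.cache.name }) },
];

// Write a document, then re-save every dependant through this same function,
// so a change in col1 reaches col2 and, from there, col3.
function save(collection, doc) {
  db[collection].set(doc._id, doc);
  for (const rule of rules) {
    if (rule.parent !== collection) continue;
    // Snapshot the children before recursing, so writes don't interfere with iteration.
    for (const child of [...db[rule.child].values()]) {
      if (child[rule.refField] === doc._id) {
        save(rule.child, { ...child, cache: rule.derive(doc) });
      }
    }
  }
}

save("col1", { _id: "a", name: "Alpha v2" });
console.log(db.col3.get("c").cache.col1Name);
// → Alpha v2
```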