Need advice re denormalization tools like jeanfredrik:denormalize

mordrax · July 30, 2015, 3:28am

Coming from an ER/SQL background, I have found denormalisation very hard to put into practice (i.e. a mental block).
I’m using jeanfredrik:denormalize to automatically sync up some fields, but it would be great if it worked.
Anyone out there using it successfully?

Aside: I thought an automated index sync for mongo, like this, would be core or at least a very popular tool so we’re not all writing the same col1.update, col2.update, col3.update code every time… there seem to be no tools for syncing denormalised values across collections. Even Discover Meteor just did it ad hoc (probably trying to show you how it works under the hood).

Steve · July 30, 2015, 6:49am

Sorry to ask, but denormalization is an optimization technique. Are you sure you need it?

mordrax · July 30, 2015, 7:43am

That may be so, but the alternative is for me to complicate my publish and get logic for a schema like the following:

Meteor.users: {
   _id: String
   profile: {
      name: String
   }
}
Organisation: {
   _id: String
   members: [String]
}
Members: {
   _id: ObjectId
   orgId: String
   userId: String
}

To get a list of member names of an organisation, I’d have to get all members then use $in to get all users, and expose the name field.
I want to denormalize names (or, generically, a label for any composite object whose identity is in another document) so I can simplify my queries in publish and on the client.
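For illustration, the two-step lookup described above might look roughly like the sketch below. It uses plain in-memory arrays rather than real collections, the sample data and the `memberNames` helper are invented for the example, and the `Set` membership check only stands in for what a `{ _id: { $in: userIds } }` selector would do in Mongo.

```javascript
// In-memory stand-ins for the users and members collections (invented sample data).
const users = [
  { _id: "u1", profile: { name: "Ada" } },
  { _id: "u2", profile: { name: "Grace" } },
  { _id: "u3", profile: { name: "Linus" } },
];
const members = [
  { _id: "m1", orgId: "o1", userId: "u1" },
  { _id: "m2", orgId: "o1", userId: "u2" },
  { _id: "m3", orgId: "o2", userId: "u3" },
];

// Hypothetical helper: list the member names of one organisation.
function memberNames(orgId) {
  // Step 1: find the organisation's member documents and collect their user ids.
  const userIds = members
    .filter((m) => m.orgId === orgId)
    .map((m) => m.userId);

  // Step 2: emulate an $in lookup on users and expose only the name field.
  const wanted = new Set(userIds);
  return users
    .filter((u) => wanted.has(u._id))
    .map((u) => u.profile.name);
}

console.log(memberNames("o1"));
```

The second step can only start once the first has produced its list of ids, which is the extra round trip the denormalized version is meant to avoid.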

Is this a good reason to denormalize? If not, how would you handle this case? It seems a common enough scenario; in SQL, I’d just do a join.

Steve · July 30, 2015, 7:55am

There is no general rule here. Nevertheless, what I have found is that, most of the time, “simplifying my queries” is not a good reason to denormalize. I think denormalizing usually adds more complexity to your code than implementing joins with $in or a join package does.

shock · July 30, 2015, 10:47am

Hi,

While playing with NoSQL, I mostly end up referencing from both ends.
So for this use case I would have userIds in the organisation and an orgId in the users collection.

But for security reasons I would clone users into a separate collection without auth fields etc.
And I would use these clones for referencing (this can be done in 2 ways: really cloning it, or overriding publish hooks to publish it under a different collection name).

And that is not to simplify my queries, but because I want all subscriptions ready at the “same” time when I know only the orgId, without having to wait for the results of one of these queries before starting the second.
But I am not a data scientist.

Have a good day.

mordrax · July 31, 2015, 10:18am

I don’t know why you’d clone users. You can selectively publish fields to the client, so I don’t publish any password-related fields; no need to clone.

Thing is, this is a many-to-many relationship: 1 org to many members, and many members to 1 user. Members is the composite entity here, so I have to keep 4 references, and always updating all the _ids and doing $in is tedious. I’d much rather have a nested list of names, because names are the only field you commonly need from one collection to another.
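To make the trade-off concrete, here is a minimal sketch of what keeping a cached name in sync by hand involves. It is not how jeanfredrik:denormalize works internally; the `userName` cache field, the `renameUser` helper, and the sample data are all assumptions made for the example, and plain in-memory structures stand in for the collections.

```javascript
// In-memory stand-ins for the collections (invented sample data).
const users = new Map([
  ["u1", { _id: "u1", profile: { name: "Ada" } }],
]);
const members = [
  // userName is the denormalized copy of users[userId].profile.name.
  { _id: "m1", orgId: "o1", userId: "u1", userName: "Ada" },
  { _id: "m2", orgId: "o2", userId: "u1", userName: "Ada" },
];

// Hypothetical helper: every write to the source field must also refresh
// each cached copy. Returns how many member documents were touched.
function renameUser(userId, newName) {
  const user = users.get(userId);
  if (!user) return 0;
  user.profile.name = newName;

  let touched = 0;
  for (const member of members) {
    if (member.userId === userId) {
      member.userName = newName;
      touched += 1;
    }
  }
  return touched;
}

// Reads become a single lookup with no second query against users.
function memberNames(orgId) {
  return members.filter((m) => m.orgId === orgId).map((m) => m.userName);
}

renameUser("u1", "Ada L.");
console.log(memberNames("o1"));
```

Reads get simpler, but every code path that changes a name has to go through something like `renameUser`, which is the repeated col1.update, col2.update code mentioned at the top of the thread.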

@Steve by join do you mean https://atmospherejs.com/perak/joins? The usage actually looks exactly the same as the denormalization helper, but I’d prefer the latter as I won’t have to perform this join everywhere. It is much cleaner to denormalize once and refer to the result everywhere.

vjau · July 31, 2015, 10:44am

The problem with Meteor’s client-side DB is that, most of the time, if your collections are a little on the big side, you have to send only a subset of them to the client. And then comes the problem of having the right data at the right time on the client. I find denormalization is often the only way to make it work with acceptable performance.

mordrax · July 31, 2015, 10:48am

Yes, well, it’s all good that @Steve and @shock give nice advice against denormalization and @vjau gives an opinion for it, but will someone just answer my OP about what I’m doing wrong with the library, please!

I don’t like cross-posting, but here’s the code, in case you didn’t realise I have code:

Db.Members.cacheDoc('organisations', Db.Organisations, ['name'], {referenceField:'oid'});

Db.Organisations = new SimpleSchema({
    name: { type: String }
});

Db.Members = new SimpleSchema({
    oid: { type : String }
});

I’m trying to get the organisation name to appear in the members collection.

Steve · July 31, 2015, 2:49pm

Sorry, I’m afraid I won’t answer your question again :- )

I was thinking more about publish-composite [1], publish-with-relations and the like.

1: GitHub - Meteor-Community-Packages/meteor-publish-composite: Meteor.publishComposite provides a flexible way to publish a set of related documents from various collections using a reactive join

[quote=“vjau, post:7, topic:7677”]
And then comes the problem of having the right data at the right time on the client. I find denormalization is often the only way to make it work with acceptable performance.
[/quote]

Performing a join on the client requires delicate tuning of data loading but, in my experience, it doesn’t cause perf issues.
Performing a join on the server, for example by using the packages above, is easier but theoretically causes perf issues (I say theoretically because I have never tried it in a real situation).

Anyway, here is (again) the reference post about join.

theara · August 3, 2015, 1:20am

I tried denormalize too, but I have a problem with multiple levels.
For example, I have Col1->Col2->Col3, and if I update a doc in Col1, it updates the doc in Col2 but not the one in Col3.
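The symptom can be reproduced with a small in-memory sketch. This says nothing about how the package behaves internally; the field names (`col1Name`, `col1Id`, `col2Id`), the helper functions, and the `cascade` flag are all hypothetical and only show why a sync that stops after one level leaves Col3 stale.

```javascript
// In-memory stand-ins: Col2 caches Col1's name, Col3 caches the value held on Col2.
const col1 = [{ _id: "a", name: "old" }];
const col2 = [{ _id: "b", col1Id: "a", col1Name: "old" }];
const col3 = [{ _id: "c", col2Id: "b", col1Name: "old" }];

// Level one: copy Col1's name into the Col2 docs that reference it.
// Returns the Col2 docs whose cached value actually changed.
function syncCol2(col1Doc) {
  const changed = [];
  for (const doc of col2) {
    if (doc.col1Id === col1Doc._id && doc.col1Name !== col1Doc.name) {
      doc.col1Name = col1Doc.name;
      changed.push(doc);
    }
  }
  return changed;
}

// Level two: copy the value cached on a Col2 doc into the Col3 docs that reference it.
function syncCol3(col2Doc) {
  for (const doc of col3) {
    if (doc.col2Id === col2Doc._id) doc.col1Name = col2Doc.col1Name;
  }
}

// With cascade = false only Col2 is refreshed; with cascade = true each
// changed Col2 doc also triggers the second-level sync.
function updateCol1(id, name, cascade) {
  const doc = col1.find((d) => d._id === id);
  if (!doc) return;
  doc.name = name;
  const changed = syncCol2(doc);
  if (cascade) changed.forEach(syncCol3);
}

updateCol1("a", "new", false);
console.log(col2[0].col1Name, col3[0].col1Name);
```

In this sketch the fix is simply to treat a cache write on Col2 as a change that must itself be propagated; whether the package can be configured to do that is not something this thread establishes.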

thebarty · July 28, 2016, 6:03pm

Hi guys,

For anyone interested in denormalization: I spent some days on this and posted this topic: “Who would be interested in a denormalization-package for SimpleSchema & Collection 2?”.

I’d love to hear your feedback on this!