How do denormalize 2 collections (like products & categories) and keep them in sync?


#1

Hi guys,

I am desperate and could need some guidance on how to keep the relations of “products and categories” in sync with each other. :cold_sweat:

There are 2 collections: products and categories, related to each other:

  • 1 product can be assigned to n categories
  • 1 category can contain n products

Right now I am trying to synchronize both collections via the “collection-hooks”-package and SimpleSchema.autoValue(), but the below code does NOT work: it ends up in a loop, where 2 hooks keep on triggering each other.

In the end I’d like those commands to end up syncing both collections:

// Example commands
// inserts / updates in "Products"-Collection 
// should lead to category.productIds synced within "Category"-Collection
Products.Collection.insert({
  name: 'Product Name 1',
  categoryIds: [
    category1._id,
  ],
})
Products.Collection.update(knownProduct._id, { $set: { 
  categoryIds: [
    category1._id,
    category2._id,
  ],
} })

// on the other hand,
// inserts / updates in "Category"-Collection 
// should lead to products.categoryIds synced within "Products"-Collection
Categories.Collection.insert({
  name: 'Category Name 1',
  productIds: [
    product1._id,
  ],
})
Categories.Collection.update(knownCategory._id, {
  $set: productIds: [
    product1._id,
    product2._id,
  ],
})

How would you solve this problem in meteor?

This is the buggy code (SimpleSchema & CollectionHooks based)

Products.Schema = new SimpleSchema({
  name: {
    type: String,
    optional: false,
    label: 'name',
  },

  // CATEGORIES (1 product can be in n categories)
  //  we use ``categoryIds`` to assign via autoform
  //  and denormalize instances in ``categories.instances``
  categoryIds: {
    type: [String],
    optional: false,
    label: 'categories',
    autoform: {
      type: 'select-checkbox',
      options: function () {
        return Categories.Collection.find().map(function (c) {
          return { label: c.toString(), value: c._id }
        });
      },
    },
  },
  // denormalization of instances
  categories: {
    autoform: {
      omit: true,
    },
    type: Object,
    optional: true,
    blackbox: true,
    autoValue: function () {
      const values = this.field('categoryIds').value
      if (values) {
        const returnValue = []
        for (const value of values) {
          const cat = Categories.Collection.findOne({ _id: value })
          returnValue.push(cat)
        }
        return { instances: returnValue }
      }
    },
  },
})
// the loop-PROBLEM lies here
Products.Collection.after.update(function(userId, doc, fieldNames, modifier, options) {
  if (Meteor.isServer) {
    Categories.Collection.find({ categoryIds: doc._id }}).forEach((doc) => {
      Categories.Schema.clean(doc, {filter: false})
      // write the NEW cleaned doc back to the db
      // NOTE: make it a blackbox write: SKIP Collection2 and SimpleSchema totally
      //  to prevent autoValues from running again!
      Categories.Collection.update({_id: doc._id}, { $set: doc }, {bypassCollection2: true, validate: false, filter: false, autoConvert: false, removeEmptyStrings:false, getAutoValues: false })
    }
  }
})


Categories.Schema = new SimpleSchema({
  name: {
    type: String,
    optional: false,
    label: 'name',
  },

  // PRODUCTS (1 category can have n products)
  //  we use ``productIds`` to assign via autoform
  //  and denormalize instances in ``products.instances``
  productIds: {
    type: [String],
    optional: true,
    label: 'products',
    autoform: {
      type: 'select-checkbox',
      options: function () {
        return Products.Collection.find().map(function (c) {
          return { label: c.toString(), value: c._id }
        });
      },
    },
  },
  // denormalization of instances
  products: {
    autoform: {
      omit: true,
    },
    type: Object,
    optional: true,
    blackbox: true,
    autoValue: function () {
      const values = this.field('productIds').value
      if (values) {
        const returnValue = []
        for (const value of values) {
          const cat = Products.Collection.findOne({ _id: value })
          returnValue.push(cat)
        }
        return { instances: returnValue }
      }
    },
  },
})
// the loop-PROBLEM lies here
Categories.Collection.after.update(function(userId, doc, fieldNames, modifier, options) {
  if (Meteor.isServer) {
    Products.Collection.find({ categoryIds: doc._id }}).forEach((doc) => {
      Products.Schema.clean(doc, {filter: false})
      // write the NEW cleaned doc back to the db
      // NOTE: make it a blackbox write: SKIP Collection2 and SimpleSchema totally
      //  to prevent autoValues from running again!
      Products.Collection.update({_id: doc._id}, { $set: doc }, {bypassCollection2: true, validate: false, filter: false, autoConvert: false, removeEmptyStrings:false, getAutoValues: false })
    }
  }
})

#2

Just read this over a couple times. This is where you fucked up in your thinking.

You’re confusing yourself. These are exactly the same thing. There is just one thing going on here. Products in n categories.

What you are missing is a lookup table or method to allow multiple categories per product, this can be achieved many ways. For huge scale operations (over 10 million records) I used a lookup table for N categories for an adult video site, that was accessed over 5 million times a day. It worked great.

The premise is this 1 collection for lookup so it contains a record for the product id and the category id and that’s it.

{
  "productId": 'productId1',
  "categoryId": 'categoryId1',
}
{
  "productId": 'productId1',
  "categoryId": 'categoryId2'
}

You enter a record for every assignment and then do basic length or count operations to get statistics. You can also now just aggregate all where the category id matches selections within an array (multi-category selection feature) or you could aggregate the other way and retrieve all categories the product is placed in.

This method will work with any data you are trying to traverse multi-layer labelling too, including tags and relationships for contacts and such.


#3

I mentioned other ways btw, you can do a comma seperated list (for low intensity jobs it can work if you feel a seperate collection is overkill for a few records, you then regex the list or change to array and loop values)

This would mean you store as a string the list of IDs in your product collection.

Another way is using embedded documents which means you could embed a categories object which contained all the ids for each category, this might be nicer if you are going deep and need multi levels of sub-categories and sub-sub-categories for a housing agent site or something for financial software with many layers of options…


#4

@tomtom87: thanks a lot for your help!!! Is there a package to manage those cross-tables, or do you do it by hand?

I forgot to mention that I am using aldeed:ecosystem, with autoform and aldeed:tabular, so the embedded solution is what I am trying to go for in order to keep things easy in the frontend.

Basically in the frontend I am showing 2 forms and 2 lists:
Forms (aldeed:autoform)

  • a “edit-single-PRODUCT-form” (autoform), where I can assign categories to the selected product
  • a “edit-single-CATEGORY-form” (autoform), where I can assign products to selected category
    LIST-VIEWS (aldeed:tabular)
  • a “all products in a selected category”-list - showing all products within that specific categeory
  • a “ALL-products with their category-info”-list - showing all products and links to their assigned categeories

For aldeed:tabular it would be nice to have the data embedded in both collections.

I found that collection-hooks package allows a “Collection.direct”-operation to prevent loops (https://github.com/matb33/meteor-collection-hooks). PLUS returning false in a hook will prevent the underlying method (and subsequent after hooks) from executing.

Right now I am trying to get the “synced embedded data in both collections”-way to work… I’ll post my progress in here


#5

Hi guys,

yeah i finally got it to work. It definitly IS possible to denormalize 2 connections and keep them in sync. So if anyone else ever has a simliar challenge, check out my example-code in the above topic and add the “Collection.direct”-operation of collection-hooks-package. Then make use of “insert, update and remove”-hooks.

Still there is a lot of manual work work involved. It would be really cool to have a package available to help, which is based on the aldeed:ecosystem.

peerdb looks good but is not compatible. I guess astronomy also supports something like this?

So if anyone else finds a better solution, please let me know and post it to this topic. :slight_smile:


#6

Cool man… I still have no idea why you want to do that but I’m glad you found a solution that works.