Storing hierarchies and observing changes on them

karlitos · September 8, 2018, 1:18pm

Hello,

I am working on an application which manages hierarchies (trees) created by different users. In my case I need to meet following criteria:

fast access to all nodes belonging to hierarchy/tree X
access to particular node Y belonging to hierarchy/tree X
observing changes (node added, node edited …) on hierarchy/tree X

I create some interactive visualization on the client, the initial creation is computing intensive, incremental updates are manageable. Thus I can not use the traditional meteor way: data changes -> re-render and have to track the atomic changes.

I started with following data model:

collection('hierarchies'):
{
   _id: mongo_id
  descrption: 'Some brief description, bla bla, yedu yedu ...'
  owner: some_user_id
  nodes: [
    {
      nodeId: some_node_id
      parent: ....
      children: ...
    }
  ]
}

This allows me to get access to all nodes from particular user or with particular id, also the access to certain id is no problem. However using Cursor.observeChanges on particular collection gives me always whole nodes array - and not only the last change.

According to this discussion I should make a collection containing the nodes separately. I am not sure if this would be good idea, since the collection would store many (hundred) thousand of nodes and I fear a could get a good performance when accessing particular nodes or nodes belonging to particular hierarchy.

I also tried to store the changes in the collection itselft:

collection('hierarchies'):
{
   _id: mongo_id
  descrption: 'Some brief description, bla bla, yedu yedu ...'
  owner: some_user_id
  recentChange: {
    type: 'node.added',
    payload: {
      nodeId: some_node_id
      parent: ....
      children: ...
    }
  }
  nodes: [
    {
      ...
    }
  ]
}

and use the Cursor.observeChanges on the recentChange field. This seems to work, I am not sure whether it is a good solution or not.

Are there any alternatives?

rjdavid · September 8, 2018, 3:39pm

Try checking redisoplog

karlitos · September 8, 2018, 7:28pm

Thanks, but could you be a little bit more specific ? How would it help in my case ?

robfallows · September 8, 2018, 7:31pm

I’d start by looking at databases specifically designed for graphs. Neo4j being an obvious choice.

karlitos · September 8, 2018, 8:08pm

Well… I did. I played with OrientDb for a while. And I did not find any particular advantages for my scenario, because I do not work on heavily connected data - I work on hierarchies. I do not need operations on subtrees, I don’t do any complicated queries in the way: get everything which connected to X an have an edge E to Y

Getting all nodes for particular hierarchy/tree or subtree usually means traversing the graph - I did not see any particular advantage. So all trees are graphs, but that does not mean that what is best for graphs is also best for trees.

doctorpangloss · September 9, 2018, 1:11am

What if you stored exactly what you needed to render?

A d3 visualization isn’t actually a graph, it’s a giant list of circle, line and color drawing commands. It has a very compact representation, and it will be easier to reason about “node added, node edited” in terms of changes to something graphical like the image than the underlying data structure.

rjdavid · September 9, 2018, 4:42am

With redisoplog, you can fine tune the mutation

github.com

cult-of-coders/redis-oplog/blob/master/docs/finetuning.md

## Fine-Tuning

Now that we have a generic idea of how this works, we can make some fine-tuning that will **dramatically** increase the speed of your highly loaded app.

### Custom Channels

A very common scenario where this can be used is a chat application. Usually when we create a chat app, we have a parent "Thread",
and each message will have threadId pointing to our thread.

```js
Meteor.publish('messages_by_thread', function (threadId) {
    // perform additional security checks here

    return Messages.find(selector, {
        channel: 'threads::' + threadId + '::messages' // you can use any conventions that you like for naming them, it's not relevant
        // any name you choose just make sure it cannot conflict with a collection (threads), or a direct listening (threads::threadId)
    });
})
```

This file has been truncated. show original

karlitos · September 9, 2018, 10:38am

I transform the data using d3,stratify to something which can I feed to d3.hierarchy and upon that data a specific layout is computed. This can take some significant amount of time when the number of nodes is big, but it’s ok as long I have to do it only at the beginning. Later, I work on the computed hierarchy and layout in a way similar to the collapsible tree example. Adding/removing/updating particular node means operation on some particular sub-tree on the client.

karlitos · September 9, 2018, 10:41am

@rjdavid Thanks. so you are saying I shall go with a potencial huge node collection and just fine-tune the queries for better performance?

rjdavid · September 9, 2018, 11:14am

I don’t comprehend much of what you are trying to do. But if the performance issue is with specific changes updating all subscriptions, redis-oplog can create specific subscription channels and changes can only be directed to a specific channel.

robfallows · September 10, 2018, 11:05am

If you are using MongoDB for this, there are a few things to consider.

Given that a simple hierarchy can be expressed as a JavaScript object (and therefore as a MongoDB document), it seems that a 1:1 mapping is achievable: each hierarchy is described by a single MongoDB document. This is the denormalising approach often recommended by MongoDB themselves.

This approach has its advantages, including:
- Single MongoDB methods can mutate the complete hierarchy.
- Once you’ve fetched the hierarchy, it’s possible to traverse it entirely in memory without further database operations.
However, it also has disadvantages, including:
- MongoDB documents are limited to 16MB. That’s fine for most hierarchies, but if you want to store the taxonomy of plants, for example, then it’s not going to be enough.
- Meteor’s observe methods operate at the document root level:
```
 doc
 ├── a[]
 │   ├── x
 ├── b
 │   ├── c[]
 │   │   ├── y
 │   │   └── z
```
  A change to doc.b.c[99].z will result in an observation that doc.b has changed. It’s up to you to be able to identify the specifics, if that’s important to you.
- How to handle arrays is probably the one MongoDB question that keeps coming up on these forums. The syntax is not pretty, especially if you have multiply-nested arrays.
One way to solve the above is to normalise the hierarchy and store each node, with pointers to other nodes in a single MongoDB document. In other words, each node has appropriate meta-data, together with an array of ids of other nodes (documents) in the collection.

This approach has its advantages, including:
- It can store arbitrarily large hierarchies.
- Meteor’s observe operations are cleaner, since mutations only affect document-level objects.
- You still need to deal with MongoDB’s array handling, but at least it’s on a single level.
However, it also has disadvantages, including:
- Traversing the hierarchy is harder (and slower).
- You may not be able to get a complete hierarchy in memory.
- You will likely need extra meta-data. For example, given a single document, how to discover to which hierarchy it belongs, and where the root of the hierarchy is.
Redis-oplog will work with either of the above strategies. It is a good choice for performance and scaling. However, if you are likely to have out-of-band database mutations, it will not reactively see those at all.

However, it has advantages over the standard Meteor oplog-tailing, if your application runs entirely with Meteor. In that situation, you can get cleverer with option (1), by using events to track mutations to sub-documents in much finer detail.

Back in a previous life, I wrote a library to handle hierarchies similar to (2), but using PHP and MySQL. That’s been in production, running non-stop for nearly 10 years now. Until recently (v4.0), the lack of transactions would have put me off trying it MongoDB.

karlitos · September 10, 2018, 12:04pm

Hello @robfallows many thanks for your exhausting explanation. I was actually planing to store the hierarchy nodes in an array anyway, I use d3.stratify method to convert such an array in a nested hierarchy object. Updating a “nested hierarchy object” or even appending new nodes seemed to me like a pretty difficult problem.

Problem with my thinking was, that I wanted to have a hierarchies collection storing the related nodes in an embedded array.

hierarchy
---------
├──title: ...
├──owner: ...
├── nodes: []

I was afraid to store all nodes for all hierarchies in one huge collection. Than I hit the issue with:

Meteor’s observe methods operate at the document root level:

So I plan to have one huge nodes collection and one additional hierarchies collection storing all metadata related to whole hierarchies. Every node will track

This approach looks promising to me, however I struggle now with additional questions, e.g.: if a hierarchy has an list of contributors, how can I tell if a user with particular user-id is allowed to insert a node referencing particular hierarchy.

This is more generic question so I created separate question here