Please evaluate my observer and pub/sub architecture - I need advice

jononomo · February 27, 2016, 4:22pm

INTRO

I have a Mongo collection called LineItems that contains about 80,000 documents, each representing a line item in our system.

Users can view collections of these line items according to highly customizable criteria. Each view contains between 0 and 600 line items. These views are “live” in the sense that the information updates in real time (this is actually the whole point of using Meteor).

Users create a “live view” by defining the criteria, and there are also about 200 pre-defined views available. Each view is stored as a document in a Mongo collection called ViewDefinitions, which currently holds about 200 documents and is expected to grow over time as users add their own custom-defined views.

The real-time aspect is that any line item may be updated at any time during the day. There are two kinds of updates: simple updates to totals, etc., that do not affect which “live view” a line item appears in, and more fundamental updates that effectively cause a line item to disappear from one “live view” and appear in another.
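The distinction between these two kinds of update can be made explicit by comparing which views an item matches before and after a change. Here is a minimal TypeScript sketch of that idea, not the author’s code: the LineItem shape, the status field used in the example, and modeling each view’s criteria as a predicate function (instead of a stored Mongo selector) are all assumptions.

```typescript
// Hypothetical shapes: the LineItems schema and the storage format of view
// criteria are not shown, so each view's criteria is modeled as a predicate.
interface LineItem {
  _id: string;
  [field: string]: unknown;
}

type ViewCriteria = (item: LineItem) => boolean;

interface UpdateClassification {
  kind: "totalsOnly" | "membership";
  entered: string[]; // views the item matches after the update but not before
  left: string[]; // views the item matched before the update but not after
}

// Collect the ids of every view whose criteria the item satisfies.
function matchingViews(item: LineItem, views: Map<string, ViewCriteria>): Set<string> {
  const ids = new Set<string>();
  views.forEach((matches, viewId) => {
    if (matches(item)) ids.add(viewId);
  });
  return ids;
}

// Compare view membership before and after an update to classify it.
function classifyUpdate(
  before: LineItem,
  after: LineItem,
  views: Map<string, ViewCriteria>,
): UpdateClassification {
  const oldIds = matchingViews(before, views);
  const newIds = matchingViews(after, views);
  const entered = Array.from(newIds).filter((id) => !oldIds.has(id));
  const left = Array.from(oldIds).filter((id) => !newIds.has(id));
  const kind: UpdateClassification["kind"] =
    entered.length === 0 && left.length === 0 ? "totalsOnly" : "membership";
  return { kind, entered, left };
}
```

Under this scheme, only a "membership" result would need to trigger category re-computation, and only for the views listed in entered and left; a "totalsOnly" result could just patch the affected summaries. This assumes the criteria can be evaluated in application code; if they exist only as Mongo selectors, something that matches a selector against a single document would be needed instead.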

Also, the definitions for the “live views” can themselves be updated in ways that change which line items appear in them. There are two types of these updates: those in which a “pre-defined” view is automatically inserted, removed, or altered as the larger context dictates, and those in which a user adds or removes a custom-defined view.

Finally, each line item in a “live view” is assigned to a category along each of about three different axes, and these assignments must all be re-computed each time a line item is added to or removed from a live view. Essentially, each line item’s category information must be re-computed each time any other line item is added to or removed from that live view (I’m actually wondering whether I can be more clever here…). Summary information relating to these assignments must also be updated.

Basically, there is a high-level dashboard that shows the name of every live view along with summary information about the categories its line items have been assigned to. This summary information must update in real time as changes to line items determine which views they appear in, and as the other line items in each view influence which (and how many) categories that view contains.

The user can then choose to drill down into any one live view and examine the line items within, along with more detailed information about the categories.

My System

first: I have an observer that listens to the larger context and adds or removes pre-defined views as appropriate, i.e., it adds or removes documents in the ViewDefinitions collection.

second: I have an observer on all documents in the ViewDefinitions collection. Each time a view definition is added, removed, or altered, a corresponding observer that I hold in an in-memory dictionary is added, removed, or altered. So there is one observer per view definition; each has a slightly different query into the LineItems collection, and each time its query results change, it re-computes the summary information stored in the LiveViewSummaries collection.

third: Each summary document in the LiveViewSummaries collection is basically a list with one abbreviated entry for each line item in the live view, plus some totals and averages at the end.

fourth: The update process that recomputes the categories a live view contains (and then updates the corresponding document in the LiveViewSummaries collection) takes several seconds. When a couple hundred line items are adjusted in a way that moves them all into a different live view, I want to make sure the re-computation runs only once. Currently I schedule the computation to start 3 seconds later, and cancel that scheduled run if another re-computation request arrives within those 3 seconds.

fifth: When the user visits the dashboard, I subscribe to all the live view summaries.

sixth: When the user drills into a live view, I use the view definition to build a query on the LineItems collection and publish those line items to the client.
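The one-observer-per-view-definition bookkeeping from the second step might be organized roughly as follows. This is a sketch, not the author’s code: ViewObserverRegistry, ViewDefinition, and the injected startObserver function are hypothetical names, and ObserverHandle only assumes the stop() method that Meteor’s live-query handles (returned by cursor.observe() and cursor.observeChanges()) provide.

```typescript
// Only stop() is assumed, which is what a Meteor live-query handle offers.
interface ObserverHandle {
  stop(): void;
}

// Hypothetical view definition shape; the real ViewDefinitions schema is not shown.
interface ViewDefinition {
  _id: string;
  selector: Record<string, unknown>;
}

// Keeps exactly one line-item observer per view definition, keyed by view id.
class ViewObserverRegistry {
  private readonly handles = new Map<string, ObserverHandle>();
  private readonly startObserver: (def: ViewDefinition) => ObserverHandle;

  // startObserver is injected so the registry stays independent of the
  // database API; it should start watching LineItems for one definition.
  constructor(startObserver: (def: ViewDefinition) => ObserverHandle) {
    this.startObserver = startObserver;
  }

  added(def: ViewDefinition): void {
    this.removed(def._id); // never leave a duplicate observer running
    this.handles.set(def._id, this.startObserver(def));
  }

  changed(def: ViewDefinition): void {
    // The query changed, so replace the old live query with a new one.
    this.added(def);
  }

  removed(viewId: string): void {
    const handle = this.handles.get(viewId);
    if (handle !== undefined) {
      handle.stop();
      this.handles.delete(viewId);
    }
  }

  stopAll(): void {
    this.handles.forEach((handle) => handle.stop());
    this.handles.clear();
  }

  get size(): number {
    return this.handles.size;
  }
}
```

In Meteor, these methods would be wired to the ViewDefinitions observer callbacks, e.g. added: (doc) => registry.added(doc), changed: (newDoc) => registry.changed(newDoc), and removed: (oldDoc) => registry.removed(oldDoc._id). Stopping every handle that is replaced or removed is relevant to the memory concern, since a live query that is never stopped keeps running and holding its resources on the server.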
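For the summary document in the third step, the totals and averages do not necessarily have to be rebuilt from scratch on every change; they can be maintained incrementally as entries are added, updated, or removed. The sketch below assumes a hypothetical numeric amount field on each abbreviated entry. It does not address the category assignments, which, as described above, depend on the other line items in the view.

```typescript
// Hypothetical abbreviated entry; the real LiveViewSummaries fields are not shown.
interface SummaryEntry {
  lineItemId: string;
  amount: number;
}

// Keeps the entries plus a running sum, so adding, updating, or removing one
// line item adjusts the totals in O(1) instead of re-scanning the whole view.
class LiveViewSummary {
  private readonly entries = new Map<string, SummaryEntry>();
  private sum = 0;

  upsert(entry: SummaryEntry): void {
    const previous = this.entries.get(entry.lineItemId);
    if (previous !== undefined) this.sum -= previous.amount; // undo the old value
    this.entries.set(entry.lineItemId, entry);
    this.sum += entry.amount;
  }

  remove(lineItemId: string): void {
    const previous = this.entries.get(lineItemId);
    if (previous === undefined) return; // unknown id: nothing to undo
    this.sum -= previous.amount;
    this.entries.delete(lineItemId);
  }

  get count(): number {
    return this.entries.size;
  }

  get total(): number {
    return this.sum;
  }

  get average(): number {
    return this.count === 0 ? 0 : this.sum / this.count;
  }
}
```

With floating-point amounts, repeated additions and subtractions can accumulate rounding error, so an occasional full recomputation (for example, inside the existing delayed re-computation) would keep the running total accurate.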
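The scheduling in the fourth step is a trailing-edge debounce. A per-view version might look like the following. RecomputeDebouncer, Scheduler, and keying pending runs by view id are assumptions of this sketch; the Scheduler indirection exists only so the behavior can be exercised with a fake clock instead of real timers.

```typescript
// Timer abstraction so the debouncer can run on real timers or a test clock.
interface Scheduler {
  schedule(fn: () => void, delayMs: number): unknown;
  cancel(handle: unknown): void;
}

const realScheduler: Scheduler = {
  schedule: (fn, delayMs) => setTimeout(fn, delayMs),
  cancel: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Trailing-edge debounce per view: each request cancels that view's pending
// run and schedules a fresh one, so a burst of requests yields a single run.
class RecomputeDebouncer {
  private readonly pending = new Map<string, unknown>();
  private readonly recompute: (viewId: string) => void;
  private readonly delayMs: number;
  private readonly scheduler: Scheduler;

  constructor(
    recompute: (viewId: string) => void,
    delayMs = 3000, // the 3-second window described in the post
    scheduler: Scheduler = realScheduler,
  ) {
    this.recompute = recompute;
    this.delayMs = delayMs;
    this.scheduler = scheduler;
  }

  request(viewId: string): void {
    if (this.pending.has(viewId)) {
      this.scheduler.cancel(this.pending.get(viewId));
    }
    const handle = this.scheduler.schedule(() => {
      this.pending.delete(viewId);
      this.recompute(viewId);
    }, this.delayMs);
    this.pending.set(viewId, handle);
  }
}
```

One caveat of a pure debounce: if requests for a view keep arriving less than 3 seconds apart, its re-computation is postponed indefinitely. If that can happen, capping the total wait (running anyway once some maximum delay has passed since the first pending request) is a common variant. On a Meteor server, the real scheduler would likely wrap Meteor.setTimeout and Meteor.clearTimeout rather than the bare timer functions.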

My Concerns

The one problem I’m noticing with all of this is that my app’s start-up time is very long, because all these observers re-calculate the live-view summaries when they first start up. I also seem to have some deployment issues that make me wonder whether the system is using too much memory. Finally, I’m concerned that I’ve somehow been too clever and that I’m overlooking a more elegant approach.

crenshininbon · February 27, 2016, 7:27pm

I’m facing something similar: a dashboard view with a sunburst chart where the user can configure what actually gets shown. I have around 20,000 documents as possible sources for this chart. I cut the load time by a factor of 10 by batch loading all the data to the client (a method call sends the whole content as a JSON object to the client). The client then takes the raw data and puts it into client-side-only MiniMongo collections.

I don’t need live data across clients, though. But there might be an additional step that connects the client collections to the server collections after the batch preload, to get client/server reactivity back.

Maybe you could populate the views with their line items on the server side whenever the data changes, and then load them to the client at the view level rather than the line-item level.

jononomo · February 29, 2016, 4:26pm

Thanks for the feedback, @crenshininbon

I have a related thread here: Why does my CPU usage never fall below 100%? Too many observers?

Your batch loading into MiniMongo sounds interesting, but I don’t think it works for my use case: I need live updates from the server, because the MongoDB database is updated by a third-party system.