@patrickwml @mspi
Good things to think about. I am in a similar situation. I spent time thinking of many solutions each with their own drawbacks and strong points.
TLDR:
I’m in a similar situation. I chose number 4 for my purposes, because it made the most sense with my data. I realized that option 2 is MUCH more nitpicky than it looks. And option 1 is trivial, but very unfeasible for a good UX.
1. Reactive aggregation, computing on client. You can aggregate the derivative set on the client and push the results into a null collection called “AggregatedResultsCollection” or something. Good for data that is guaranteed to stay small.
2. Reactive aggregation on the server: You could do all of that in a publish function on the server, but there are a lot of subtleties that make it more difficult, namely paginating the aggregated results.
3. Non-reactive mongodb aggregate: This is the easiest solution. All you do is delegate the job to mongo and call this.added for each result in your publish function.
4. Pseudo-reactive aggregation: Use a cache on the server to be smarter about when you push changes to the client.
If you’re gonna aggregate, keep in mind that you shouldn’t rerun a query unless the user’s parameters justify it. Throttle it.
My app needs to show a few “top” lists to all users. In order to get the top users, hashtags, etc. a reduce operation must be done on the data. For example, the Posts collection contains all the data necessary to determine the top hashtags: unwind the post.tags array, group by tag, sum on a score variable, and we are done. Note that this can be done in pure javascript - we do not necessarily need aggregation. Depending on the data, this might make more sense.
Assumption: Your output data can be derived from a single collection. This will be true if you are considering Mongo’s aggregate function. We will call this “Posts”.
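To make the unwind / group / sum idea concrete, here is a minimal pure-javascript sketch. The post shape (`tags` array plus a numeric `score`) and the function name `topHashtags` are assumptions for illustration, not something from a real schema.

```javascript
// Assumed (hypothetical) post shape: { tags: ["a", "b"], score: 3 }
function topHashtags(posts, limit) {
  const totals = new Map(); // tag -> summed score
  for (const post of posts) {
    // "Unwind" post.tags: visit each tag of each post separately.
    for (const tag of post.tags || []) {
      // "Group by tag" and "sum on score" in one step.
      totals.set(tag, (totals.get(tag) || 0) + post.score);
    }
  }
  return [...totals]
    .map(([tag, score]) => ({ tag, score }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, limit);
}

const samplePosts = [
  { tags: ["meteor", "mongo"], score: 5 },
  { tags: ["meteor"], score: 2 },
  { tags: ["js"], score: 4 },
];
console.log(topHashtags(samplePosts, 2));
```

This recomputes everything on each call, which is fine for small data but is exactly the cost the solutions below try to manage.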
Variables:
a) How much reactivity do you want?
b) How much client-side computing do you want?
Solution 1: Reactivity, client-side computing. Easy. If your data set is inherently small, this is probably the best solution. This is a bad solution for larger data sets. Very bad. Basically you publish all the data to a client-side collection, and then use an observer on the client to aggregate the data into a null collection. The reactive updates from the server on the Posts collection will transfer via the observer into the null collection. Then, anywhere in your app, you can query the null collection for the aggregated data you need. You will have to put effort into your observe callbacks to make sure that the aggregated data in the null collection is updated correctly when an underlying Post object is added/changed/removed. This can be strong or weak reactivity depending on how you implement those callbacks. This is easy because you do not need to worry about limiting/paginating data from server to client, since you have very few documents.
Solution 2: Reactivity, server-side computing. Difficult. For medium to large data sets. Similar to the situation above, but observe the query on the server instead, and push the results of your math via added/changed/removed into a client-side collection. This moves all the computation from the client to the server. This becomes hard when you need to limit the results of the aggregation operation but you still need the entire Posts collection to derive the aggregate data.
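A rough sketch of what the observe callbacks have to do to keep the aggregate correct. This is plain javascript with a Map standing in for the null collection. `createTagAggregator` is a hypothetical helper, and the `tags`/`score` fields are assumed. The callback shapes follow the added(doc) / changed(newDoc, oldDoc) / removed(oldDoc) form of cursor.observe.

```javascript
// Hypothetical helper: keeps per-tag totals in sync with observe-style callbacks.
function createTagAggregator() {
  const totals = new Map(); // tag -> { score, count }

  // Apply one post's contribution with sign +1 (add) or -1 (take away).
  const apply = (post, sign) => {
    for (const tag of post.tags || []) {
      const entry = totals.get(tag) || { score: 0, count: 0 };
      entry.score += sign * post.score;
      entry.count += sign;
      // Drop the tag once no post references it any more.
      if (entry.count <= 0) totals.delete(tag);
      else totals.set(tag, entry);
    }
  };

  return {
    totals,
    added: (post) => apply(post, +1),
    // A change is handled as "remove the old version, add the new one".
    changed: (newPost, oldPost) => {
      apply(oldPost, -1);
      apply(newPost, +1);
    },
    removed: (oldPost) => apply(oldPost, -1),
  };
}

const aggregator = createTagAggregator();
aggregator.added({ _id: "1", tags: ["meteor"], score: 5 });
aggregator.added({ _id: "2", tags: ["meteor", "js"], score: 2 });
aggregator.changed(
  { _id: "2", tags: ["js"], score: 3 },
  { _id: "2", tags: ["meteor", "js"], score: 2 }
);
aggregator.removed({ _id: "1", tags: ["meteor"], score: 5 });
console.log([...aggregator.totals]);
```

In a real app the three callbacks would presumably be handed to something like Posts.find().observe(...) and would write into the null collection instead of a Map. The same bookkeeping applies to Solution 2, just on the server.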
Solution 3: No reactivity, database computing. Easiest. For medium to large data sets. Enter the aggregate query. Run the query in a publish function, iterate over the results, and use the added/changed/removed API to dump them into a client-side collection. Because it is implemented in the database layer, this takes the computation off of your server and your client. IIRC, you can spawn multiple secondary databases on other processors to scale horizontally when you have “too many” people subscribing to this query, since reads can be done from any db instance. Check out meteorhacks:cluster, it looks pretty good. For aggregation, meteorhacks:aggregate. This solution is probably the most scalable.
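A sketch of how Solution 3 could look, under some assumptions: the publication and client collection are both called "topTags" (made-up names), posts have `tags` and `score` fields, and `runAggregate` is a function you supply that takes a pipeline and returns an array of rows. The handler is kept separate from Meteor so it can be read and tested on its own.

```javascript
// The top-hashtags reduce expressed as a MongoDB aggregation pipeline.
const topTagsPipeline = [
  { $unwind: "$tags" },
  { $group: { _id: "$tags", score: { $sum: "$score" } } },
  { $sort: { score: -1 } },
  { $limit: 10 },
];

// Builds a publish handler. `runAggregate` is an assumption: any function
// that accepts a pipeline and synchronously returns an array of result rows.
function makeTopTagsPublication(runAggregate) {
  return function () {
    // `this` is the publish handler's subscription object.
    for (const row of runAggregate(topTagsPipeline)) {
      this.added("topTags", String(row._id), { score: row.score });
    }
    // Tell the client the initial (and only) batch is complete.
    this.ready();
  };
}

// Possible wiring in a Meteor app (assumption, not executed here):
// Meteor.publish("topTags", makeTopTagsPublication((p) => Posts.aggregate(p)));
```

Nothing here re-runs when Posts changes, which is the "no reactivity" part: the client only sees new numbers when it resubscribes.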
Solution 4: Pseudo-reactive server/database computing. Difficult. Use a cache on the server to keep track of what aggregation results you already sent to the client. This will prevent you from sending duplicate data, and it will let you diff results on the server to send only changes. This will waste less bandwidth, but use more CPU. You can use the Meteor.onConnection callback to instantiate a cache for each client.
The great thing about using publish/subscribe functionality for aggregate queries instead of direct method calls is that the behavior is indistinguishable from live queries on the client, so you can write the same code on the client as you would with a non-aggregate query…
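The diffing part of the cache idea could look roughly like this. It is a sketch only: `diffResults`, `changesFor` and `cacheByConnection` are hypothetical names, results are assumed to be Maps of id -> fields, and the equality check via JSON.stringify assumes fields are plain JSON values with a stable key order.

```javascript
// Compare the last results sent to a client with freshly computed ones.
function diffResults(previous, next) {
  const added = [];
  const changed = [];
  const removed = [];
  for (const [id, fields] of next) {
    if (!previous.has(id)) {
      added.push({ id, fields });
    } else if (JSON.stringify(previous.get(id)) !== JSON.stringify(fields)) {
      changed.push({ id, fields });
    }
  }
  for (const id of previous.keys()) {
    if (!next.has(id)) removed.push(id);
  }
  return { added, changed, removed };
}

// One cache entry per client connection: connection id -> last results sent.
const cacheByConnection = new Map();

function changesFor(connectionId, next) {
  const previous = cacheByConnection.get(connectionId) || new Map();
  cacheByConnection.set(connectionId, next);
  return diffResults(previous, next);
}

const firstPush = changesFor("conn-1", new Map([
  ["meteor", { score: 7 }],
  ["js", { score: 4 }],
]));
const secondPush = changesFor("conn-1", new Map([
  ["meteor", { score: 9 }],
  ["mongo", { score: 1 }],
]));
console.log(firstPush, secondPush);
```

Each of the three lists maps onto the added/changed/removed calls of the publish function, so only the differences go over the wire. You would also want to delete the cache entry when the connection closes.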
I currently use Solution 3 on my app. If your data has a high subscription rate (clients resubscribe often) and satisfies the following condition, you can implement a “smart throttle” which determines if we should run another query. Let x be the parameters of the subscription and let f(x) be the query results of the publish function. The condition: a small change in parameters, x - x’ ~ dx, implies a small change in results, f(x) - f(x’) ~ df, with dx and df both very small.
Example: Locational data. If you have a reactive subscription (in a computation/autorun) that depends on geoCoordinates (reactively), then we DO NOT want to re-run the aggregation on the server if the new location is very close to the old location because we can assume that the new results will be no different than the old ones. Again, it depends highly on your data.
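For the location case, a smart throttle might be sketched like this. The names `createSmartThrottle` and `minMoveMeters`, the `{ lat, lng }` coordinate shape, and the choice of the haversine formula for distance are all assumptions for illustration. The right threshold depends entirely on your data.

```javascript
const EARTH_RADIUS_M = 6371000;

// Great-circle distance between two { lat, lng } points (haversine formula).
function distanceMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Wraps an expensive query so it only re-runs when the user has moved at
// least `minMoveMeters` from the location used for the last real run.
function createSmartThrottle(runQuery, minMoveMeters) {
  let lastCoords = null;
  let lastResult = null;
  return function query(coords) {
    if (lastCoords === null || distanceMeters(lastCoords, coords) >= minMoveMeters) {
      lastCoords = coords;
      lastResult = runQuery(coords);
    }
    // Otherwise assume the results would be no different and reuse them.
    return lastResult;
  };
}

let aggregationRuns = 0;
const nearbyTopTags = createSmartThrottle((coords) => {
  aggregationRuns += 1; // stand-in for the real aggregation
  return { near: coords, run: aggregationRuns };
}, 500);

nearbyTopTags({ lat: 40.0, lng: -74.0 });    // first call, runs the query
nearbyTopTags({ lat: 40.0001, lng: -74.0 }); // roughly 11 m away, reuses the result
nearbyTopTags({ lat: 40.01, lng: -74.0 });   // roughly 1.1 km away, runs again
console.log(aggregationRuns);
```

The reference point is only updated when the query actually runs, so many small moves in the same direction will eventually add up and trigger a refresh.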