Why is mongo aggregation not reactive in meteor?

seanmavley · December 22, 2015, 7:43pm

On atmosphere at the moment?

JcBernack · December 23, 2015, 4:18am

I forgot to actually deploy the package to atmosphere because I have it installed locally. Should be online now.

seanmavley · December 23, 2015, 8:43am

Thanks. Figured it wasn’t there cos couldn’t install.

Now installed. Will try it later today, but I have high hopes it’ll work.

Something I still don’t understand about Meteor is how many Mongo queries aren’t baked in. Django, for instance, has a wrapper around PostgreSQL that allows almost all possible PostgreSQL commands, plus a great ability to drop down to raw SQL when need be.

Realized meteor doesn’t even allow upsert?

Thanks again for the reactive aggregation.

ngochenbang · December 23, 2015, 10:47am

You can do upsert easily in meteor and by using rawCollection and rawDatabase, you can have access to the underlying mongodb driver too for all sorts of raw commands.

seanmavley · December 23, 2015, 8:29pm

Oooh, didn’t know that. That’s amazing! Thanks for pointing out.

Plus, I tried out https://github.com/JcBernack/meteor-reactive-aggregate and it’s such a cracker. Works seamlessly, and the question “Why is mongo aggregation not reactive in meteor?” can be answered as, “Indeed, it is reactive!”

lpgeiger · January 5, 2016, 1:58pm

@timbrandin @ngochenbang
Perhaps you can contribute to the discussion here: https://github.com/meteor/meteor/issues/4947
That bug fix would make aggregations reactive. It would be great to get your contributions toward a fix there.

willrbc · April 12, 2016, 11:52am

Glad I stumbled upon this. However, if I want to group by day of year, as I understand it I need to construct the pipeline as follows. The aggregate returns _id as an object containing both the dayOfYear and year key-value pairs.

Meteor.publish("taskDays", function () {
  ReactiveAggregate(this, Tasks, [{
    $group: {
      _id: {
        dayOfYear: {
          $dayOfYear: "$updatedAt"
        },
        year: {
          $year: "$updatedAt"
        }
      },
      taskIds: {
        $addToSet: "$_id"
      }
    }
  }], {clientCollection: "TaskDays"});
});

However when I do this I get the error
Exception from sub taskDays id ps6xceXjsX4ZbwEAZ Error: Meteor does not currently support objects other than ObjectID as ids

I’m very new to experimenting with $group in mongo, but if I change _id to something else and keep a single _id there it won’t accept the $dayOfYear

seanmavley · April 12, 2016, 12:09pm

I might be wrong, but isn’t $updatedAt referring to a Date object?

If so, that would explain the error message.

willrbc · April 12, 2016, 12:52pm

Yes, it does, but the _id object actually returned is something along the lines of:

_id: {
  dayOfYear: 84,
  year: 2016
}

The result is never a date, but it isn’t an ‘ObjectID’ either.

The problem is, I haven’t yet figured out how to group by days using anything other than the _id field.

If I put the $dayOfYear inside a differently named key that isn’t _id then I get:

MongoError: exception: unknown group operator 'dayOfYear'

It is my understanding that the _id in group is supposed to be the fields that you want to be unique to each result.

willrbc · April 12, 2016, 1:46pm

So I’ve solved my particular issue, but it required a modification to @JcBernack’s package.

I’ve added an option to pass a ‘modifyAggregate’ function to the ReactiveAggregate function.

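{
  clientCollection: "taskDays",
  modifyAggregate: function (taskDay) {
    // rebuild a real Date from the grouped year and day of year
    let day = moment()
      .year(taskDay._id.year);
    day.dayOfYear(taskDay._id.dayOfYear);
    taskDay.day = day.toDate();
    // replace the object _id with a value Meteor accepts
    taskDay._id = taskDay.taskIds[0];
  }
}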

The function is simply run on each document, transforming the result into a collection more compatible with Meteor collections. I’ve also set the day field to a proper JS Date object so I don’t need to do anything on the client (apart from simple formatting).

I decided to use the first of the ‘taskIds’ array as an _id, however this may prove troublesome and I am yet to test it with lots of tasks.

My whole reasoning for doing this is to group tasks by day so they can be displayed on a timeline-like interface.
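One possible alternative to borrowing the first task id is to derive the _id from the group key itself. The following is only a sketch: `flattenDayKey` is a hypothetical helper, not part of meteor-reactive-aggregate, and it assumes the `{dayOfYear, year}` key shape shown earlier and that `$dayOfYear` and `$year` were computed in UTC.

```javascript
// Hypothetical helper: turns the compound $group key into a string _id
// and a real Date, mutating the document the same way modifyAggregate does.
function flattenDayKey(taskDay) {
  const { year, dayOfYear } = taskDay._id;
  // Day N of the year is "January N" in UTC; Date.UTC normalises the overflow.
  taskDay.day = new Date(Date.UTC(year, 0, dayOfYear));
  // Deterministic key such as "2016-084". It depends only on the group,
  // not on the (unspecified) order of the $addToSet array.
  taskDay._id = year + "-" + String(dayOfYear).padStart(3, "0");
  return taskDay;
}
```

With the modified package described above, such a function could presumably be passed as the `modifyAggregate` option in place of the moment-based version.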

  // (excerpt; the opening lines of the enclosing function are not shown)
  collection.aggregate(pipeline).forEach(function (doc) {
    if (options.modifyAggregate) { // transform doc if modify function supplied
      options.modifyAggregate(doc);
    }
    if (!sub._ids[doc._id]) {
      sub.added(options.clientCollection, doc._id, doc);
    } else {
      sub.changed(options.clientCollection, doc._id, doc);
    }
    sub._ids[doc._id] = sub._iteration;
  });
  // remove documents not in the result anymore
  _.forEach(sub._ids, function (v, k) {
    if (v != sub._iteration) {
      delete sub._ids[k];
      sub.removed(options.clientCollection, k);
    }
  });
  sub._iteration++;
}

lpgeiger · July 8, 2016, 6:34pm

There is another way of renaming the collection.

Firstly: Meteor has an issue with following changes in copied collections. Mongo’s aggregate $out uses a collection copy internally. $out is important because it prevents any round trips from the DB to the application for processing. $out also avoids the hard 16MB limit on the aggregated result object.

For now this is not yet fixed in Meteor. Please upvote it here: https://github.com/meteor/meteor/issues/4947

The solution: Use a $project stage in your pipeline, after the group stage, to rename AND DROP the _id property. Then you can safely output using $out.

So my pipeline looks like this:

var pipeline = [
  {
    $project: {
      productName: "$productName",
      price: "$price",
      units: "$units",
      customerTotal: {$multiply: ["$units", "$price"]}
    }
  },
  {
    $group: {
      _id: { // _id is a fixed property for mongo aggregate, cannot be changed
        productName: "$productName"
      },
      customerTotal: {
        $sum: "$customerTotal"
      },
      units: {
        $sum: "$units"
      }
    }
  },
  {
    $project: {
      _id: 0, // DROP THE _id PROPERTY
      productName: "$_id.productName",
      units: "$units",
      customerTotal: "$customerTotal"
    }
  },
  {
    // outputs to a collection; must be used for aggregations over 16MB.
    // Meteor has a known issue here (see the GitHub issue linked above)
    $out: "aggregate"
  }
];

Run with:

var aggregateQuery = Meteor.wrapAsync(Apple.rawCollection().aggregate, Apple.rawCollection());
aggregateQuery(pipeline)

Now the _id is no longer a conflict with the Meteor ID conventions. The data should update reactively.

dvnwgmn · July 15, 2016, 9:23pm

Hey everyone - Thank you guys for running with this topic for a minute now. I’m a meteor noob, so this has been incredibly helpful. @JcBernack reactive-aggregate is working like a charm. It’s perfect for one round of aggregations (e.g., summing the value of a column). However, I would like to send the first round of results to a new server side collection using $out, run additional aggregations on that collection, and then send those results to the client. Here’s what I have so far and it’s not working:



Meteor.publish("assetTotals", function() {
  ReactiveAggregate(this, Items, [
    {
      $match: {itemType: "Asset"} // include only Assets from "Items" collection
    },
    {
      $group: {
        '_id': this.userId,
        'value': {$sum: '$value'}, // sum of all Assets
      }
    },
    {
      $project: {
        assetValue: '$value'
      }
    },
    {$out: "metrics"} // write to new "Metrics" collection
  ]);
});

Meteor.publish("liabTotals", function() {
  ReactiveAggregate(this, Items, [
    {
      $match: {itemType: "Liability"} // include only Liabilities from "Items" collection
    },
    {
      $group: {
        '_id': this.userId,
        'value': {$sum: '$value'} // sum of all Liabilities
      }
    },
    {
      $project: {
        liabValue: '$value'
      }
    },
    {$out: "metrics"}
  ]);
});

Meteor.publish("metricsResults", function() {
  ReactiveAggregate(this, Metrics, [
    {
      $match: {} // include all docs in "Metrics" collection
    },
    {
      $group: {
        '_id': "liabValue",
        'liabValue': {$sum: '$liabValue'},
      }
    },
    {
      $group: {
        '_id': "assetValue",
        'assetValue': {$sum: '$assetValue'},
      }
    },
    {
      $project: {
        liabValue: '$liabValue',
        assetValue: '$assetValue',
        netEquity: {$subtract: ["$assetValue", "$liabValue"]} // here is the second agg using the first agg's results
      }
    }
  ], {clientCollection: "clientReport"}); // Send the aggregation to the 'clientReport' collection available for client use
});

I’d imagine there are a number of things wrong here. Any help would be greatly appreciated. Thank you!
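Two things stand out in the publications above. `$out` replaces the target collection on every run, so the two publications writing to "metrics" would overwrite each other. And in the third pipeline, the second `$group` only sees the output of the first, which no longer carries `assetValue`. One way around both, sketched here under assumptions rather than tested against this app, is to compute everything in a single pipeline with `$cond`. The field names `itemType` and `value` come from the post; `totalsPipeline` and `computeTotals` are hypothetical names, and restricting the result to the current user would need an extra `$match` on whatever owner field the documents actually have.

```javascript
// Sketch: asset total, liability total and net equity in one pipeline.
const totalsPipeline = [
  { $match: { itemType: { $in: ["Asset", "Liability"] } } },
  {
    $group: {
      _id: "totals", // a plain string _id, which Meteor accepts
      assetValue: { $sum: { $cond: [{ $eq: ["$itemType", "Asset"] }, "$value", 0] } },
      liabValue: { $sum: { $cond: [{ $eq: ["$itemType", "Liability"] }, "$value", 0] } },
    },
  },
  {
    $project: {
      assetValue: 1,
      liabValue: 1,
      netEquity: { $subtract: ["$assetValue", "$liabValue"] },
    },
  },
];

// Plain-JavaScript equivalent of the pipeline, handy for checking the
// logic without a database.
function computeTotals(items) {
  const totals = { _id: "totals", assetValue: 0, liabValue: 0 };
  for (const item of items) {
    if (item.itemType === "Asset") totals.assetValue += item.value;
    else if (item.itemType === "Liability") totals.liabValue += item.value;
  }
  totals.netEquity = totals.assetValue - totals.liabValue;
  return totals;
}
```

A pipeline like this could then be handed to a single publication in the form already used in this thread, `ReactiveAggregate(this, Items, totalsPipeline, {clientCollection: "clientReport"})`, with no intermediate `$out` collection.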

sportsdiehard · July 16, 2016, 1:56pm

Forget the aggregations…

When I started with Meteor I was thinking why the hell Mongo Aggregate is not built into Meteor, then I started to learn JavaScript and now I don’t have to care shit about Mongo aggregations.

I process all my collections with JavaScript and Underscore.js at client-side and everything remains reactive. First I store the collection in a variable, var data = db.[collection].find().fetch();, then I push the data array through my functions and return the results to Spacebars with return myFunction(data);
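As a rough illustration of this kind of client-side processing, here is a sketch in plain JavaScript (no Underscore needed). The data is made up, `totalsByTeam` is a hypothetical function name, and the field names `team_name`, `goals` and `assists` are borrowed from the sample document in this post.

```javascript
// Stand-in for the array a collection's find().fetch() would return.
const players = [
  { team_name: "Team A", player_name: "Player 1", goals: 1, assists: 5 },
  { team_name: "Team A", player_name: "Player 2", goals: 2, assists: 0 },
  { team_name: "Team B", player_name: "Player 3", goals: 4, assists: 1 },
];

// Group by team and sum, roughly what a $group stage with $sum would do.
function totalsByTeam(docs) {
  const byTeam = new Map();
  for (const doc of docs) {
    const row = byTeam.get(doc.team_name) ||
      { team_name: doc.team_name, goals: 0, assists: 0, total: 0 };
    row.goals += doc.goals;
    row.assists += doc.assists;
    row.total = row.goals + row.assists;
    byTeam.set(doc.team_name, row);
  }
  return Array.from(byTeam.values());
}
```

In a template helper the fetched array would be passed straight in, for example `return totalsByTeam(data);`, and the helper reruns whenever the underlying cursor changes.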

I do a lot of “micro-writings” at server-side in my application and store “aggregated data” into nested documents. In the following example “root fields” are only _id, team_name, team_id, tournament_short_url, player_name, player_position. All other fields are “micro-writings” that are good for the nature of Node.js-environments.

 {
         "_id" : "wDBgSpkHTxzdJAksC",
         "team_name" : "Pittsburg Penguins",
         "team_id" : "NPPKqEK2ygfwa8Bk6",
         "tournament_short_url" : "nhl16",
         "player_name" : "Crosby, Sidney",
         "player_position" : "Center",
         "games" : [
                 {
                         "match_id" : "fNCGMW6HN8c43JPnb",
                         "division" : "Quarter-Finals",
                         "attending" : "In",
                         "status" : "Closed"
                 },
                 {
                         "match_id" : "LTeQKCfT6Brn26T8T",
                         "division" : "Division A",
                         "attending" : "In",
                         "status" : "Closed"
                 }
         ],
         "player_games" : 2,
         "goals" : 1,
         "assists" : 5,
         "total" : 6,
         "gpg" : "0.2",
         "apg" : "0.8",
         "ppg" : "1.0"
 }

tomtom87 · July 18, 2016, 11:53am

Fucking sweet idea, aggregations have been my biggest pain in the arse recently. Gonna try your method sounds like it makes way more sense

rhywden · July 18, 2016, 1:30pm

It’s called “denormalization” and is only a good idea when you don’t need the data in multiple places, i.e. you only have an A=>B relationship.

As soon as you have A<=>B or A=>B && A=>C you’ll have to think hard on how to structure your data. And also make damn sure you don’t run into synchronization issues.

sportsdiehard · July 18, 2016, 8:45pm

I do not see why it wouldn’t, because we are working in a JavaScript environment and Underscore.js is a native part of Meteor. It looks like the footsteps MDG has shown us for dealing with aggregations etc. in the meantime.

I wouldn’t say that it is any more “data denormalization” than using Mongo aggregations, because you are basically using plain JavaScript or Underscore to reach the same goals as you would with full Mongo functionality support.

By doing the data manipulation (aggregation simulation) in client-side JavaScript you gain several benefits. You can move the stress from the server side to the client side, using conditions to publish only the data that is required for the “aggregations” and then pushing the data through your own functions to transform it into the form you want to present. I think everybody should take advantage of the processing power that people’s mobile devices have today. The second major advantage is that you will learn to play with JSON, arrays and objects like a wizard, and at the end of the day you can build whatever report your boss ever asks for from any API, database connection or data file. Just add D3 or some cool CSS flexbox ideas into the mix and there will be no data that you can’t transform into a fucking world-class report.

Regarding database structure, I have changed it many times, either when I found better logic and better performance or when I added functionality that required new logic. Each time I moved to a new structure I used JavaScript and Underscore to transform my old data into the new schema, so my apps could start writing new data immediately. I do not know how this will work with large collections, but I have managed to shrink collections and published documents to half their size (or less).

@robfallows Sorry for inviting you in, but what do you think of this kind of approach?

rhywden · July 18, 2016, 9:32pm

Actually, no, you’re not moving the stress from server to client.

Because if you look closely, you’ll notice that unless you have very simple relationships you’ll have to do a whole lot of additional queries.

And “denormalization” is not a Mongo specific term. It’s a generic database term and usually you need to have very good reasons to denormalize because you end up with duplicate data and stale data if you’re not very careful.
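The stale-data risk can be shown with a small in-memory sketch. Everything here is made up for illustration: the data, the function names, and the idea that a team name is copied onto each player document.

```javascript
// The team name lives on the team and is duplicated onto each player.
const teams = new Map([["t1", { _id: "t1", team_name: "Team A" }]]);
const players = [
  { _id: "p1", team_id: "t1", team_name: "Team A" },
  { _id: "p2", team_id: "t1", team_name: "Team A" },
];

// Renaming only the source document leaves every copy stale.
function renameTeamNaive(teamId, newName) {
  teams.get(teamId).team_name = newName;
}

// A rename that also fans out to the duplicated field keeps copies in sync.
function renameTeamSynced(teamId, newName) {
  teams.get(teamId).team_name = newName;
  for (const player of players) {
    if (player.team_id === teamId) player.team_name = newName;
  }
}

// Lists players whose copied name no longer matches the source.
function stalePlayers() {
  return players.filter((p) => p.team_name !== teams.get(p.team_id).team_name);
}
```

Every additional place the value is copied to needs its own fan-out step, which is where the synchronization effort comes from.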

There’s a reason for normalization. And it’s a pretty good one.

Also: You really can’t tell me that all your JSON magic can beat a single JOIN statement in clarity, ease of maintenance and reliability.

robfallows · July 19, 2016, 1:04pm

There are a lot of moving parts in this topic, which makes it difficult to present my thoughts in a coherent way, but if you’ll tolerate the rambling (and opinionation) I’ll carry on.

My rule-of-thumb is always use the database engine: it’s what it’s designed for. There may be valid exceptions, but I follow that as closely as possible.

You’re already using it

Are you doing anything like find({ _id: someId }), or find({ createdAt: { $gt: someDate } })?

Then you are already using the database engine. If I were to offer a database in which you could only read all the rows that were stored and that you would have to write your own code to look at every row and decide if it was one you wanted, you wouldn’t want to use it.

So why wouldn’t you want the database to do as much as possible with your data? It’s almost always going to do it faster and more efficiently than you could code it - even in MongoDB and especially in SQL. Query analysis and optimisation is a very intense, mathematical study. I don’t want to try to get involved in it - and why should I - it’s been done for me by the database developers?

Resource use

So, you’re running through a few thousand documents from your MongoDB collection, recording players and keeping track of high scores and team averages … and doing it in Javascript and it’s working fine. More clients connect and do the same and performance suddenly collapses. Maybe you’re doing the leg work on the server - you’re now tying up the event loop while you perform computations in your single threaded node app, making clients wait in turn until they can tie it up for the next person in the queue. Maybe you’re doing it on the client and your Meteor app has set up observers and huge amounts of associated mergebox memory for each client and is pumping data over the wire… Well, you get the picture.

Or you get MongoDB to use the aggregation pipeline (yes, I know it’s not available in minimongo). You fire off a request to Mongo, the event loop continues, allowing the next client request to be serviced. MongoDB is multi-threaded, so it happily picks up these requests and services them in turn, all without impacting performance or availability of the node process. More importantly, the aggregation pipeline is fast. Your code might beat the overhead of setting up for a very small collection, but most of the time it’s a no-brainer.

Portability

You get bored with Javascript and decide to rewrite in Go. That’s a lot of bespoke code you’ve got to tackle. I’ll just take my queries and go (pun intended).

Complexity and normalising

It’s not so bad on a single MongoDB collection, but trying to join collections is a nightmare in code. We finally have the Mongo equivalent of LEFT OUTER JOIN in MongoDB 3.2, which will make the whole business a little easier - but only if we let the engine take the strain. SQL got this right - a good query is an elegant thing - and it’s much easier to change a query than to change code.
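The LEFT OUTER JOIN equivalent added in MongoDB 3.2 is the `$lookup` aggregation stage. A minimal sketch follows; the collection name "teams" and the field names are hypothetical, and the plain-JavaScript function only approximates the stage's behaviour so the idea can be checked without a database.

```javascript
// Pipeline sketch: attach matching team documents to each player.
const joinPipeline = [
  {
    $lookup: {
      from: "teams",          // collection to join with
      localField: "team_id",  // field on the input (player) documents
      foreignField: "_id",    // field on the "teams" documents
      as: "team",             // output array; empty when nothing matches
    },
  },
];

// Rough plain-JavaScript equivalent of that stage: every player is kept,
// whether or not a team matches, which is what makes it a left outer join.
function leftOuterJoin(players, teams) {
  return players.map((player) => ({
    ...player,
    team: teams.filter((team) => team._id === player.team_id),
  }));
}
```

In Meteor such a pipeline would have to run on the server, for instance through `rawCollection().aggregate` as shown earlier in this thread.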

Apollo/GraphQL

Would you want to extend the paradigm to multiple, disparate data sources, or would you prefer to have Apollo/GraphQL do this for you?

Migrations

Migrations may be a valid case for using code. A migration takes a collection having one schema design and transforms it to one with a different schema. In SQL this would normally be done in SQL, but in Mongo it tends to be done in code - probably because Mongo and Javascript are closely intertwined.
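A code-based migration of this kind might look like the following sketch. The schema change itself is invented for illustration (splitting a "Last, First" `player_name` into two fields and tagging documents with a `schema_version`), and both function names are hypothetical.

```javascript
// Transform one document from the old shape to the new one.
function migratePlayer(oldDoc) {
  const { player_name, ...rest } = oldDoc;
  const [lastName = "", firstName = ""] = String(player_name || "")
    .split(",")
    .map((part) => part.trim());
  return { ...rest, first_name: firstName, last_name: lastName, schema_version: 2 };
}

// Apply the transform to every document that has not been migrated yet,
// so the migration can safely be run more than once.
function migrateAll(docs) {
  return docs.map((doc) => (doc.schema_version === 2 ? doc : migratePlayer(doc)));
}
```

Keeping the transform a pure function of one document makes it easy to test before it is pointed at a real collection.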

Ad hoc data manipulation

This is another use case for client-side processing: a small set of documents which needs to be manipulated interactively according to the functionality in the UI. So, send the documents once and let the end-user play with them as they want.

Getting past the hurdle

I think there’s a reluctance to learn how to use the tools we’ve been given beyond the very basics. It’s daunting to consider that you’ll have to spend a significant amount of time learning how to make best use of Mongo, SQL, GraphQL or whatever. However, the rewards can make the difference between success and failure as your app grows its client base. Even super-basic stuff like using indexes properly makes a huge performance hike. Why would you not want to give your app the very best chance of success?

autoschematic · January 21, 2017, 8:50pm

@lpgeiger I’m sure by now you’re a pro - this was posted way back when. I thought it would benefit the community to mention how excellent a resource the Tracker manual is for gaining insight into how Meteor’s reactivity works. If you still have any questions, reading this documentation should clear everything up.

I recommend reading the entire document beginning to end - it builds on itself and it’s a very easy document to read. Yes, it’s long - but it’s really worth it in the end. There are a lot of really great javascript programming patterns one can learn from this document - and it completely explained to me why things are the way they are with reactivity.

Yes, this doesn’t explain why aggregation isn’t reactive - but it explains everything else after mongodb and how mongodb dovetails into meteor’s reactivity.

github.com

meteor/docs/blob/version-NEXT/long-form/tracker-manual.md

# Meteor Tracker

Meteor Tracker (originally "Meteor Deps") is an incredibly tiny (~1k) but incredibly powerful library for **transparent reactive programming** in JavaScript.

## Transparent Reactive Programming

A basic problem when writing software, especially application software, is monitoring some value – like a record in a database, the currently selected item in a table, the width of a window, or the current time – and updating something whenever that value changes.

There are several common patterns for expressing this idea in code, including:

* **Poll and diff.** Periodically (every second, for example), fetch the current value of the thing, see if it's changed, and if so perform the update.

* **Events.** The thing that can change emits an event when it changes. Another part of the program (often called a controller) arranges to listen for this event, and get the current value and perform the update when the event fires.

* **Bindings.** Values are represented by objects that implement some interface, like BindableValue. Then a 'bind' method is used to tie two BindableValues together so that when one value changes, the other is updated automatically. Sometimes as part of setting up the binding, a transformation function can be specified. For example, Foo could be bound to Bar with the transformation function of toUpperCase.

Another pattern, not yet as commonly used but very well suited for complex modern apps, is **reactive programming**. A reactive programming system works like a spreadsheet. The programmer says, "the contents of cell A3 should be equal to the sum of cells B1-B6, multiplied by the value of cell C4", and the spreadsheet is responsible for automatically modeling the flow of data between the cells, so that when cell B2 changes the value of cell A3 will be automatically updated.

So reactive programming is a **declarative** style of programming, in which the programmer says what should happen (A3 should be kept equal to the result of a certain computation over the other cells), not how it should happen, as in imperative programming.

This file has been truncated.
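The spreadsheet analogy in the excerpt can be made concrete with a toy dependency tracker. This is only a sketch of the general idea: `ReactiveCell` and `autorun` are illustrative names, it is not Meteor Tracker's implementation or API, and unlike Tracker it reruns computations synchronously instead of batching them in a flush.

```javascript
// The computation currently being run, if any.
let currentComputation = null;

class ReactiveCell {
  constructor(value) {
    this.value = value;
    this.dependents = new Set();
  }
  get() {
    // Reading a cell inside an autorun records the dependency.
    if (currentComputation) this.dependents.add(currentComputation);
    return this.value;
  }
  set(value) {
    if (value === this.value) return;
    this.value = value;
    // Rerun everything that read this cell. Iterate over a copy because
    // rerunning a computation re-registers it on the same set.
    for (const computation of Array.from(this.dependents)) computation();
  }
}

function autorun(fn) {
  const computation = () => {
    const previous = currentComputation;
    currentComputation = computation;
    try {
      fn();
    } finally {
      currentComputation = previous;
    }
  };
  computation();
}

// Spreadsheet-style rule: A3 = (B1 + B2) * C4
const b1 = new ReactiveCell(1);
const b2 = new ReactiveCell(2);
const c4 = new ReactiveCell(10);
const a3 = new ReactiveCell(0);
autorun(() => a3.set((b1.get() + b2.get()) * c4.get()));
```

After this setup, calling `b2.set(5)` updates `a3` without any explicit event wiring, which is the declarative behaviour the manual describes.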

tomtom87 · February 7, 2017, 3:10am

I’ve been away for a while on other projects - is this now working as was suggested back then?