Reactive aggregations and mongo upgrade?

On this forum I am probably one of the people most avidly not in favor of Mongo, but recently I discovered MongoDB 3.2 and the aggregation framework.

I have found that this has resolved almost every frustration I have with MongoDB, and it simply works great. With $lookup, $group, $unwind, and $match it’s really amazing what you can do.

This is probably the best example. Say you have Players. Players can belong to many games, and have many roles in those games. The document can be structured like this:

{
  _id: '1234',
  games: [{
    slug: "game-1",
    // ... more stuff here
  }],
  roles: [{
    'game-1': 'player'
  }]
}

It’s much quicker to get a big document containing all the data you need:

db.players.aggregate([
  // separate document for every game
  { $unwind: '$games' },
  // make an object for each game and add a list of players
  {
    $group: {
      _id: '$games.slug',
      players: { $addToSet: '$_id' }
    }
  },
  // look up the matching game document (requires MongoDB 3.2+)
  {
    $lookup: {
      from: 'games',
      localField: '_id',
      foreignField: 'slug',
      as: 'game'
    }
  }
]);

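And the result could come out looking roughly like this (simplified: $lookup actually returns game as an array, and each document also keeps its _id):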

[{
  game: {
    name: 'Game 1',
    slug: 'game-1'
  },
  players: ['1', '2', '3']
}, {
  game: {
    name: 'Game 2',
    slug: 'game-2'
  },
  players: ['4']
}]
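To make the $unwind and $group stages above a bit more concrete, here is a rough plain-JavaScript equivalent that runs over an in-memory array. The sample players and the helper names unwindGames and groupBySlug are made up for illustration, and the $lookup stage is not reproduced. Also note that MongoDB does not guarantee the order of elements produced by $addToSet.

```javascript
// In-memory stand-in for the players collection (sample data is hypothetical).
const players = [
  { _id: '1', games: [{ slug: 'game-1' }] },
  { _id: '2', games: [{ slug: 'game-1' }] },
  { _id: '3', games: [{ slug: 'game-1' }] },
  { _id: '4', games: [{ slug: 'game-2' }] },
];

// Rough equivalent of { $unwind: '$games' }: one row per (player, game) pair.
function unwindGames(docs) {
  return docs.flatMap((doc) =>
    (doc.games || []).map((game) => ({ ...doc, games: game }))
  );
}

// Rough equivalent of $group with $addToSet: distinct player ids per game slug.
function groupBySlug(rows) {
  const groups = new Map();
  for (const row of rows) {
    const slug = row.games.slug;
    if (!groups.has(slug)) groups.set(slug, new Set());
    groups.get(slug).add(row._id);
  }
  return [...groups].map(([slug, ids]) => ({ _id: slug, players: [...ids] }));
}

const grouped = groupBySlug(unwindGames(players));
console.log(JSON.stringify(grouped));
```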

Are there any plans for these two changes to be added? I find they’re incredibly powerful.


Thanks for this, @corvid. I believe I have some of these aggregations working server-side on Meteor pre-1.2, which I think uses MongoDB 2.6, and they work great.

I’ll ask: does anyone know if there’s going to be a MongoDB update in the near future for Meteor?

You can already use the aggregation framework within Meteor. You can also get Meteor to connect to MongoDB 3.0 servers.

The problem is with 3.2: the currently bundled driver does not support it, and there is an open issue on GitHub where @avital encourages pull requests.

The main improvement with 3.2 over 3.0 in this thread’s context is $lookup, which I believe can be achieved in other ways. But yes, the driver update is long overdue and it would be awesome if we could just use the latest goodies and syntax.

I’ve found you can use MyCollection.rawCollection().aggregate([]) on the server, but it does not appear to be reactive.

It would be much easier if you used meteorhacks:aggregate, but either way reactive aggregation requires you to set up a custom publication where you poll for your result and send down the changes, or to poll your aggregation pipeline and update a summary collection with the result.

This is required because the oplog does not provide a means to tail aggregation pipelines. Furthermore, minimongo would have to support that as well.
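The polling publication idea could be sketched roughly like this. It is only a sketch under assumptions: pushDiff and pollAggregation are hypothetical helper names, sub is assumed to be shaped like a Meteor publish handler (added, changed, removed, ready, onStop, error), and runPipeline is assumed to be any function that runs the aggregation and resolves to an array of result documents (for example via rawCollection(), depending on your driver version).

```javascript
// Compare two result sets (Maps keyed by _id) and report the differences
// through `sub`, an object shaped like a Meteor publish handler.
function pushDiff(sub, collectionName, previous, next) {
  for (const [id, fields] of next) {
    if (!previous.has(id)) {
      sub.added(collectionName, id, fields);
    } else if (JSON.stringify(previous.get(id)) !== JSON.stringify(fields)) {
      // Naive comparison: sensitive to key order, and fields that disappeared
      // from a document are not cleared on the client in this sketch.
      sub.changed(collectionName, id, fields);
    }
  }
  for (const id of previous.keys()) {
    if (!next.has(id)) sub.removed(collectionName, id);
  }
}

// Hypothetical helper: re-run `runPipeline` every `intervalMs` milliseconds
// and publish only what changed since the previous run.
function pollAggregation(sub, collectionName, runPipeline, intervalMs) {
  let previous = new Map();
  async function poll() {
    const rows = await runPipeline();
    const next = new Map(rows.map(({ _id, ...fields }) => [String(_id), fields]));
    pushDiff(sub, collectionName, previous, next);
    previous = next;
  }
  const fail = (err) => sub.error(err);
  poll().then(() => sub.ready(), fail);
  const timer = setInterval(() => poll().catch(fail), intervalMs);
  sub.onStop(() => clearInterval(timer));
}
```

Inside a Meteor.publish function you would pass this as sub. The trade-off is the usual one for polling: the client only sees updates once per interval, and every poll re-runs the whole pipeline.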

This is actually what I was curious about. Perhaps you can’t know from the oplog whether an aggregation had data added to or removed from it, but couldn’t you infer which documents are being found, tail the oplog for those, and run the query again? E.g.:

Meteor.users.rawCollection().aggregate([
  { $match: { 'name.first': 'Steve' } },
  {
    $lookup: {
      from: 'profiles',
      localField: 'profileId',
      foreignField: '_id',
      as: 'profile'
    }
  }
]);

A human would know that they would be watching the users collection for changes to any documents with that $match condition. They would also know to look in the profiles collection for an _id that matches that document’s profileId.

Probably over-simplifying, but couldn’t this basically translate to knowing what to tail the oplog for, then running the aggregation again / changing the in-memory contents of it?
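As a very rough sketch of that inference, something like the following could list what a pipeline reads from. The helper name pipelineDependencies is hypothetical, and it only understands a leading $match and $lookup stages; it does not deal with stages that reshape documents, and it simply watches the whole looked-up collection rather than only the joined ids.

```javascript
// Hypothetical helper: list the collections a pipeline reads from, plus the
// selector of a leading $match, so a caller knows what to watch for changes.
function pipelineDependencies(sourceCollection, pipeline) {
  const deps = [{ collection: sourceCollection, selector: {} }];
  // Only a $match in first position filters the raw source documents.
  if (pipeline.length > 0 && pipeline[0].$match) {
    deps[0].selector = pipeline[0].$match;
  }
  // Every $lookup pulls in another collection whose changes matter too.
  for (const stage of pipeline) {
    if (stage.$lookup) {
      deps.push({ collection: stage.$lookup.from, selector: {} });
    }
  }
  return deps;
}

const examplePipeline = [
  { $match: { 'name.first': 'Steve' } },
  {
    $lookup: {
      from: 'profiles',
      localField: 'profileId',
      foreignField: '_id',
      as: 'profile',
    },
  },
];

console.log(JSON.stringify(pipelineDependencies('users', examplePipeline)));
```

A change to anything in that list would then be the trigger for running the aggregation again.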

Check out @glasser’s comments in this issue:


Nice find, @robfallows. It seems oplog tailing can be disabled for a find on a per-query basis, but that would still require using observeChanges.

Furthermore, according to the MongoDB documentation:

If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection.

the aggregation pipeline will have run only once, so in practice we still don’t get any reactivity, because the pipeline will not rerun if its dependencies (the underlying data) change.
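For anyone following along, a pipeline that ends in $out might look like the sketch below. The helper withOutStage and the target collection name gameSummaries are made up for illustration. Each run writes the results into the target collection once, so something still has to trigger the rerun.

```javascript
// Hypothetical helper: append a $out stage so results land in `target`.
function withOutStage(pipeline, target) {
  if (pipeline.some((stage) => '$out' in stage)) {
    throw new Error('pipeline already has a $out stage');
  }
  // $out has to be the final stage of a pipeline.
  return [...pipeline, { $out: target }];
}

const summaryPipeline = withOutStage(
  [
    { $unwind: '$games' },
    { $group: { _id: '$games.slug', players: { $addToSet: '$_id' } } },
  ],
  'gameSummaries' // hypothetical target collection name
);

console.log(JSON.stringify(summaryPipeline[summaryPipeline.length - 1]));
```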

So I guess polling results, diffing and observing seems to be the only practical choice.

@corvid well, it is not actually that simple, because aggregation has lots and lots of operators that are really hard to implement on top of an incremental log like the oplog. Some of those operators change the document structure, which makes it even harder. So it is not actually Meteor that is to blame here; perhaps MongoDB itself needs some improvement there. What you are describing is actually something like a view in the traditional sense, and I guess it is not on MongoDB’s roadmap.


I guess you could use a predefined result collection (in the sense that it’s a standard new Mongo.Collection(name)) and add a final step after the aggregation pipeline to copy the documents from the MongoDB aggregation result collection to the predefined collection (which would be observable). You’d need to do that via rawCollection.

Smells somewhat iffy, but should work. :grimacing:

What I wonder is what the effect of atomicity there would be. Does it empty the whole collection and insert the new pipeline result? If so, we can’t say that it is reactive. And it might even choke the server CPU.

@corvid I’m the original poster of the issue on GitHub. I’ve been using aggregations for a while with Meteor.

As @glasser says, the copy function mongo uses internally should not be much different from other operations. “We should improve the oplog driver to understand renameCollection, like it already does for dropCollection”

@robfallows @serkandurusoy As for atomicity, I don’t think it would be possible since the internals of Mongo seem to be dropping the collection. Can someone confirm whether Mongo’s oplog records a collection drop on $out?

Anyhow, I would appreciate your +1 on the issue page to surface this issue to MDG.


As for atomicity, I don’t think it would be possible since the internals of Mongo seem to be dropping the collection.

Yep, that’s what I was afraid was going on. So $out is not a means to achieve reactivity. We can always observe with or without this.
