Sort Collection by Array value

patrickwml · May 28, 2015, 9:54pm

I am trying to figure out how you would sort a collection by value where another value is == to something else see the document below

{
  "_id": "zLFp8KptxzACGtAZj",
  "createdAt": "2015-05-28T21:11:57.044Z",
...
  "profile": {
    "firstname": "Nelle",
    "lastname": "Carroll",
...
    "services": [
      {
        "serviceId": "Gy7uptfk8LNWMgEQy",
        "rate": 19
      },
      {
        "serviceId": "KvFETNLK8cJ78e6gy",
        "rate": 1
      },
      {
        "serviceId": "xwY532yxcWQ7qAjuP",
        "rate": 42
      }
    ],
}
...
}

Say I have a collection of about 10 users whose services contain the serviceId xwY532yxcWQ7qAjuP I could find them using Meteor.users.find({ 'profile.services.serviceId' : service }); but if I wanted to sort the users by the their rate using Meteor.users.find({ 'profile.services.serviceId' : service }, , { sort: { 'profile.services.rate' : 1 } }); It sorts the users by who has the highest rate in any service not just xwY532yxcWQ7qAjuP So how would I go about sorting by rate where serviceId === xwY532yxcWQ7qAjuP ?

robfallows · May 29, 2015, 10:51am

You may find this topic helpful.

Basically, MongoDB is not very good at handling embedded arrays - and minimongo further restricts the available functionality. However, there are a couple of things you can look at.

mspi · May 29, 2015, 2:35pm

Hi,

You use mongo’s Aggregate: http://docs.mongodb.org/manual/meta/aggregation-quick-reference/

I made a little test in mongo and this query gets all the records for one service Id order by rating

    db.test.aggregate([
        {
            $project: {
                'profile.services' : 1
            ,   'profile.firstname': 1
            }
        }
    ,
        {
            $unwind: '$profile.services'
        }
    ,   {
            $match: {
                'profile.services.serviceId' : 'xwY532yxcWQ7qAjuP'
            }
        
        }   
    ,   
        {
            $sort: 
            {
                'profile.services.rate': 1
            }
        }
    ])

In the first project, you see I only included profile.firstname so that I could distinguish the records. Just put all the fields in the project that you need to return. Remember that if you project first, only the listed fields are used in the pipe.
So the following unwind (converts array to separate records) only works if ‘profile.services’ are listed in the project above.

Aggregate is really awesome powerful. You can also group but don’t think you need that right now.

Hope this helped.

benjick · May 29, 2015, 2:38pm

From the docs

In this release, Minimongo has some limitations:

$pull in modifiers can only accept certain kinds of selectors.
findAndModify, aggregate functions, and map/reduce aren’t supported.

corvid · May 29, 2015, 2:56pm

tbh, if your collection is small enough you’re probably just better off using _.sortBy.

I think when you update collections you can still use $sort to maintain it in order, theoretically.

mspi · May 29, 2015, 11:34pm

guys, why would you want to do the Aggregate on the mini mongo side?
You do it in your publisher, and add an argument to pass the filter data if necessary to change the publisher results.

The you subscribe to this data and do your find in template helper.

The subscription can be reactive so you can pass new filter/arguments to it and update the subscription.

patrickwml · May 30, 2015, 11:28pm

Hey Guys thanks for all the great answers @mspi I need to look into aggregation more. Iv been wanting to get into it but I haven’t had the time to just sit down and learn it.

From a resource point of view do you think that aggregation is going to be too expensive to do if almost every user of the application will be seeing this data? @corvid brought up using _.sortBy which puts the cost on the client and not the server. Does anyone have any input on this?

mspi · May 31, 2015, 9:26am

I know that aggregation is really fast, but I to answer your question I would need insight in all the variables of the situation ( the whole setup).
You could also do a shared marriage: maybe use aggregate to make a publication and then you can probably filter further in the client?

streemo · June 1, 2015, 8:52am

@patrickwml @mspi

Good things to think about. I am in a similar situation. I spent time thinking of many solutions each with their own drawbacks and strong points.

TLDR:
I’m in a similar situation. I chose number 4 for my purposes, because it made the most sense with my data. I realized that option 2 is MUCH more nitpicky than it looks. And option one is trivial, but very unfeasible for a good UX.
1. Reactive aggregation, computing on client. You can aggregate the derivative set on the client and push them into a null collection called “AggregatedResultsCollection” or something. Good for small-ensured data.
2. Reactive aggregation on the server: You could do all of that in a publish function on the server, but there are a lot of subtleties that make it more difficult, namely paginating the aggregated results.
3. Non reactive mongodb aggregate: This is the easiest solution. All you do is delegate the job to mongo and return this.added for each result in your publish function.
4. Pseudo-reactive aggregation: Use a cache on the server to be smarter about when you push changes to the client.

If you’re gonna aggregate, keep in mind that you shouldn’t rerun a query unless the user’s parameters justify it. Throttle it.

My app needs to show a few “top” lists to all users. In order to get the top users, hashtags, etc. a reduce operation must be done on the data. For example Posts collection contains all the data necessary to determine the top hashtags: Unwind the post.tags array, group by tag, sum on a score variable, and we are done. Note that this can be done in pure javascript - we do not necessarily need aggregation. Depending on the data, this might make more sense.

Assumption: Your output data can be derived from a single collection. This will be true if you are considering Mongo’s aggregate function. We will call this “Posts”.
Variables:
a) How much reactivity do you want?
b) How much client-side computing do you want?

Solution 1: Reactivity, client side computing. Easy. If your data set is inherently small, this is probably the best solution. This is a bad solution for larger data sets. Very bad. Basically you publish all the data to a client-side collection, and then use an observer on the client to aggregate the data into a null collection. The reactive updates from the server on the Posts collection will transfer via observer into the null collection. Then, anywhere in your app, you can query the null collection for the aggregated data you need. You will have to put in effort in your observe cursor to make sure that the aggregated data in the null collection is updated correctly when an underlying Post object is added/changed/removed. This can be strong or weak reactivity depending on your implementation in the observe cursor. This is easy because you do not need to worry about limiting/pagination data from server to client, since you have very few documents.

Solution 2: Reactivity, server-side computing. Difficult.. For medium to large data sets. Similar to above situation, but observe the query on the server instead, and push the results of your math via added/changed/removed into a client-side collection. This moves all the computation from the client to the server. This becomes hard when you need to limit the results of the aggregation operation but you need the entire Posts collection to derive the aggregate data.

Solution 3: No reactivity, database computing. Easiest. For medium to large data sets. Enter the aggregate query. Run the query in a publish function, iterate over it and uses the added/changed/removed API to dump the results into a client-side collection. Implemented in the database layer, this takes the computation off of your server and your client. IIRC, you can spawn multiple secondary databases on other processors to horizontally scale when you have “too many” people subscribing to this query, since reads can be done from any db instance. Check out meteorhacks:cluster it looks pretty good. For aggregation, meteorhacks:aggregate. This solution is probably the most scalable.

Solution 4: Pseudo-reactive server/database computing. Difficult. Use a Cache on the server to keep track of what aggregation results you already sent to the client. This will prevent you from sending duplicate data, and it will let you diff results on the server to send only changes. This will waste less bandwith, but use more CPU. You can use the Meteor.onConnected callback to instantiate a Cache for each client.

The great thing about using publish/subscribe functionality for aggregate queries instead of direct method calls is that the behavior is undecipherable from live-queries on the client, so you can write the same code on the client as you would with a non-aggregate query…

I currently use Solution 3 on my app. If your data is high sub-rate, and follows the following condition. Let x be the parameters of the subscription. Let f be the query results of the publish function: If x - x’ ~ dx implies f(x) - f(x’) ~ df, df/dx both very small then you can implement a “smart throttle” which determines if we should run another query:

Example: Locational data. If you have a reactive subscription (in a computation/autorun) that depends on geoCoordinates (reactively), then we DO NOT want to re-run the aggregation on the server if the new location is very close to the old location because we can assume that the new results will be no different than the old ones. Again, it depends highly on your data.