Limit collection.find by field

vegard · December 14, 2015, 11:20am

I’m trying to limit a collection cursor to only return a set number of documents per value.

Example:

Collection:

  [
     { _id: 1, value: 1 },
     { _id: 2, value: 1 },
     { _id: 3, value: 2 },
     { _id: 4, value: 2 },
     { _id: 5, value: 3 }
  ]

Expected result:

  [
     { _id: 1, value: 1 },
     { _id: 3, value: 2 },
     { _id: 5, value: 3 }
  ]
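For reference, the expected result keeps the first document seen for each `value`. Here is a minimal in-memory sketch of that rule. `firstPerValue` is a hypothetical helper. Because it filters documents that have already been fetched, it does not reduce what a publication sends over the wire.

```javascript
// Keep only the first document seen for each distinct value of `field`.
// Input order decides which document "wins"; sort beforehand if that matters.
function firstPerValue(docs, field) {
  const seen = new Set();
  const result = [];
  for (const doc of docs) {
    const key = doc[field];
    if (!seen.has(key)) {
      seen.add(key);
      result.push(doc);
    }
  }
  return result;
}

const collection = [
  { _id: 1, value: 1 },
  { _id: 2, value: 1 },
  { _id: 3, value: 2 },
  { _id: 4, value: 2 },
  { _id: 5, value: 3 }
];

// Keeps the documents with _id 1, 3 and 5, matching the expected result.
console.log(firstPerValue(collection, 'value'));
```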

Any idea how to do this easily?

thomasyajl36 · December 14, 2015, 11:43am

Take a look at the aggregation framework here: https://docs.mongodb.org/manual/core/aggregation-introduction/

Looks like there's a command, db.collection.distinct("field_name"), that returns an array of the distinct values of field_name. Sorry, I haven't had much time to play with it. Maybe you can tweak some options to get it to do what you want.

Edit: Check this out too: https://docs.mongodb.org/manual/core/single-purpose-aggregation/
See the example of the group operation. "group" and "distinct" are both single-purpose operations meant for simple tasks, but you can use the rest of the aggregation framework for more fine-grained control.
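A sketch of how the "one document per value" request might be expressed with the full aggregation framework, run from the mongo shell. The pipeline variable name is hypothetical. The sketch assumes "first" means lowest `_id`, and `$$ROOT` requires MongoDB 2.6 or later.

```javascript
// One document per distinct `value`, keeping the document with the lowest _id.
// $first is only meaningful when the input to $group is sorted first.
// $$ROOT refers to the whole input document (MongoDB 2.6+).
const onePerValuePipeline = [
  { $sort: { _id: 1 } },
  { $group: { _id: '$value', doc: { $first: '$$ROOT' } } },
  { $sort: { _id: 1 } } // order the resulting groups by value
];

// From the mongo shell, for example:
//   db.collection.aggregate(onePerValuePipeline)
// Each result has the shape { _id: <value>, doc: <first document with that value> }.
//
// distinct() only returns the values themselves, not the documents:
//   db.collection.distinct('value')
```

Unlike `distinct()`, this keeps a whole document per value. It is still a one-off query, though, not a reactive cursor.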

vegard · December 17, 2015, 4:31pm

Thanks for the reply. This proved to be a much more complex approach than anticipated.

I’m trying to monitor a list of dates reactively, in order to display a list of available appointments in a calendar. Without the ability to limit to one document per date, the subscription could potentially become huge.

I dived into db.collection.aggregate(), but found that it had a number of limitations:

It's not officially supported in Meteor. Meteorhacks provides a great package, but it doesn't cover reactivity or db.collection.distinct().
Even if I were able to use a custom aggregation pipeline or distinct(), the output is only a plain array, and I couldn't find a feasible way to publish it to the client as a reactive collection.

The solution I landed on was to create a publication for a single date and limit to one. I’m using React as the template engine, calling the subscription like this:

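Hour = React.createClass({
  mixins: [ReactMeteorData],
  getMeteorData() {

    // Thanks to Kadira.io:
    // subscribe to the single timeslot for this hour and read it once ready.
    var data = {};
    var time = this.props.time;
    var handle = Meteor.subscribe('timeslot', time);
    if (handle.ready()) {
      data.timeslot = Timeslots.findOne({ time: time });
    }
    return data;
  },

  // render() was not shown in the original post; this minimal version
  // only shows whether a matching timeslot document exists.
  render() {
    return <li>{String(this.props.time)}: {this.data.timeslot ? 'match' : 'none'}</li>;
  }
});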

This produces a large number of subscription calls: up to 800-1000, depending on the duration of the appointments and the scale of the calendar.

Any reason to avoid this pattern?

shock · December 17, 2015, 4:45pm

We still don't see an example of the actual documents you are talking about, so it's hard to suggest an exact query.
The array in the first post is not a document as it is stored in the collection.
With actual data we can possibly suggest how to query or normalize the collections.

Or you can still query it the ugly way in the publication: process the data server side and send only the relevant documents to the client. Something like the low-level publish example in the docs that publishes a count.
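A minimal sketch of that idea, applied to "one document per time". `publishDistinct` and the `'distinctTimes'` collection name are hypothetical. The functions it calls are Meteor's low-level publish API (`added`, `removed`, `ready`, `onStop` on the subscription) and `cursor.observeChanges`. The server still observes every matching document, but the client receives only one placeholder document per distinct value. That placeholder stays reactive.

```javascript
// Publish one placeholder document per distinct value of `field` instead of
// every matching document. Uses Meteor's low-level publish API on `sub`
// (added / removed / ready / onStop) plus cursor.observeChanges.
// Reference counting removes a value only when its last source doc is gone.
function publishDistinct(sub, cursor, field, clientCollection) {
  const counts = new Map();    // String(value) -> number of source docs
  const valueById = new Map(); // source _id -> that doc's current value

  function increment(value) {
    if (value === undefined) return;
    const key = String(value);
    const n = counts.get(key) || 0;
    counts.set(key, n + 1);
    if (n === 0) {
      // First source document with this value: send it to the client.
      sub.added(clientCollection, key, { [field]: value });
    }
  }

  function decrement(value) {
    if (value === undefined) return;
    const key = String(value);
    if (!counts.has(key)) return;
    const n = counts.get(key) - 1;
    if (n === 0) {
      // Last source document with this value is gone.
      counts.delete(key);
      sub.removed(clientCollection, key);
    } else {
      counts.set(key, n);
    }
  }

  const handle = cursor.observeChanges({
    added(id, fields) {
      valueById.set(id, fields[field]);
      increment(fields[field]);
    },
    changed(id, fields) {
      if (!(field in fields)) return; // a different field changed
      decrement(valueById.get(id));
      valueById.set(id, fields[field]);
      increment(fields[field]);
    },
    removed(id) {
      decrement(valueById.get(id));
      valueById.delete(id);
    }
  });

  sub.ready();
  sub.onStop(() => handle.stop());
}

// Hypothetical wiring on the server (assumes `time` is stored as a Date,
// so the range below is chronological):
// Meteor.publish('distinctTimes', function (start, end) {
//   const cursor = Timeslots.find({ time: { $gte: start, $lt: end } },
//                                 { fields: { time: 1 } });
//   publishDistinct(this, cursor, 'time', 'distinctTimes');
// });
// and on the client: DistinctTimes = new Mongo.Collection('distinctTimes');
```

Only one subscription is needed for the whole visible range. The client-side collection name must match the name passed to `sub.added`.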

vegard · December 17, 2015, 5:08pm

Sorry, here are some more details.

Example document:

{
    _id: 1,
    time: "Thu Dec 17 2015 12:00:00 GMT+0100",
}

Publication:

Meteor.publish('timeslot', function (time) {
    // Return at most one matching document, and only its `time` field.
    return Timeslots.find({
        time: time
    }, {
        limit: 1,
        fields: {
            time: 1
        }
    });
});

However, my only question is: Are there any reasons not to call Meteor.subscribe tens/hundreds of times? Each subscription will only return 1 document with 1 field in any case.

shock · December 17, 2015, 5:18pm

Then just provide a start and end time and use $gt and $lt for it.
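A sketch of that range approach. It assumes `time` is stored as a `Date` rather than as a string like "Thu Dec 17 2015 12:00:00 GMT+0100". Range operators compare strings character by character, so "Fri Dec 18 …" would sort before "Thu Dec 17 …". `timeRangeSelector`, `weekRange` and the `'timeslotsInRange'` publication are hypothetical names. `$gte` is used for the start so that a slot at exactly `start` is included.

```javascript
// Selector for documents whose `time` lies in [start, end).
// Assumes `time` is a Date: range operators on date strings compare
// characters, not instants.
function timeRangeSelector(start, end) {
  return { time: { $gte: start, $lt: end } };
}

// A one-week window starting at local midnight of `from`.
function weekRange(from) {
  const start = new Date(from.getFullYear(), from.getMonth(), from.getDate());
  const end = new Date(start);
  end.setDate(end.getDate() + 7); // add calendar days rather than 7 * 24h of ms
  return { start, end };
}

// Hypothetical server-side usage:
// Meteor.publish('timeslotsInRange', function (start, end) {
//   return Timeslots.find(timeRangeSelector(start, end), { fields: { time: 1 } });
// });
// Client: const { start, end } = weekRange(new Date());
//         Meteor.subscribe('timeslotsInRange', start, end);
```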

vegard · December 17, 2015, 6:53pm

Ah, the point is that there can be several documents with the same time (with different owners), so a publication spanning just a week can yield several hundred documents (1 per user * 24 hours * 7 days). Limiting this with the standard collection options is also a no-go, as I could end up publishing documents only for Monday and Tuesday.

Since I need to cover the whole week, my only chance at limiting the number of documents in the publication is to publish only a single document per hour/appointment, i.e. the distinct() result. My solution so far is to call a custom subscription, Meteor.subscribe('timeslot', time);, once per appointment.

thomasyajl36 · December 17, 2015, 7:31pm

Too bad the aggregation framework didn’t work out.
As for your question, I don't think it's a problem to have that many subscriptions, especially if each one returns a single document. But I could be wrong.

So it looks like you have a lot of Timeslots, which could be difficult to monitor. I wouldn't use a Timeslots collection, just an Appointments one. It would have a field for the date (the time doesn't matter) and a string field with a predefined slot, like "9-10". This is what it would all look like (I haven't used React yet, so this uses Blaze):

Two templates (or components): one representing a day and one for that day's timeslots:

<template name="day">
   <h1>{{day}}</h1>
   {{> timeslots}}
</template>

<template name="timeslots">
    <h2>Here's today...</h2>
    <ul>
        <li>8-9 is {{isAvailable "8-9"}}</li>
        <li>9-10 is {{isAvailable "9-10"}}</li>
        <li>10-11 is {{isAvailable "10-11"}}</li>
    </ul>
</template>

Say the user wants to check what's available for December 18, 2015.
First, you subscribe to all Appointments with a date of December 18, 2015 at 00h00m00s. An Appointment document would look like this:

{
    _id: "1d345hjk5sdf",
    date: "Fri Dec 18 2015 00:00:00 GMT+0100", // Only the date matters, not the time.
    slot: "9-10" // The slot is a predefined string.
}

On the client you render the “day” template. Inside there’s the timeslots template. This is what the slot.js would look like.

Template.timeslots.helpers({
  isAvailable(slot) {
    // Gets the date from the parent template's data context,
    // which you set when rendering the "day" template(s).
    const date = Template.parentData(1).date;

    // `slot` is passed in from the template and used in the query.
    return Appointments.findOne({ date: date, slot: slot }) ? "taken" : "available";
  }
});

I hope this makes sense. To sum up: you render multiple day templates, depending on the range users want to see. There is just one subscription, and it monitors all Appointments within the date range. Each day template displays its timeslots. The availability of a slot is a combination of a day plus a timeslot string.

Let me know if you have questions or if I've gone crazy.
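The same day-plus-slot check, sketched in plain JavaScript so the lookup can be seen outside Blaze. The shape `{ date: Date, slot: "9-10" }` and the names `dayKey`, `buildTakenSet` and `availability` are hypothetical. The sketch assumes `date` is a `Date`, whereas the example document above stores a string. Building the set once per render avoids one `findOne` call per slot, although Minimongo lookups are cheap either way.

```javascript
// Plain-JavaScript version of the isAvailable check. Hypothetical shape:
// appointments are { date: Date, slot: "9-10" } objects already on the client.
function dayKey(date) {
  // Only the calendar day matters, not the time of day.
  return `${date.getFullYear()}-${date.getMonth() + 1}-${date.getDate()}`;
}

// Build the lookup once per render instead of once per slot.
function buildTakenSet(appointments) {
  const taken = new Set();
  for (const appt of appointments) {
    taken.add(`${dayKey(appt.date)}|${appt.slot}`);
  }
  return taken;
}

function availability(taken, date, slot) {
  return taken.has(`${dayKey(date)}|${slot}`) ? 'taken' : 'available';
}

const taken = buildTakenSet([{ date: new Date(2015, 11, 18), slot: '9-10' }]);
console.log(availability(taken, new Date(2015, 11, 18, 15), '9-10')); // taken
console.log(availability(taken, new Date(2015, 11, 18), '8-9'));      // available
```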

vegard · December 17, 2015, 7:51pm

I’m actually using this exact approach.

The client renders all the “possible” slots without a collection. First, it gets an array of “days” from today and a week ahead in time. For each of those days the component renders an array of “hour” components and each of those components checks if there are any documents in the Timeslots (appointments) collection matching the hour.
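A sketch of building those "possible" slots on the client without any collection. `buildCalendar` is a hypothetical helper, and the opening hours (8 to 17) are an assumption. Each hour's `Date` is what an `Hour` component would receive as its `time` prop. With these defaults that is 7 × 9 = 63 components, and in the per-hour pattern each one also makes a subscription.

```javascript
// `days` calendar days starting at `from`, each with one Date per hour in
// [firstHour, lastHour). firstHour/lastHour are hypothetical opening hours.
function buildCalendar(from, days = 7, firstHour = 8, lastHour = 17) {
  const calendar = [];
  for (let d = 0; d < days; d++) {
    // The Date constructor rolls day overflow into the next month/year.
    const day = new Date(from.getFullYear(), from.getMonth(), from.getDate() + d);
    const hours = [];
    for (let h = firstHour; h < lastHour; h++) {
      hours.push(new Date(day.getFullYear(), day.getMonth(), day.getDate(), h));
    }
    calendar.push({ day, hours });
  }
  return calendar;
}
```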

The problem is that, since multiple users can insert documents into the Timeslots (appointments) collection, there can be hundreds of documents in the collection in a given week. I don't want to publish all of those documents; I just need to check whether there is a single match per hour. So what I'm really trying to do is optimize the publication, and I'm just not sure whether calling multiple subscriptions is viable.

thomasyajl36 · December 17, 2015, 8:34pm

Well, if calling multiple subscriptions works fine for now, then I'd just leave it at that. I don't know how much overhead this adds; you'll have to monitor it once it's in production.
Also, check how large the payload would be if you had to publish, say, 1000 documents. It might be OK after all.
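A back-of-the-envelope way to do that check, sketched under stated assumptions. Each published document reaches the client as one DDP `added` message. The collection name, the 17-character id (the default length of Meteor's `Random.id()`) and the single EJSON-encoded `time` field are assumptions. WebSocket framing and compression are ignored. `estimateAddedBytes` is a hypothetical helper.

```javascript
// Rough size of publishing n one-field documents over DDP.
// Each document is sent as one "added" message; ids are assumed to be
// 17 characters and `time` an EJSON-encoded Date.
function estimateAddedBytes(n, collection = 'timeslots') {
  const sample = {
    msg: 'added',
    collection,
    id: 'x'.repeat(17),
    fields: { time: { $date: Date.UTC(2015, 11, 17, 11) } } // EJSON Date
  };
  return n * JSON.stringify(sample).length; // ASCII only, so chars == bytes
}

console.log(estimateAddedBytes(1));    // about 100 bytes per document
console.log(estimateAddedBytes(1000)); // about 100 KB for 1000 documents
```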