Getting last document based on date

Hi!

I need a little bit of help. I have a collection like this:

{ "_id" : "1", "sessionID" : "1", "date" : ISODate("2017-02-01T09:34:45.746Z") }
{ "_id" : "2", "sessionID" : "2", "date" : ISODate("2017-02-11T09:34:45.746Z") }
{ "_id" : "3", "sessionID" : "3", "date" : ISODate("2017-01-01T09:34:45.746Z") }
{ "_id" : "4", "sessionID" : "2", "date" : ISODate("2017-03-01T09:34:45.746Z") }
{ "_id" : "5", "sessionID" : "1", "date" : ISODate("2017-03-01T09:34:45.746Z") }

How can I get the last document of each sessionID? In my example I want to get documents 3, 4, and 5.
I have this query working in Robomongo:

db.getCollection('sessionDates').aggregate(
    [
        { $sort: { sessionID: 1, date: 1 } },
        { $group:
            {
                _id: "$sessionID",
                lastSessionDate: { $last: "$date" }
            }
        }
     ]
)

But this query does not return all the document fields, and aggregate is not (directly) available in Meteor. Is there a better way to do this?

I would mark all documents that need to be fetched with a ‘publishedAt’ timestamp. All others will not have the field at all. It then simply becomes a matter of selecting:

Collection.find({ publishedAt: { $exists: true }});
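
A minimal sketch of the two writes each new document then needs (assuming a Meteor collection named SessionDates; the names are illustrative, and this runs on the server):

// 1. Remove the flag from the previous "last" document of this session.
SessionDates.update(
  { sessionID: "2", publishedAt: { $exists: true } },
  { $unset: { publishedAt: "" } }
);

// 2. Insert the new document carrying the flag.
SessionDates.insert({ sessionID: "2", date: new Date(), publishedAt: new Date() });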

Hi cloudspider and thanks for your tip!
Yes, I could add a field last: true when it’s the last document and remove this field from the previous one…
If there are no alternatives I will do that…

Using the aggregation pipeline is the right way to go.

To get all fields you have to explicitly tell the aggregation that you want each one using $project (if you’re using MongoDB 3.4 this has become easier with exclusions). For example:

db.getCollection('sessionDates').aggregate(
[
  { $sort: { sessionID: 1, date: 1 } },
  { $project: { someField: 1, someOtherField: 1, ... } },
  { $group:
    {
      _id: "$sessionID",
      lastSessionDate: { $last: "$date" },
      // each field also needs a $last accumulator to survive the $group
      someField: { $last: "$someField" },
      someOtherField: { $last: "$someOtherField" }
    }
  }
]
)
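
As an aside, on MongoDB 3.4 you could avoid listing fields entirely: group on $$ROOT and promote the last document back to the top level with $replaceRoot (a 3.4-only stage). A sketch:

db.getCollection('sessionDates').aggregate(
[
  { $sort: { sessionID: 1, date: 1 } },
  // keep the entire last document per session…
  { $group: { _id: "$sessionID", lastDoc: { $last: "$$ROOT" } } },
  // …then make it the output document again (MongoDB 3.4+)
  { $replaceRoot: { newRoot: "$lastDoc" } }
]
)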

See also this post:

Thanks robfallows!
If it’s the right way, I’ll look into it. Thanks :wink:

Unfortunately I’m using MongoDB 3.2, but maybe I can migrate without any problems :slight_smile:
If I can’t, I will have to make a second request with an $in on an array of IDs… not very sexy :confused:

Thanks!

That’s fine - I had assumed that’s what you’re using. It’s just that 3.4 lets you simplify the $project if you want all fields. On 3.2 you need to explicitly select them all. You don’t need $in :slight_smile:

What do you mean by ‘the right way’? There are advantages to both options. Mine consumes fewer resources during reads and is more complex during writes. Aggregations are more complex and take a bit more resources during reads.

My mongo database is actually acting as a read-optimized, denormalized DB, meaning my writes take a bit more time, but reads are very fast. That’s also the number one reason why I love Meteor: the user doesn’t notice that writes take long, because of latency compensation.

You can’t fake fast reads, but you can fake fast writes. Ultimately, though, the choice is up to you. KISS, basically.

My apologies for not responding sooner - I’ve been ill. However, I think it’s important to answer you because your proposed solution is fundamentally dangerous.

There is already a date field, so why add another which is actually just a flag - it could be a 0/1 for example?

But that isn’t the issue I have with your proposed solution. What happens when new documents are added to the collection?

You then need to “unflag” the previous document and “flag” the new one. You might get away with trying to perform 2+ database operations atomically with a single instance of the Meteor server (even that’s not guaranteed). In a multi-instance server environment (several node processes on the same or multiple hosts) it will break sooner or later when more than one process tries to update the same document(s): then you could end up with no flags, multiple flags or the wrong flags depending on who got where quickest.

And what if documents are removed?

While it’s possible to simulate transactions in MongoDB with two-phase commit, it’s messy.

Any solution which adds unnecessary moving parts is a cause for concern both for implementation and ongoing maintenance. You propose a document field hack, together with a hard-to-code server hack to manage an atomic (transactional) operation.

Those changes are simply not needed. MongoDB already provides a solution for aggregation operations. It requires no changes to the document structure, no code hacks, no need to be concerned about running in a scale-out environment. The only thing @muybien should do for ultimate speed is to add a compound index { sessionID: 1, date: 1 }.

In addition, it’s really fast. Much faster than trying to do it in code.

Finally, it uses the database engine to do the hard work, which frees up your node thread(s) to service clients and do more important things.
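
Since aggregate isn’t exposed on Meteor collections directly, one way to run the pipeline from server code is to drop down to the underlying Node driver via rawCollection(). A sketch, assuming a Meteor collection named SessionDates (the method name is illustrative):

import { Meteor } from 'meteor/meteor';

Meteor.methods({
  // Let the database engine compute the last date per sessionID.
  async lastSessionDates() {
    const raw = SessionDates.rawCollection();
    const cursor = raw.aggregate([
      { $sort: { sessionID: 1, date: 1 } },
      { $group: { _id: '$sessionID', lastSessionDate: { $last: '$date' } } }
    ]);
    return await cursor.toArray();
  }
});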

Couldn’t agree more. :slight_smile:

Very detailed response @robfallows!
Thanks!

I’m trying to use the aggregate function, but I cannot manage to add the output fields to my query in MongoDB 3.2 (without the $project command)… :frowning: How do you specify the output fields in MongoDB 3.2?

Looking into how to add an index with Meteor :wink: Thanks for the advice.

You do need to use $project and specify every field you want. There’s no alternative until Mongo 3.4:

https://docs.mongodb.com/v3.2/reference/operator/aggregation/project/
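
For instance, on 3.2 it could look like this (userId is just an illustrative field; note that each field you want also needs its own $last accumulator to make it through the $group):

db.getCollection('sessionDates').aggregate(
[
  { $sort: { sessionID: 1, date: 1 } },
  { $group: {
      _id: "$sessionID",
      lastSessionDate: { $last: "$date" },
      userId: { $last: "$userId" }
  } },
  // reshape the output: hide _id and expose it as sessionID again
  { $project: { _id: 0, sessionID: "$_id", lastSessionDate: 1, userId: 1 } }
]
)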

You can either do this in the Meteor Mongo shell:

db.getCollection('sessionDates').createIndex({ sessionID: 1, date: 1 }, { someOptions })

or in Meteor’s server startup:

Meteor.startup(() => {
  // SessionDates is your Meteor.Collection instance backed by 'sessionDates'
  SessionDates._ensureIndex({ sessionID: 1, date: 1 }, { someOptions });
});

where someOptions is documented here: https://docs.mongodb.com/v3.2/reference/method/db.collection.createIndex/#options
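
For example, with a concrete option filled in (background is one of the documented 3.2 options; it’s shown purely as an illustration):

db.getCollection('sessionDates').createIndex(
  { sessionID: 1, date: 1 },
  { background: true }  // build the index without blocking other operations
)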

“There is already a date field, so why add another which is actually just a flag - it could be a 0/1 for example”

True. I always use timestamps for flags. It tells me when the flag was added. Examples are deletedAt, publishedAt, updatedAt, etc.

I do think that you overestimate the complexity and resource usage of my proposed solution a bit. (Sorry, I love how the word ‘bit’ can have two meanings in computer-related discussions :joy:).

Don’t get me wrong here, I agree that the aggregation framework is pretty fast and yes, it works, but for this specific issue you have the choice. I’ve been using my solution in collections with millions of records without any significant overhead.

For my own systems the docs usually have 3 fields: a topicId, a version number, and a publishedAt timestamp. If someone posts a version with the same topicId at the same time as someone else, it will throw an error telling the user that there is a conflict. In advanced scenarios I allow them to use a merging tool. Each addition is indeed a field removal from the old record and a field addition to the new record.
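
A rough sketch of how that conflict check could work (the Versions collection and every name here are assumptions for illustration):

// Optimistic concurrency: retire the current head with a conditional update;
// if another writer got there first, the update matches nothing and we bail.
const latest = Versions.findOne({ topicId: 'abc' }, { sort: { version: -1 } });
const expected = latest ? latest.version : 0;

if (latest) {
  const retired = Versions.update(
    { topicId: 'abc', version: expected, publishedAt: { $exists: true } },
    { $unset: { publishedAt: '' } }
  );
  if (retired === 0) {
    // someone else published against this version first
    throw new Meteor.Error('conflict', 'A newer version was published first');
  }
}

Versions.insert({
  topicId: 'abc',
  version: expected + 1,
  publishedAt: new Date()
});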

One major advantage is that this opens up an easy way for a developer to extend the functionality so that the user can roll back to older versions without actually removing the latest version or modifying the timestamp.

But then again, it all depends on how the system currently works and what works for you.

So, I think I could use your solution @cloudspider, because the “last” field will only be changed by a single script executed by a CRON job (but by the way, I need a DB migration script to set the last boolean on existing data, so I need the aggregate method anyway?? ^^).
The @robfallows solution uses the database engine and I like that. But I would have to upgrade the MongoDB instance of an application that’s already in production… And I would have to declare somewhere that MongoDB 3.4 has to be used in order to develop… :thinking:

So for the moment I’m torn… I am exploring both solutions.

Thanks to both of you :wink:

@cloudspider : thank you for adding some more clarity to your solution. As long as there is only ever going to be a single write (insert/update/delete) process (an out-of-band process, for example), you will be safe from interleaved race errors. There is still an opportunity for inconsistent data if the MongoDB process or the write process dies between database operations, but that is going to be rare with good design.

@muybien : you should use whichever method you are most comfortable with. At the end of the day only you understand your application and it’s you who’s going to write and maintain your code.

Good discussion guys - very useful :slight_smile:

Nice attitude :slight_smile: