How to implement pagination with two different collections?

float07 · May 25, 2023, 2:05pm

Hey there!

So, we currently need to implement a list whose entries come from two different collections. This list needs to be paginated, so we don't load everything at the same time, and it needs to be ordered by creation date. This means we can have, for example, 3 entries from one collection followed by 2 entries from the other one, depending only on their creation date.

We currently have the list implemented following the pagination section of the Meteor Guide (Publications and Data Loading), but now that we need to expand it to show data from two different collections, we have no documentation to follow closely.
We have a few ideas on how to do this, but we wanted to know if there's already a ready-made solution for this, such as a package.

Thank you for your attention

minhna · May 25, 2023, 5:16pm

You can make two subscriptions, loading data from two publications, one for each collection.
I personally don't like using pub/sub to load a list of data. It works, but it's costly.

float07 · May 25, 2023, 5:30pm

Hey! Thanks for your answer!

Unfortunately, it’s not as simple as that. How would I know that I am loading things in the right order? I need a way to check the date of both collections at the same time.

Please let me know if my question or my explanation isn't detailed enough.

minhna · May 25, 2023, 5:49pm

You can pre-load more items than what you display.

mvogt22 · May 25, 2023, 8:34pm

For this type of data I would recommend using a Meteor Method that performs an aggregation with pagination using the $facet pipeline stage.
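A minimal sketch of what such a pipeline could look like, assuming both collections share a `createdAt` field and the server runs MongoDB 4.4 or later. It uses `$unionWith` to combine the two collections before `$facet`; that stage is this sketch's choice, and the linked example may combine them differently. `buildMergedPagePipeline` and the collection names are hypothetical. The function only builds the stage array, so it runs without a database.

```typescript
// Loose type for a MongoDB aggregation stage; enough for this sketch.
type Stage = Record<string, unknown>;

// Hypothetical helper: one page of the union of two collections, newest first.
// Run it against the first collection; `firstSource` only labels its documents.
function buildMergedPagePipeline(
  firstSource: string,      // label for documents of the collection you run it on
  secondCollection: string, // name of the other collection (assumed)
  page: number,             // zero-based page index
  pageSize: number,
): Stage[] {
  return [
    // Tag each document with where it came from, so the client can render it.
    { $addFields: { source: firstSource } },
    // $unionWith (MongoDB 4.4+) appends the second collection's documents.
    {
      $unionWith: {
        coll: secondCollection,
        pipeline: [{ $addFields: { source: secondCollection } }],
      },
    },
    // createdAt is an assumed field name; _id breaks ties so pages stay stable.
    { $sort: { createdAt: -1, _id: 1 } },
    // $facet returns the total count and the requested page in one round trip.
    {
      $facet: {
        metadata: [{ $count: "total" }],
        items: [{ $skip: page * pageSize }, { $limit: pageSize }],
      },
    },
  ];
}
```

Inside a Meteor Method on the server, something like `await Posts.rawCollection().aggregate(buildMergedPagePipeline("posts", "ads", page, 20)).toArray()` (with `Posts` standing in for your own collection) should return a single document of the form `{ metadata: [{ total }], items: [...] }`, where `metadata` is empty if there are no documents at all. Keep in mind that `$skip` still walks over every earlier entry, so very deep pages get slower.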

See here.

paulishca · May 26, 2023, 10:04am

I think you should not do it with pub/sub; use methods instead. Not only is it cheaper, but you also won't have to write code to prevent your list from reacting to new records.

If you don’t have time, just skip to the last paragraph.

Example:
If you use Twitter, you can see that as you scroll, you get notified at the top that there are new posts, but you only get them if you want to … and then you start from the top again. What you need is basically what Twitter does:
1. Paginate (the number of Twitter cards you get on your screen).
2. Insert ordered posts from another Collection once they meet criteria based on the first collection (think of it as Twitter ads insertion).
3. Notify if the list has changed (new posts can only be … newer).

This is a typical case for social-network feed algorithms, where ads are not posts but flow through the same data pipeline.
The way I'd do it is to use find({}, { limit: 1 }). find() on its own only sets up the query; it doesn't retrieve the data, and you just count that single result.
Get your "main batch" from the first collection, take the date range it covers, and use that range to query the second collection. If you find at least one match (with limit 1), do a fetch() for the entire date range from the second collection.
Combine both into one array sorted by date and send it to the client, or send both to the client and let it do the sorting with a sort function.
The next queries you send to the server should keep track of the limit/skip and the date range.
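Here is an in-memory sketch of that flow, under some assumptions: plain arrays stand in for the two collections, `createdAt` is a number of milliseconds, the main collection is already sorted newest first, and `Entry`, `PageCursor` and `fetchPage` are hypothetical names. It skips the `limit: 1` existence check and tracks the date range by carrying an exclusive upper bound (`before`) from one page to the next, so second-collection entries that fall between two pages are not lost. On the server, the `filter` call would become a query such as `{ createdAt: { $gte: lower, $lt: before } }`.

```typescript
// Hypothetical document shape shared by both collections.
interface Entry {
  _id: string;
  createdAt: number; // epoch milliseconds
}

// What the client sends back for the next page: main-collection offset plus
// the exclusive upper bound of the date range still to be covered.
interface PageCursor {
  skip: number;
  before: number;
}

function fetchPage(
  main: Entry[],      // main collection, sorted newest first
  secondary: Entry[], // second collection, any order
  cursor: PageCursor,
  limit: number,
): { items: Entry[]; next: PageCursor | null } {
  // The "main batch": one page from the main collection.
  const batch = main.slice(cursor.skip, cursor.skip + limit);
  const exhausted = batch.length < limit;
  // Oldest date this page covers; once the main collection runs out,
  // also take everything older from the second collection.
  const lower = exhausted ? -Infinity : batch[batch.length - 1].createdAt;
  const extra = secondary.filter(
    (e) => e.createdAt >= lower && e.createdAt < cursor.before,
  );
  // Merge both into one array ordered by date, newest first.
  const items = [...batch, ...extra].sort((a, b) => b.createdAt - a.createdAt);
  const next = exhausted ? null : { skip: cursor.skip + limit, before: lower };
  return { items, next };
}
```

Start with `{ skip: 0, before: Infinity }` and keep calling with `next` until it is `null`. Page sizes vary with how many second-collection entries fall inside each range.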

I don't think you can return the same number of records on every data pull from the client unless you run more queries on the server, and then you might hit a race situation between the 2 Collections. The number of records is not important when you scroll down a wall and most of the cards are hidden, but it might be important when you look at tables with … "Previous 20 - Next 20".

Race situation: you need exactly 20 records. You get 20 from the first Collection and check whether there are any in the second Collection. You merge the results into an array sorted by date and slice away the extras … Oh wait, I didn't have 20 in the first Collection but I have 10 in the second, so maybe I increase the limit now and pull again until I reach my desired 20 records to send to the client.

There is a muuuuch easier way … have another collection where you mirror every write made to the other 2 collections: a simple indexed collection that only contains the source (which collection), the id and the date. Then you can easily use this third collection to aggregate data from the other 2, or just run simple queries with { _id: { $in: [ ids from the range of your third collection ] } }.
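A minimal in-memory sketch of the index-collection idea, assuming just two source collections named `posts` and `ads` and a numeric `createdAt`; `IndexEntry`, `pageSelectors` and `reorder` are hypothetical names. It pages through the index, builds one `{ _id: { $in: [...] } }` selector per source, and then puts the fetched documents back in index order, since an `$in` query does not return documents in the order of the ids.

```typescript
// Hypothetical shape of the third, "index" collection.
type Source = "posts" | "ads"; // assumed collection names
interface IndexEntry {
  source: Source;
  id: string;
  createdAt: number; // epoch milliseconds
}
interface IdSelector {
  _id: { $in: string[] };
}

// Take one page of the index (newest first) and group its ids by source,
// so each source collection needs only one { _id: { $in: ids } } query.
function pageSelectors(index: IndexEntry[], skip: number, limit: number) {
  const page = index
    .slice()
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(skip, skip + limit);
  const selectors: Partial<Record<Source, IdSelector>> = {};
  for (const entry of page) {
    const sel: IdSelector = selectors[entry.source] ?? { _id: { $in: [] } };
    sel._id.$in.push(entry.id);
    selectors[entry.source] = sel;
  }
  return { page, selectors };
}

// Put the documents fetched from each source back into index order.
function reorder<T extends { _id: string }>(
  page: IndexEntry[],
  fetched: Partial<Record<Source, T[]>>,
): T[] {
  const byKey: Record<string, T> = {};
  (Object.keys(fetched) as Source[]).forEach((source) => {
    for (const doc of fetched[source] ?? []) byKey[`${source}:${doc._id}`] = doc;
  });
  const ordered: T[] = [];
  for (const e of page) {
    const doc = byKey[`${e.source}:${e.id}`];
    if (doc) ordered.push(doc); // skip ids deleted since the index was written
  }
  return ordered;
}
```

On the server, each selector would be passed to the matching collection's `find()` and the results handed to `reorder`. Keeping the index in sync with inserts and removals in both source collections is not shown.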

superfail · May 26, 2023, 1:30pm

@float07 If you really want to go the pub/sub route with a [begin, end) range, subscribe to both collections with the relevant sort, skip = 0 and limit = end, then merge the results, sort them, and keep only positions [begin, end). This second-stage filtering can be done either client-side or server-side.
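A minimal sketch of the merge-and-filter step, assuming each subscription has already delivered its first `end` documents sorted newest first (for instance via a hypothetical `Posts.find({}, { sort: { createdAt: -1 }, limit: end }).fetch()` on the client). `Entry`, `mergeNewestFirst` and `mergedRange` are illustrative names, and `createdAt` is a number here for simplicity.

```typescript
// Hypothetical document shape; createdAt in epoch milliseconds.
interface Entry {
  _id: string;
  createdAt: number;
}

// Linear merge of two lists that are each already sorted newest first,
// the way each subscription's cursor would deliver them.
function mergeNewestFirst(a: Entry[], b: Entry[]): Entry[] {
  const out: Entry[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    out.push(a[i].createdAt >= b[j].createdAt ? a[i++] : b[j++]);
  }
  return out.concat(a.slice(i), b.slice(j));
}

// Positions [begin, end) of the combined ordering. Each input only needs its
// own first `end` items (skip = 0, limit = end): an item at position `end` or
// later within its own collection already has `end` items ahead of it in the merge.
function mergedRange(a: Entry[], b: Entry[], begin: number, end: number): Entry[] {
  return mergeNewestFirst(a.slice(0, end), b.slice(0, end)).slice(begin, end);
}
```

This is also why skip cannot simply be `begin` for each subscription: you cannot know in advance how the first `begin` merged items split between the two collections.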

One way to avoid merging at query time is to merge the collections themselves. You still need to use skip = 0 and post-filter from begin on the client (after Minimongo), but at least you no longer need to merge.

What @paulishca described for staleness notification can be implemented via custom added/removed/changed observers: instead of getting data updates, invalidate the subscription when the data goes stale and let the user trigger a local refresh (via a UI button, for instance).
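A rough sketch of that staleness bookkeeping, kept free of Meteor so it runs on its own. Its method signatures mirror the `added(id, fields)`, `changed(id, fields)` and `removed(id)` callbacks of Meteor's `cursor.observeChanges`, but the `StalenessTracker` class and its two flags are hypothetical, and `createdAt` is a number rather than a `Date`. The UI would show a "new posts" or "refresh" button when a flag flips, instead of applying the update.

```typescript
// Hypothetical tracker whose methods can be passed as observeChanges callbacks.
class StalenessTracker {
  hasNewer = false; // something newer than the top of the list arrived
  stale = false;    // a document currently on screen changed or was removed
  private readonly shown: Set<string>;

  constructor(shownIds: string[], private readonly newestShown: number) {
    this.shown = new Set(shownIds);
  }

  // observeChanges also fires `added` for the initial result set; those
  // documents are not newer than newestShown, so they do not set the flag.
  added(_id: string, fields: { createdAt?: number }): void {
    if ((fields.createdAt ?? -Infinity) > this.newestShown) this.hasNewer = true;
  }

  changed(id: string): void {
    if (this.shown.has(id)) this.stale = true;
  }

  removed(id: string): void {
    if (this.shown.has(id)) this.stale = true;
  }
}
```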

Note that realistically, you must restrict how much data will flow through a single subscription, so you should set a page limit with this technique, and resort to methods or more complex publications that do not dump an entire index prefix if you want your users to reach “deep” pages.

If you merge the collections into one, and do not syndicate subscriptions (for instance by using a method, as @mvogt22 suggested), you can use skip = begin on first filter and have a dedicated subscription for staleness notifications (but avoiding race conditions requires a bit more work).

float07 · May 29, 2023, 12:35pm

Thanks everyone for the very detailed explanations!

Thinking about it, we now see that we don’t really need the list to be reactive. With that said, @mvogt22 solution seems to be the best one for us, although we would still need to do a little research on MongoDB’s aggregate, which is a relatively complex subject. But it really looks like the solution we will implement.

That said, @paulishca's and @superfail's explanations were really detailed, and they look like they would work if we needed a more complex, reactive list!

Thank you all for your time and very well made explanations! Have a great one

vooteles · May 29, 2023, 1:40pm

Just as an aside, an interesting package to take a look at is robfallows/tunguska-reactive-aggregate on GitHub, maintained by long-time forum user Rob Fallows. It "reworks jcbernack:reactive-aggregate for ES6/ES7 and Promises".

Aggregation does not always have to exclude reactivity.

cloudiy · May 30, 2023, 7:10am

But how would the data get updated live? When using a subscription with useTracker, I get live data.

How would I do the same with methods?