My struggles making a fair amount of data available offline with Meteor

I am working on a logistical Meteor app that has provided some challenges. I went through a few time-consuming iterations, so I wanted to share them to perhaps save others some time, or maybe receive some feedback on my line of thought.

The app tracks up to 10,000 items, and each item has a few properties, e.g. the location where it was last seen. A user must be able to:

  1. Change properties of any given object (identified by a QR code)
  2. Query where each item is
  3. Query where any given item has been in the past

These three requirements are pretty straightforward, but the fourth caused quite a few headaches:

  • 1 and 2 should be available offline, and synced when online

To support this, all user input is stored in one collection following the Event Sourcing data model: each add/remove/modify of an item is stored as a separate event.
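To give an idea of what that looks like (collection and field names here are just a sketch, not the exact schema):

# Every user action becomes an immutable event; all state is derived from these.
Events = new Mongo.Collection 'events'

recordEvent = (itemId, type, properties) ->
  Events.insert
    itemId: itemId            # the QR-coded item this event refers to
    type: type                # 'add' | 'modify' | 'remove'
    properties: properties    # e.g. { location: 'warehouse 3' }
    clientTime: new Date()    # client-side timestamp, used later for ordering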

Take one:

I first considered a two-collection approach: one events collection for the history (user input) and one (derived) collection for the current state of all items. Each user input causes an insert in the events collection and an update in the current state collection. Publish the current state to the client, save it with GroundDB, and presto… Or not. It turns out there are several problems with this setup:

  • Subscribing to 10K documents is very slow
  • Minimongo doesn’t really support 10K documents (it fits, but insertion takes up to a minute)
  • GroundDB uses localStorage, which seems to be slow and has a default limit of only 5MB

Not sure what each factor contributed, but the result was just unusably slow on an i5/16GB desktop with an up-to-date Chrome.
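For completeness, take one in rough outline (names are illustrative; the real setup also grounded the client collection with GroundDB):

# Take one, sketched: every user input writes an event and updates a derived
# current-state document, and the full current state is published to the client.
handleInput = (itemId, properties) ->
  Events.insert { itemId: itemId, properties: properties, clientTime: new Date() }
  CurrentState.upsert { _id: itemId }, { $set: properties }

# Publishing all ~10k state documents is the part that did not scale:
Meteor.publish 'currentState', -> CurrentState.find()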

Take two:

  • Define a server method that dumps the entire current state collection as a JSON document
  • Publish a query on the events collection for the latest insert time, and subscribe to that
  • Re-run the method every time the latest insert time changes (see the sketch after this list)
  • Store the data offline using Localforage
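In rough outline (method, publication and field names here are illustrative):

# Take two, sketched. Server: a method that dumps the derived state, plus a
# tiny publication that only exposes the most recent event.
Meteor.methods
  dumpCurrentState: -> CurrentState.find().fetch()

Meteor.publish 'latestEvent', ->
  Events.find {}, { sort: { insertedAt: -1 }, limit: 1, fields: { insertedAt: 1 } }

# Client: whenever the latest event changes, re-fetch the full dump and cache it.
Meteor.subscribe 'latestEvent'
Tracker.autorun ->
  latest = Events.findOne {}, { sort: { insertedAt: -1 } }
  return unless latest
  Meteor.call 'dumpCurrentState', (err, state) ->
    localforage.setItem 'currentState', state unless err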

This worked, but there were still a few problems:

  • Because users can add information while offline (which is synced when they come back online), events sometimes do not arrive at the server in order. This could be solved by sorting events by server time only, but unfortunately that does not fit the requirements. So when an event arrives after a later one has already been processed, the new state cannot be derived incrementally from the current state; it has to be derived from the entire history. Luckily this is easy to do with a MongoDB aggregation pipeline, and in fact it runs fast enough (< 500 ms for the entire events collection) that I completely ditched the current state collection; it is derived at query time (see the sketch after this list).
  • Every change in the database requires resending about 10 MB of JSON (500 kB compressed) to each client, which is not really efficient. The app is to be used on sites with limited internet connectivity (both limited bandwidth and limited availability).
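The query-time derivation looks roughly like this (.aggregate comes from an aggregation package, as in the method further down this thread; field names are illustrative):

# Derive the current state of every item from the full event history,
# ordered by the client-side timestamp rather than server arrival time.
currentState = Events.aggregate [
  { $sort: { clientTime: 1 } }
  { $group: {
    _id: '$itemId'
    properties: { $last: '$properties' }   # last write per item wins
    lastSeen: { $last: '$clientTime' }
  } }
]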

Take three:

  • Finally, I reworked the MongoDB aggregation query to group the results into a new grouped state collection of at most 100 documents, where each document contains the state of 100 items. After every insertion the query is rerun to derive the new state, and using Meteor the result of this query is upserted into the grouped current state collection.
  • On the client side, one subscribes to this grouped current state collection. It has at most 100 documents, and this works fine, even on low-bandwidth connections. Apparently the practical maximum number of subscribed documents lies somewhere between 100 and 10,000.
  • The current state of each of the 10k items is derived from this collection and stored both in Localforage and in a ReactiveVar. This function is reactive, so it reruns when the collection changes (a client-side sketch follows this list).
  • Use lodash (instead of Minimongo) on this list of 10K elements for the queries the UI needs.
  • Every insertion in MongoDB results in an update of one document, so only 1% of the data is resent.
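On the client, the whole thing boils down to something like this (a sketch using the collection name that comes up later in this thread; the exact field layout differs):

# Take three, client side: subscribe to the ~100 grouped documents, flatten
# them back into one plain array, and keep that array in a ReactiveVar plus
# Localforage for offline use.
Meteor.subscribe 'groupedItems'
items = new ReactiveVar []

Tracker.autorun ->
  groups = GroupedItems.find().fetch()
  flat = _.flatten(groups.map((g) -> g.items))   # each group holds ~100 items
  items.set flat
  localforage.setItem 'items', flat              # persist for offline startup

# UI queries run with lodash against the plain array instead of Minimongo:
itemsAtLocation = (location) ->
  _.filter items.get(), (item) -> item.location is location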

Currently I am happy with the latest iteration, but I was a bit disappointed I couldn’t stick closer to the Meteor way for pulling this off. 10,000 documents is more than a few, but definitely not huge.


That’s very interesting, thanks for sharing your findings.

I’m sure you’ve already seen it, but there’s been more discussion of offline working here over the last few days.

Yeah, I noticed. It would be great if we got Couchbase-style full database synchronization (with conflict resolution) to the client someday. There are much prettier solutions to this problem out there than what Meteor currently provides, and rolling your own for such a core technique just doesn’t feel right. Hopefully MDG can sort this out, but in the meantime we’ll have to keep on struggling.

@tdamsma This is very interesting. I am developing the same app as you… but for a game :) (just think of my monsters as your items).

I am taking the approach of your first iteration and wondering about the Minimongo and localStorage size limits (I will use Cordova) for offline use.

I see you ended up with Localforage and, as you said, could not stick to the Meteor way. And here is my question: if you had to redo this, would you use Meteor (with 1.3 in mind) or build your own stack?

Seems like with 1.3, since you can use NPM, you could use PouchDB. I think theoretically you could use it on the server too, or, probably better, use a Cloudant DB.

Of course at that point I guess you’re really not using much of Meteor, but I suppose you could use Meteor for some of your data, and Pouch for the data you need synced.
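Something along these lines (a rough sketch; the remote URL is just a placeholder for a CouchDB/Cloudant endpoint):

# PouchDB on the client syncing with a CouchDB-compatible server.
PouchDB = require 'pouchdb'

local  = new PouchDB 'items'
remote = new PouchDB 'https://example.cloudant.com/items'

# Continuous two-way replication with automatic retry once back online.
sync = local.sync remote, { live: true, retry: true }
sync.on 'change', (info) -> console.log 'synced', info
sync.on 'error', (err) -> console.error 'sync error', err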

This is where I’m stuck. Since there are no improvements in 1.3 for persistence and offline use, I wonder if Meteor is the right choice for me.

Even though it is not pretty, and I believe PouchDB is much better suited for persistent offline data with synchronization, I think I would still use Meteor today if I had to start over.

The strength of Meteor for me is that it is a relatively established way of doing stuff in the rapidly changing JavaScript landscape. Probably one can build a mobile website, app and backend by picking all the best NPM modules, but I really wouldn’t know where to start. Meteor and Cordova is maybe not the sexiest way to do this, but it is one that works without too much hassle. And when I did run into a weird Cordova bug, Martijn Walraven fixed it for me within a few hours.

I think you are right. I have been comparing custom stacks vs Meteor for two days now, and only CouchDB or GroundDB (the next version will use SQLite) could fit my requirements. Fortunately Meteor handles CouchDB with livequery, so I think I will manage to keep on with Meteor and avoid boilerplate indigestion…

It seems that the CouchDB livequery package is only a backend replacement, so you have CouchDB instead of Mongo. That alone would not solve offline persistence, as Minimongo is still used on the client side.

On the client (and on the server if you specify a connection), a Minimongo instance is created. Queries (find) on these collections are served directly out of this cache, without talking to the server.

What I would really like is something more like PouchDB: full two-way synchronization of (a subset of) the database to the client.

GroundDB relies on Minimongo, and Minimongo doesn’t handle a largish (1000+ documents) collection very well. So if you have a collection like that (like I have), you have to think of something else.

Do you have an idea of why Minimongo does not handle more than 1000 docs well? Is it because it is an in-memory solution?
In that case, if GroundDB relies on SQLite (which seems to be the case for the next release), should there be a difference?

Is there somewhere I can read how to implement the items in your third option?
Thank you.

No, I didn’t look into why. I just noticed a subscription to 10k documents takes forever and gave up.

No, not really. Let me try to explain a bit better:
I have two Meteor collections: Items and GroupedItems.

What I did was group Items by the last two digits of the id (I have a numeric id) with a Mongo aggregate query (jcbernack:reactive-aggregate). Within a server method, I upsert the result of this query into GroupedItems. By using Meteor for this, I get reactivity on the GroupedItems publication.

Meteor.methods
  updateGroupedItems: () ->
    console.log 'updateGroupedItems'

    # Group items by the last two digits of their id, so ~10k items collapse
    # into at most 100 grouped documents.
    data = Items.aggregate [
      { $sort: { _id: 1 } }
      { $match: { deleted: false } }
      { $project: {
        groupId: { $substr: ['$_id', 8, -1] }   # last two digits of the id
        details: 1
        contents: 1
      } }
      { $group: {
        _id: '$groupId'
        ids: { $addToSet: '$_id' }
        details: { $addToSet: '$details' }
        contents: { $addToSet: '$contents' }
      } }
      { $sort: { _id: 1 } }
    ]

    # Drop grouped documents that no longer occur in the result, then upsert the fresh groups.
    console.log 'Removed extra items: ' + GroupedItems.remove(_id: { $not: { $in: data.map((d) -> d._id) } })
    data.map (d) -> GroupedItems.upsert({ _id: d._id }, d)

The client is subscribed to GroupedItems, and on every change it repopulates a plain list that contains the ungrouped content of GroupedItems, so it is almost identical to Items. Except that Items is a Meteor.Collection on the server, and the list is kept in a ReactiveVar on the client.

After each change in Items, I rerun the method that updates GroupedItems. This updates the publication, which updates the data on the client, which triggers an update of the ReactiveVar and presto: reactivity!
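For example, the method that records a change could look roughly like this (illustrative, not my exact code):

Meteor.methods
  updateItem: (itemId, properties) ->
    Items.update { _id: itemId }, { $set: properties }
    # Regroup right away so the GroupedItems publication pushes the change out.
    Meteor.call 'updateGroupedItems'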

It sounds hacky, and it is, but at least it works for me. If anyone knows a better way, please let me know.


It sounds very hacky (-:
Looks like I understood, but maybe I understood nothing (-:
OK, for example, I have 1000 items. I can subscribe to all these items; they are small, but it looks like that is not very efficient.
They could be grouped more efficiently into 10 or 100 documents, and then ungrouped somehow on the client, because I need every document separately. But I do not know if this is worth doing, or how.

Yes, you understood it correctly: group on the server, subscribe and ungroup on the client.

If you can subscribe to all items and it works, I would say do it the proper way. Only if performance becomes terrible, or if it doesn’t work at all, should you start thinking about ugly hacks. This one made my app work; without it, it didn’t. So in my case it was totally worth the effort.

Realm React Native for offline data might work.
