Subscription bombs out after >50k documents

…until you check the database load from an active cursor with a geo selector: geo queries can't use oplog tailing, so you get forced polling :smiley:


Maybe someone has experience with Meteor + protobuf.js…


Hi,
I see that it is possible to do server-side geospatial clustering using geohashing (https://ravis.me/2015/05/29/server-side-geo-clustering-with-mongodb/). The main Mongo distribution has an 'aggregate' function with which you can cluster data, so this looks like quite an elegant approach.

I see that the Meteor documentation does not have an aggregate function for a collection :frowning: Is this a dead end, or is there an indirect way to run an aggregation query on the Mongo dataset (e.g. spawn a bash script)?


But we have aggregate…


There are a few packages, for both aggregate and also reduce.
And yes, the meteorhacks (Kadira) packages are a stable standard.
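
For reference, here is a minimal sketch of the geohash clustering idea, assuming the meteorhacks:aggregate package (which adds .aggregate() to server-side collections); the Points collection and its geohash/lat/lng fields are made up for illustration:

```js
// Server side. Assumes each doc stores a precomputed geohash string.
var clusters = Points.aggregate([
  {
    $group: {
      // Truncating the geohash groups nearby points into one cell;
      // a shorter prefix gives coarser clusters (tune per zoom level).
      _id: { $substr: ['$geohash', 0, 5] },
      count: { $sum: 1 },
      lat: { $avg: '$lat' },
      lng: { $avg: '$lng' },
    },
  },
]);
```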

Maybe something like GroundDB would be a good solution?

I have found that the websocket can't seem to handle that much data being pushed down. Once you hit a certain size of data being pushed across the websocket, the connection resets and resubscribes to the publication over and over again, which essentially looks like 0 documents on the client most of the time. Look in the network tab of the Chrome dev tools and watch the websockets to see if you are making a ton of reconnections.


Use an aggregation query to make it faster…


Hi,
Is there something I can do on my side to make the websocket connection more robust, or is this more of a Meteor core issue?

I have managed to do server-side geospatial clustering. It still has some rough edges, but the principle works quite slickly.

No, from my testing I think it is an issue either with the underlying websocket lib or with the browsers themselves, as it happens across multiple devices and browsers.

FYI,
Collection.insert takes 94s for 50k points, while the clustering function takes only 600ms to cluster 500k points.

Building a 500k-document dataset simply takes ages. I wouldn't mind a tip on how to build a big test dataset faster.

http://info.meteor.com/blog/inserting-50000-documents-into-a-collection-slow-fast-and-fastest


https://atmospherejs.com/mikowals/batch-insert could help you build your 500k dataset more efficiently.
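
A rough sketch of how that could look, assuming the package's batchInsert method, which takes an array of docs (the Points collection and the random coordinates are just for illustration):

```js
// Server side. mikowals:batch-insert adds batchInsert, which writes an
// array of docs in one round trip instead of 500k separate inserts.
var BATCH = 1000;
for (var i = 0; i < 500; i++) {        // 500 batches x 1000 docs = 500k
  var docs = [];
  for (var j = 0; j < BATCH; j++) {
    docs.push({
      lat: -90 + Math.random() * 180,
      lng: -180 + Math.random() * 360,
    });
  }
  Points.batchInsert(docs);
}
```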

Will you share the results with us?

It is not so simple to send more than 10k docs to the client and handle them there.

  1. Remodel your collection to use small primitive types
  2. Choose a high-compression binary serialization protocol that expects a model schema, like Google's Protocol Buffers or Apache Avro (maybe this alternative from NYTimes too: http://nytimes.github.io/tamper)
  3. After data serialization, cache it in memory (Redis recommended) for future requests (encode it in Base64 for an even smaller gzip size)
  4. Make a Meteor cluster to handle server data processing and serialization in parallel
  5. Split the client request (Meteor method or HTTP endpoint, because client DDP has a bug handling many messages at the same time) into parts of ~1 MB of data (~3k docs, depending on the doc size), and send each request to a different node in the cluster (see the sketch after this list)
  6. Decode/deserialize the data in the client
  7. To filter data in the client, use an in-memory JS database that is much faster than Minimongo, can also persist data in IndexedDB/localStorage, and has indexes/dynamic views that are good for data aggregation: LokiJS
  8. If you want to sync the data and keep it up to date, use low-level publications that send updated data to a client-only collection used purely as a message broker to update LokiJS. Invalidate or update the server caches too. This way you handle the sync yourself, and on a second visit the client already has the data.

* If you want to use Minimongo, insert the data in batches, bypassing its sync (it has a bug with many inserts), via col._collection.insert
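
Here is a minimal sketch of step 5 plus the Minimongo bypass; the points.chunk method name, the Points collection, and the chunk size are all hypothetical:

```js
// Server: serve the dataset in ~3k-doc chunks (roughly 1 MB each,
// per the estimate in step 5).
Meteor.methods({
  'points.chunk': function (index) {
    check(index, Number);
    var CHUNK = 3000;
    return Points.find({}, { skip: index * CHUNK, limit: CHUNK }).fetch();
  },
});

// Client: fetch chunks sequentially, inserting via _collection.insert
// to bypass Minimongo's sync layer (per the * note above).
function loadChunk(index) {
  Meteor.call('points.chunk', index, function (err, docs) {
    if (err || !docs || docs.length === 0) return; // finished or failed
    docs.forEach(function (doc) {
      Points._collection.insert(doc);
    });
    loadChunk(index + 1);
  });
}
loadChunk(0);
```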


Thanks Fabio,
Managing large data looks daunting, but it has to be done. The future is big data, with IoT etc. coming.

FYI, I created 500k documents in 100-140s using the batch-insert package. Again, on the server I can see 500k documents with .find().count(). However, my client simply says "waiting for localhost"/"Problem loading page" and then hangs. Memory-wise I am using 75% (2.9 GiB) on a Linux virtual machine.

Hi Fabio,
Can LokiJS co-exist with Meteor/Mongo?
If you have already built a Meteor app, can you plug LokiJS into it, or do you need to completely redo your app?

I don't think that I will ever expect a cloud application to send all the data saved in the cloud to my client at once :smiley:

BTW, what should the client show you? Is a helper returning .find().count(), or something else?

Hi,
Bear with me, I am on the learning curve of what can be done and how to do it most efficiently. I'm a civil engineer, and it is interesting that even desktop applications (Autodesk Civil3D) can be horrendously slow when it comes to spatial data processing (a single-core program in today's age). I am just testing the boundaries of what can be done. On the other hand, I know infrastructure owners want to see all of their infrastructure in one go. It is up to the developer to figure out how to compromise and deliver.

I don't think you would ever need to show 500k data points.
You have limited space on the map, so you always limit the returned data to just the points shown on the map; depending on the zoom level, you can use different radius values from the map centre.
I still think that more than 1000 map markers on a fullscreen map would be a horrible mess.
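
As a sketch of that approach, assuming a 2dsphere index on a loc field (the publication name, field names, and the 1000-marker cap are illustrative):

```js
// Server: only publish points within the visible radius, capped.
// Note: geo selectors force this cursor to be observed by polling
// rather than oplog tailing, as mentioned earlier in the thread.
Meteor.publish('points.inView', function (center, radiusKm) {
  check(center, [Number]);   // [lng, lat] of the map centre
  check(radiusKm, Number);
  return Points.find({
    loc: {
      $geoWithin: {
        // $centerSphere takes the radius in radians (km / Earth radius).
        $centerSphere: [center, radiusKm / 6371],
      },
    },
  }, { limit: 1000 });
});
```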