Subscription bombs out after >50k documents

…until you check the database load from an active cursor with a geo selector: geo queries can't use oplog tailing, so you get forced polling :smiley:


Maybe someone has experience with Meteor + protobuf.js…


Hi,
I see that it is possible to do server-side geospatial clustering using geohashing (https://ravis.me/2015/05/29/server-side-geo-clustering-with-mongodb/). The main Mongo distribution has an 'aggregate' function with which you can cluster data, so this looks like quite an elegant approach.

I see that the Meteor documentation does not have an aggregate function for a collection :frowning: Is this a dead end, or is there an indirect way to run an aggregation query on the Mongo dataset (e.g. spawn a bash script)?


But we have aggregate…


There are a few packages, for both aggregate and also reduce.
And yes, the meteorhacks (Kadira) packages are a stable standard.
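
For reference, here is a minimal sketch of the geohash clustering idea, assuming the meteorhacks:aggregate package (which adds .aggregate() to server-side collections); the Points collection and its geohash/lat/lng fields are made up for illustration:

```js
// Server side. Assumes each doc stores a precomputed geohash string.
var clusters = Points.aggregate([
  {
    $group: {
      // Truncating the geohash groups nearby points into one cell;
      // a shorter prefix gives coarser clusters (tune per zoom level).
      _id: { $substr: ['$geohash', 0, 5] },
      count: { $sum: 1 },
      lat: { $avg: '$lat' },
      lng: { $avg: '$lng' },
    },
  },
]);
```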

Maybe something like GroundDB would be a good solution?

I have found that the websocket can't seem to handle that much data being pushed down. Once you hit a certain size of data being pushed across the websocket, the connection resets and resubscribes to the publication over and over again, which essentially looks like 0 documents on the client most of the time. Look in the network tab of the Chrome dev tools and watch the websockets to see if you are making a ton of reconnections.


Use an aggregation query to make it faster…


Hi,
Is there something I can do on my side to make the websocket connection more robust, or is this more of a Meteor core issue?

I have managed to do server-side geospatial clustering. It still has some rough edges, but the principle works quite slickly.

No, from my testing I think it is an issue either with the underlying websocket lib or with the browsers themselves, as it happens across multiple devices and browsers.

FYI,
Collection.insert takes 94s for 50k points, while the clustering function takes only 600ms to cluster 500k points.

Building a 500k-document dataset simply takes ages. I wouldn't mind a tip on how to build a big test dataset faster.

http://info.meteor.com/blog/inserting-50000-documents-into-a-collection-slow-fast-and-fastest


https://atmospherejs.com/mikowals/batch-insert could help you build your 500k dataset more efficiently.
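
A rough sketch of how that could look, assuming the package's batchInsert method, which takes an array of docs (the Points collection and the random coordinates are just for illustration):

```js
// Server side. mikowals:batch-insert adds batchInsert, which writes an
// array of docs in one round trip instead of 500k separate inserts.
var BATCH = 1000;
for (var i = 0; i < 500; i++) {        // 500 batches x 1000 docs = 500k
  var docs = [];
  for (var j = 0; j < BATCH; j++) {
    docs.push({
      lat: -90 + Math.random() * 180,
      lng: -180 + Math.random() * 360,
    });
  }
  Points.batchInsert(docs);
}
```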

Will you share the results with us?

It is not so simple to send more than 10k docs to the client and handle them there.

  1. Remodel your collection to use small primitive types
  2. Choose a high-compression binary serialization protocol that expects a model schema, like Google's Protocol Buffers or Apache Avro (maybe this alternative from NYTimes too: http://nytimes.github.io/tamper)
  3. After data serialization, cache it in memory (Redis recommended) for future requests (encode it in Base64 for an even smaller gzip size)
  4. Make a Meteor cluster to handle server data processing and serialization in parallel
  5. Split the client request (Meteor method or HTTP endpoint, because client DDP has a bug handling many messages at the same time) into parts of ~1 MB of data (~3k docs, depending on the doc size), and send each request to a different node in the cluster (see the sketch after this list)
  6. Decode/deserialize the data in the client
  7. To filter data in the client, use an in-memory JS database that is much faster than Minimongo, can also persist data in IndexedDB/localStorage, and has indexes/dynamic views that are good for data aggregation: LokiJS
  8. If you want to sync the data and keep it up to date, use low-level publications that send updated data to a client-only collection used purely as a message broker to update LokiJS. Invalidate or update the server caches too. This way you handle the sync yourself, and on a second visit the client already has the data.

* If you want to use Minimongo, insert the data in batches, bypassing its sync (it has a bug with many inserts), via col._collection.insert
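
Here is a minimal sketch of step 5 plus the Minimongo bypass; the points.chunk method name, the Points collection, and the chunk size are all hypothetical:

```js
// Server: serve the dataset in ~3k-doc chunks (roughly 1 MB each,
// per the estimate in step 5).
Meteor.methods({
  'points.chunk': function (index) {
    check(index, Number);
    var CHUNK = 3000;
    return Points.find({}, { skip: index * CHUNK, limit: CHUNK }).fetch();
  },
});

// Client: fetch chunks sequentially, inserting via _collection.insert
// to bypass Minimongo's sync layer (per the * note above).
function loadChunk(index) {
  Meteor.call('points.chunk', index, function (err, docs) {
    if (err || !docs || docs.length === 0) return; // finished or failed
    docs.forEach(function (doc) {
      Points._collection.insert(doc);
    });
    loadChunk(index + 1);
  });
}
loadChunk(0);
```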


Thanks Fabio,
Managing large data looks daunting, but it has to be done. The future is big data, with IoT etc. coming.

FYI, I created 500k documents in 100-140s using the batch-insert package. Again, on the server I can see 500k documents with .find().count(). However, my client simply says "waiting for localhost"/"Problem loading page" and then hangs. Memory-wise I am using 75% (2.9 GiB) on a Linux virtual machine.

Hi Fabio,
Can LokiJS co-exist with Meteor/Mongo?
If you have already built a Meteor app, can you plug LokiJS into it, or do you need to completely redo your app?

I don't think that I will ever expect a cloud application to send all the data saved in the cloud to my client at once :smiley:

BTW, what should the client show you? Is a helper returning .find().count(), or something else?

Hi,
Bear with me, I am on the learning curve of what can be done and how to do it most efficiently. I'm a civil engineer, and it is interesting that even desktop applications (Autodesk Civil3D) can be horrendously slow when it comes to spatial data processing (a single-core program in today's age). I am just testing the boundaries of what can be done. On the other hand, I know infrastructure owners want to see all of their infrastructure in one go. It is up to the developer to figure out how to compromise and deliver.

I don't think you would ever need to show 500k data points.
You have limited space on the map, so you always limit the returned data to just the points shown on the map; depending on the zoom level, you can use different radius values from the map centre.
I still think that more than 1000 map markers on a fullscreen map would be a horrible mess.
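
As a sketch of that approach, assuming a 2dsphere index on a loc field (the publication name, field names, and the 1000-marker cap are illustrative):

```js
// Server: only publish points within the visible radius, capped.
// Note: geo selectors force this cursor to be observed by polling
// rather than oplog tailing, as mentioned earlier in the thread.
Meteor.publish('points.inView', function (center, radiusKm) {
  check(center, [Number]);   // [lng, lat] of the map centre
  check(radiusKm, Number);
  return Points.find({
    loc: {
      $geoWithin: {
        // $centerSphere takes the radius in radians (km / Earth radius).
        $centerSphere: [center, radiusKm / 6371],
      },
    },
  }, { limit: 1000 });
});
```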