Working with very large documents in mongo. Caching? CDN?


#1

We are working with quite large documents in our collection (many 3D objects that are rendered on the client to build a bigger 3D scene), and so far we have run into a number of problems because of that. Surprisingly few people seem to have had similar issues, so maybe our usage pattern is uncommon (or maybe it is an anti-pattern).

We had to fight with Iron Router and eventually replaced it with Flow Router because of unpredictable data reloads, which are very heavy when each data set is many megabytes. We also had to move away from Astronomy, since it turns out that its save() command first does a find() and then an update(). For a document that is 10 megabytes long, find + update versus just update makes a huge difference. There were other cases too, of the kind that probably go unnoticed when you work with a rather small subset of a collection where each document is only kilobytes long.
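To illustrate the "just update" path: a minimal sketch of building a `$set` modifier from only the changed fields, so the 10 MB document never has to be read back first. The helper name `buildSetModifier` and the `Objects` collection in the usage comment are made up for this example, and the flattening rules (plain objects become dot paths, arrays and everything else are set whole) are an assumption about what fits this kind of data.

```javascript
// Returns true for plain {...} objects only (not arrays, Dates, null, etc.).
function isPlainObject(value) {
  return value !== null && typeof value === 'object' &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Flattens nested plain objects into Mongo dot-notation paths.
// Arrays, empty objects and non-plain values are kept as leaf values.
function flatten(obj, prefix, out) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (isPlainObject(value) && Object.keys(value).length > 0) {
      flatten(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Builds a modifier that touches only the fields present in `changes`.
function buildSetModifier(changes) {
  return { $set: flatten(changes, '', {}) };
}

// Hypothetical usage with a Meteor collection named `Objects`:
// Objects.update(objectId, buildSetModifier({ position: { x: 1 } }));
```

The point is only that a single update with a small modifier sends a few bytes, whereas find + update moves the whole document at least once.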

Currently it feels like we have reached the limit of what can be optimised, and the next step would be to try server-side caching and probably a CDN. But I am not sure that is possible at all, since these are Mongo documents after all and are therefore transmitted via DDP.

Each of the documents is quite static by itself (only occasionally edited in the backend, never changed by the end-user application), but they are all arranged dynamically and loaded into the scene reactively whenever they become available. It works quite neatly and feels like the ‘Meteor way’. But loading ~20 objects of several megabytes each still takes 15 seconds, with no obvious way to reduce the loading time.

Do you have any advice on a proxy/CDN and on an overall pattern that we could employ to improve the situation?


#2

You probably just won’t be able to store this stuff in Mongo/Minimongo. On the client, every find does indeed clone the entire object. And an update clones the entire object many, many times. On the server, the entire contents of each client’s minimongo cache is stored in memory. So if you have 20 objects × 10 MB × 100 customers, then that is already roughly 20 GB on the server. You could try fetching the objects via a Meteor Method, but you might be best off serving it as a static asset.
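A rough sketch of what "serving it as a static asset" could look like with plain Node: each object gets an `ETag` derived from its content plus a `Cache-Control` header, so a browser, proxy or CDN can cache it and revalidate cheaply. Everything named here (`putAsset`, `respond`, the `/objects/` path, the one-hour `max-age`, the in-memory `Map`) is an assumption for illustration, not something from the original setup. In a Meteor app a handler like this could presumably be mounted through `WebApp.connectHandlers` instead of a separate server.

```javascript
const http = require('http');
const crypto = require('crypto');

// Hypothetical in-memory store of serialized scene objects, keyed by id.
const assets = new Map();

// Stores a serialized object and returns its content-based ETag.
function putAsset(id, body) {
  const buf = Buffer.from(body);
  const hash = crypto.createHash('sha1').update(buf).digest('hex');
  const etag = `"${hash}"`;
  assets.set(id, { buf, etag });
  return etag;
}

// Decides status, headers and body for one request.
// Simplification: only an exact single-value If-None-Match is handled.
function respond(id, ifNoneMatch) {
  const asset = assets.get(id);
  if (!asset) return { status: 404, headers: {}, body: null };
  const headers = {
    'Content-Type': 'application/json',
    'ETag': asset.etag,
    'Cache-Control': 'public, max-age=3600', // assumed policy
  };
  if (ifNoneMatch === asset.etag) {
    return { status: 304, headers, body: null }; // client copy is current
  }
  return { status: 200, headers, body: asset.buf };
}

// Wraps respond() in an HTTP server for paths like /objects/<id>.
function createServer() {
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url, 'http://localhost');
    const id = decodeURIComponent(pathname.replace(/^\/objects\//, ''));
    const result = respond(id, req.headers['if-none-match']);
    res.writeHead(result.status, result.headers);
    if (result.body) res.end(result.body);
    else res.end();
  });
}

// To try it locally: putAsset('chair', '{"v":1}'); createServer().listen(3000);
```

With this shape the objects never enter the publication, so nothing about them is held per client on the server.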


#3

Could be. Apollo sounds like a possible solution - both cacheable REST API calls and a nice reactive interface for Meteor. But I guess that is still far away.

I was also thinking about using CollectionFS to store the files and then caching the corresponding URL. But it looks like CollectionFS is all but dead now. Though as I am already using it in a project, I might try this route anyway. Will see.
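Independent of which file store ends up being used, the "store the file, cache the URL" idea could be sketched roughly like this: the Mongo document keeps only small metadata plus a URL that contains a content hash, and the client loads each URL at most once. The names `toMetadataDoc` and `createLoader`, the URL layout and the example base URL are all hypothetical, and the default fetcher assumes an environment with a global `fetch`.

```javascript
const crypto = require('crypto');

// Builds the small document that would live in Mongo and travel over DDP.
// The URL changes whenever the payload changes, so cached copies never go stale.
function toMetadataDoc(id, payload, baseUrl) {
  const body = JSON.stringify(payload);
  const hash = crypto.createHash('sha256').update(body).digest('hex').slice(0, 16);
  return {
    _id: id,
    size: Buffer.byteLength(body),
    hash,
    url: `${baseUrl}/${encodeURIComponent(id)}.${hash}.json`,
  };
}

// Client-side loader that fetches each distinct URL only once.
// `fetchBody` is injectable; the default assumes a global fetch().
function createLoader(fetchBody = (url) => fetch(url).then((res) => res.json())) {
  const cache = new Map(); // url -> Promise of the parsed object
  return function load(doc) {
    if (!cache.has(doc.url)) {
      const pending = fetchBody(doc.url);
      cache.set(doc.url, pending);
      // Drop failed requests so a later call can retry.
      pending.catch(() => cache.delete(doc.url));
    }
    return cache.get(doc.url);
  };
}

// Hypothetical usage:
// const doc = toMetadataDoc('chair', sceneObject, 'https://cdn.example.com/objects');
// const load = createLoader();
// load(doc).then((object) => { /* add to scene */ });
```

The reactive part stays in Meteor (the metadata documents arrive and change as before), while the heavy payload goes over plain HTTP where a CDN can help.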