Subscribe is too slow


#1

I'm developing a project with Meteor. One feature works like this: I have a collection, a, that holds a list of a_items, about 50,000 of them.

In another function I need to load all the a_items so the user can choose an a_item and perform other operations.
I wrote the subscribe in the waitOn of the router controller.

But every time, the subscribe shows a loading state for a few seconds before entering the page. If I don’t put it in waitOn, the page opens quickly, but the a_items are loaded incompletely.

Is there any good solution to it?


#2

If you need all of them then it’s going to be slow. 50,000 documents is nothing to sneeze at and you are going to be pushing your browser’s limits as well. If you have to publish all of the documents then there is no real good solution to making it faster. It takes time for those documents to go from the server to the client. I have never seen a use case for publishing all the items in a collection to the client so I would be curious to hear why you are doing that.


#3

I agree with @khamoud . What (sane) end-user is going to make a choice from a list of 50000 items?


#4

Also agree with @khamoud - You never actually want to load in 50k documents in one subscription. I highly suggest you paginate.

I doubt you are actually showing all 50k options in one actual page - the user probably needs to scroll or activate some elements on the page in order to view more I assume. You can resubscribe and get more of your documents using the same subscription if you paginate properly and kick off a resubscribe during the proper events.
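
A minimal sketch of the limit-based pagination idea, assuming the client resubscribes with a larger limit as the user scrolls. The helper name `pagedFindArgs`, the cap `MAX_LIMIT`, the `createdAt` sort field, the publication name `a_items.paged` and the collection variable `AItems` are all made up for illustration; only the helper itself runs here, and the Meteor wiring is shown as comments.

```javascript
// Hypothetical helper: turns a requested page size into find() arguments.
// MAX_LIMIT and the createdAt sort field are assumptions for illustration.
const MAX_LIMIT = 500;

function pagedFindArgs(requestedLimit) {
  // Fall back to a default page size when the client sends nothing usable.
  const n = Number.isFinite(requestedLimit) ? Math.floor(requestedLimit) : 50;
  // Clamp so a client can never ask the server for the full collection.
  const limit = Math.min(Math.max(n, 1), MAX_LIMIT);
  return { selector: {}, options: { sort: { createdAt: -1 }, limit } };
}

// Assumed wiring inside a Meteor app (not run here):
//   Meteor.publish('a_items.paged', function (limit) {
//     const { selector, options } = pagedFindArgs(limit);
//     return AItems.find(selector, options);
//   });
//   // client: raise the limit on scroll, then subscribe again
//   Meteor.subscribe('a_items.paged', currentLimit + 50);

console.log(pagedFindArgs(100).options.limit);   // 100
console.log(pagedFindArgs(50000).options.limit); // 500
```

Clamping on the server matters because the limit comes from the client; without it, one call could still request all 50k documents.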


#5

If, for some reason, you do really need to show the 50 000 rows of data, you can try and use Clusterize.js to drastically improve the performance but even that’s not gonna stop the browser from being slow to load up 50 000 rows of data.


#6

Let’s ignore the fact that 50000 rows of data are too many for a moment.

What do you mean when you say that “your list is incomplete when you remove the waitOn handler”? If it works the way it should, the list should be populated at the same time as the subscription is loading.

If what you mean is that “the list is incomplete at start but keeps being populated as time passes” maybe you could ‘trick’ the user by showing what is visible but not allowing interaction until everything is done.

If what you mean is that the list is incomplete and stays that way, then it’s something we should look deeper into


#7

Have you tried the Tabular Table package? This allows you to search, filter, etc, across millions of rows, while only showing 50 or 100 at a time (all the user can really see at once anyhow). You can also use an endless scroll plugin if that’s needed. I’m using in production now and it’s really nice!


#8

Yes, I tried Tabular Table, and it is very slow.


#9

To be fair, pagination is quite difficult. I have not seen a package do this properly yet.

Using skip and limit with 10^4 documents is not a good idea, see post by @slava mentioning the problems with skip.


#10

tldr: What is the advantage of publish/subscribe over an API (for collections with x*10k rows)?

Please correct me if I misunderstood the publish/subscribe model: I read in the docs (I’ll search later where that was) that you can subscribe to any collection without caring about limits etc. Meteor will load only the data asked for in a query.

Actually I experience the same issue as the OP: having a big collection and very slow subscriptions (about 10s). The same queries in the mongo shell together take <10ms.

So… I need to put limits and sorts in publications? I tried and it is much faster… But then what is the advantage of publish/subscribe over a classical API?

A classical API works something like this: “Get me the last 10 posts, including corresponding comments, users and likes” and you get some JSON/XML/whatever, everything included in the right place. It’s easy to implement; the heavy logic (query comments, put them to correct posts etc.) is on the server side.
Meteor is about bringing that logic to the client (as far as I understood). But starting to implement limits and sorts on the server side with relational data is kind of the same as an API. I mean, how shall I limit users (who wrote posts and comments) to the comments (of the last x posts) and the last x posts without querying them first? I’ll implement the very same logic in the view in Meteor.
So for the same query as above (“get the last 10 posts”), I need to write publications and subscriptions and add the logic twice (on server and client), which is very unattractive (what if I change something?).

I hope my question is understandable.

What am I doing wrong? :slight_smile:


#11

I was flirting with a possible solution to this a few months ago…

This would make it so you only have to write the query logic on the server, vastly simplifying your client side:

In summary:

In your publication (on server) you would use the added/changed/removed API to tack on the subscription id to each document before sending it down to the client via something like {"SuBScRiPtIoNiD": true}.

On the client, you would merely have to query MyCollection.find({'SuBScRiPtIoNiD': true}). You could easily make all of your subscriptions template-level, and keep track of each active subscription in the template state, via a template reactive instance variable, for example.
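
Roughly, the idea could be sketched like this. `publishTagged` and the tag name `sub_latestPosts` are hypothetical; `sub` stands in for the publish handler's `this` (which provides `added`, `changed`, `removed`, `onStop` and `ready`) and `cursor` for a Mongo cursor with `observeChanges`. The sketch assumes the classic synchronous `observeChanges`, and the fake objects at the bottom exist only so it runs outside Meteor.

```javascript
// Sketch: forward every document from a cursor to the client with an extra
// marker field, so the client can select "this subscription's" documents.
function publishTagged(sub, cursor, collectionName, tag) {
  const handle = cursor.observeChanges({
    added(id, fields) {
      // Tack the marker onto the document before it goes down the wire.
      sub.added(collectionName, id, { ...fields, [tag]: true });
    },
    changed(id, fields) {
      sub.changed(collectionName, id, fields);
    },
    removed(id) {
      sub.removed(collectionName, id);
    },
  });
  // Stop observing when the client unsubscribes.
  sub.onStop(() => handle.stop());
  sub.ready();
}

// Minimal stand-ins so the sketch runs outside Meteor.
const sent = [];
const fakeSub = {
  added: (c, id, fields) => sent.push({ c, id, fields }),
  changed() {}, removed() {}, onStop() {}, ready() {},
};
const fakeCursor = {
  observeChanges(callbacks) {
    callbacks.added('p1', { title: 'hello' });
    return { stop() {} };
  },
};

publishTagged(fakeSub, fakeCursor, 'posts', 'sub_latestPosts');
console.log(sent[0].fields); // { title: 'hello', sub_latestPosts: true }
```

With something like this on the server, the client query shrinks to a find on the marker field, and the sort/limit/join logic lives only in the publication.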

Hope this helps you a little bit at least.


#12

@delfa, you always need to query data, send it to the client, perform some kind of aggregation (on client and/or server), and process/display it. Meteor’s API (the subscribe mechanism) doesn’t change that. Its sole (but major) advantage is that it smoothly handles reactivity.


#13

Most of the time it is not the subscription that is slow, but DOM manipulation etc. as each document arrives.
Do you show a loader instead of results while the subscription is not ready? That can usually improve loading time a lot from the user's point of view.


#14

@shock Yes, I show a loader and the client thinks it’s ok.

@Steve Yes. But it’s more work when you have to do it twice. And ok, I didn’t get that it was more about reactivity (but used that feature).

@streemo The performance of find() is fine (even while observing changes); it’s the subscriptions which are pretty slow. I’m going to try that trick and give feedback.

I didn’t mention that there are different views, like “show last x posts” but also “show last x activity of user y” (activity = posts, comments, likes, but posts without corresponding comments etc). So you make at least one publish per view and collection, which gets messy very fast. Or at least one publish per view, like with an API.
After having developed an app with Meteor, this is a major drawback on two sides: initial loading time and query-logic mess.


#15

https://news.ycombinator.com/newest?next=10173479&n=61

What if pagination were as simple as it is at Hacker News? `next` tracks the index of the next submission to show and `n` is how many to return. For realtime updates, new data could be observed starting from the newest post, tracked by its uuid.


#16

It works well for auto-incrementing ids, but not with Meteor’s default ones, because it is not clear how to find the next n records after the one with a given _id. If anyone knows a fast, working solution, it would be nice to see.


#17

@streemo The hack is not bad: it resolves the long subscription time and puts the logic in one place. Thanks!


#18

Yep.

But I am not very knowledgeable on the internal mechanics of server-side subscription data caching. For example, would tacking on a unique (random) subscriptionId to documents before sending them to the client prevent the server from reusing the cached data between connections? Does it even reuse cached data, or does it dupe all of it? It might be that this “hack” in my earlier post ruins some of the pros of Meteor’s caching on the server, but like I said, I am not 100% sure.

When an app gets bigger, this “hack” looks more and more enticing :slight_smile: It would make clients super thin.

I need to do more research into this. Meteor caches the data for each subscription for each client. You can access this data via Meteor.server.sessions[connectionId]._namedSubs._documents. You’ll notice it contains data that you appended in added/changed/removed.

I don’t know the internals of V8 well enough to know how the server manages the memory between connections. It might be that appending an extra unique key-value pair to the sent document (like {thisParticularSubId: true}) is trivial because it duplicates the data anyways.

IIRC, @arunoda has written some articles about this. I’ll have to go back and look at those in more detail at meteorhacks.com


This is really the key. Once you can find some metric on which to rank posts, you can take skip outside of the equation by using $gt, $lt, etc.

But because all you need is an orderable field, something like created_at would work. But see, this limits how one can order things. In the Hacker News case, they order by some sort of Wilson score, or some simpler, cruder method. So, if you cache scoring data instead of computing it on the fly, then you can use this ordered field to do the same thing.
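
A small sketch of paging without skip, assuming an orderable `createdAt` field on each post. The helper `nextPageArgs`, the in-memory `posts` array and `fetchPage` are made up for illustration; in a real app the selector and options would be passed to the collection's `find()` instead of the array filter used here.

```javascript
// Hypothetical helper: find() arguments for "the next `limit` posts older
// than the last one the client already has". createdAt is an assumed field.
function nextPageArgs(lastCreatedAt, limit) {
  const selector = lastCreatedAt == null
    ? {}
    : { createdAt: { $lt: lastCreatedAt } };
  return { selector, options: { sort: { createdAt: -1 }, limit } };
}

// In-memory stand-in for the collection, so the paging logic can be run.
const posts = [1, 2, 3, 4, 5].map((n) => ({ _id: `p${n}`, createdAt: n }));

function fetchPage(lastCreatedAt, limit) {
  const { selector, options } = nextPageArgs(lastCreatedAt, limit);
  const bound = selector.createdAt ? selector.createdAt.$lt : Infinity;
  return posts
    .filter((p) => p.createdAt < bound)          // emulates the $lt selector
    .sort((a, b) => b.createdAt - a.createdAt)   // emulates sort: -1
    .slice(0, options.limit);                    // emulates limit
}

const first = fetchPage(null, 2);                                // p5, p4
const second = fetchPage(first[first.length - 1].createdAt, 2);  // p3, p2
console.log(first.map((p) => p._id), second.map((p) => p._id));
```

If the ordering field is not unique (two posts with the same timestamp or score), a tie-breaker such as `_id` would have to be added to both the sort and the selector, otherwise documents can be skipped or repeated at page boundaries.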


#20

Absolutely! Was thinking the same.

Interesting question. In my example I appended all comments, likes and corresponding users to my “post”-row, so the sent row was much bigger. But then I didn’t subscribe to others.
Looking at the measure of received data in the dev console in chrome, I didn’t discover a noticeable increase. The biggest data seem to come from the libraries (2/3) and images (nearly all the rest).


#22

So, I’ve been experiencing this a bit as well. I have a collection, of which I create a subscription, and yep, I would like all documents to be available, as these are addresses, and as the user types (after 3 characters) I try to fill a datalist using a query from the subscription to help them find the right address and select it more quickly.

Just loading my page, which only shows the form the datalist is on as a modal, takes almost 20 seconds in the development environment. The current size of the address collection is 15000 documents.

I can’t really pre-filter the subscription, because I never know what address the user may need at any given time.

How can I speed this up and still make it useful? Once loaded, the form works and the datalist works pretty quickly (unless I start back-spacing, but I think that’s still on me to limit the number of rows allowed in the datalist). Anyway, I’d love ideas on how to improve performance, as I’m not using nearly all of the actual addresses a single site might have. The one I’m using has 129,000 address entries in their SQL database, and I imported a small portion for testing when I ran across the sudden long load times.