Subscription performance

Hi everyone…

I have a MongoDB collection that has been growing steadily and is currently around 20,000 items. In one access scenario, a publication used with a specific subscription search criterion has gradually started to show response-time degradation (around 5 seconds now), but only the first time a client subscribes with that criterion (i.e., reuse is clearly in effect, since subsequent subscriptions from other clients get sub-second responses).

The collection has the following relevant characteristics:

  • About 20,000 items
  • MongoDB indexes on a boolean field { act: 1 } (act is true in 99% of the items) and on a date field { requestDate: 1 } - see the shell sketch just below.
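
In mongo shell terms, those are plain single-field indexes (the collection name "requests" is my assumption, not from the original post):

    // Single-field indexes as described above
    db.requests.createIndex({ act: 1 })
    db.requests.createIndex({ requestDate: 1 })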

The subscription has the following syntax:
return Requests.find({ act: true }, { fields: { … }, limit: lmt, sort: { requestDate: -1 } })

where lmt starts at 25 but can increase by 25 in subsequent subscriptions (used to show more items on a page when the user requests them). When subscribing again with lmt set to 50, 75, etc., the response is sub-second.
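
For context, the publication looks roughly like this (a minimal sketch; the publication name "requestsList" and the fields list are illustrative, not my actual code):

    Meteor.publish('requestsList', function (lmt) {
      // Newest 'lmt' active requests; the client re-subscribes with
      // lmt = 50, 75, ... to page in more items.
      return Requests.find(
        { act: true },
        { fields: { /* the displayed fields */ }, limit: lmt, sort: { requestDate: -1 } }
      );
    });

    // Client side:
    Meteor.subscribe('requestsList', 25);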

MongoDB itself has no issue (the same query from the mongo shell is instant). During the subscription (before data is received on the client), node CPU spikes to close to 100%, then settles back to its normal state (between 5 and 15%). Memory does not appear to be constrained. The size of each returned item is less than 100 chars on average.

Any ideas of what may be happening?

I think I remember a comment from @arunoda somewhere about choking the CPU when retrieving mongo data for publishing in some cases, but I don’t remember where…

Pretty please, someone? :smiley:

I thought someone else, with more specific knowledge than I have, would reply soon enough that I shouldn’t post my potentially not-very-useful thoughts. But it seems not, so here we go:

Are you using Oplog tailing on that instance or not? (I think when you run an app through the meteor tool it does use oplog.)
I’m asking because I also, a long time ago, had read something about oplog and specific cases where that leads to problems, and I very vaguely remember something about limit (and sort) in there, too.
Oh, actually, now that I say that: Could it just be about the sort modifier?

The behavior you’re describing shows that node is doing a bunch of work, at least the first time this subscription is requested. That work is CPU-bound, so it is probably sorting, comparing, or otherwise preparing the (potentially large number of) results to send back to the client.

So some of the things to try in order to learn what’s happening here are: toggling oplog on/off, tweaking the selector, and tweaking sort & limit.
The selector might be an issue because you’d usually not expect an index to be so imbalanced (99% true), so it’s almost like fetching all the items anyway. I think this is probably not the problem, but I could imagine it being one.
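
One way to experiment with the oplog part without changing your deployment is Meteor’s _disableOplog cursor option. It’s underscore-prefixed and not officially documented, so treat this as an assumption that may change between versions, not a guaranteed API:

    Meteor.publish('requestsList', function (lmt) {
      // Forces this publication onto the polling observe driver instead of
      // oplog tailing, so the two strategies can be compared directly.
      return Requests.find(
        { act: true },
        { limit: lmt, sort: { requestDate: -1 }, _disableOplog: true }
      );
    });

Comparing the first-subscription time with and without that flag (and with the sort or limit temporarily removed) should narrow down where node is spending its time.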

Does that give you any ideas for what to try?

Also, there’s always the option of running a node profiler. I haven’t had the need to do that so far so I can’t give any hints as to how to get started with that, but it shouldn’t be too hard and it would definitely give you a lot of insight into what is actually happening during that subscription at runtime!

Thanks for the reply, @seeekr. The points you noted have been through my own thought process (I did not mention any suspicions so as not to bias anyone). Clarifying your question: yes, I am using oplog tailing from a 2-node + arbiter MongoDB replica set.

I thought about sort, but I can’t see why that could be an issue, since the same db query from the mongo shell resolves immediately (as it should, since the sort key is indexed and shows up as being used in the cursor’s .explain() output).
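
For reference, this is the kind of shell check I mean (collection name "requests" assumed):

    // Run in the mongo shell; the plan should show the sort being
    // satisfied by the requestDate index (traversed in reverse).
    db.requests.find({ act: true }).sort({ requestDate: -1 }).limit(25).explain()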

I also considered dropping the act index, but can’t see how it could have any effect…

Would the Meteor server drop the limit option when querying MongoDB and thereby cause a large number of items to be retrieved from the db? Maybe, but I can’t think why it should do that. Maybe someone who knows Meteor’s guts in this area could comment?

Puzzling…

Doh! Solved it.

It had nothing to do with that specific publication. It was caused by a Counts.publish of the total number of items found (using the tmeasday:publish-counts package) a few statements back. That previous count publish was retrieving the _id of every item satisfying the criteria, and publish-counts does not scale very well…

I’m attempting to address the situation with a combination of making the publish-counts pub non-reactive when cursor.count() > 1,000 items (not ideal, but it does not impact usability, just consistency) and wrapping the Counts.publish call in a Meteor.defer() so that the main pub returns immediately…
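
Roughly what I mean, as a sketch (publication and counter names are illustrative, and the 1,000 threshold is just the cut-off mentioned above; nonReactive and noReady are options documented by publish-counts):

    Meteor.publish('requestsList', function (lmt) {
      var self = this;

      // Defer the count publish so the main cursor is returned (and marked
      // ready) immediately, instead of waiting for publish-counts to scan
      // all the matching _ids.
      Meteor.defer(function () {
        var total = Requests.find({ act: true }).count();
        Counts.publish(self, 'requestsCount', Requests.find({ act: true }), {
          // Above the threshold, give up reactivity rather than observe
          // every matching document.
          nonReactive: total > 1000,
          noReady: true
        });
      });

      return Requests.find(
        { act: true },
        { limit: lmt, sort: { requestDate: -1 } }
      );
    });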

If anyone can think of a better approach, I’m all ears.


Have a look at this package: https://github.com/nate-strauser/meteor-publish-performant-counts
