I can't figure out what to make of this stack trace

I’ve been having a lot of random slowdowns in one part of my site and I’m not really sure what the holdup is. Kadira stack traces show that all of the time is spent in “Observe changes” on the collection. The stack trace is at Kadira trace. I don’t really understand what could be taking so long in the observe changes process here. Any insight is greatly appreciated.


We’d need to see some code to begin to understand what the issue is.

I think this is because you have a lot of observers. There is a fix for this in the upcoming Meteor 1.2. Try using the devel branch of Meteor if possible.


The reason I didn’t provide any code is that there is literally no code of mine executing in the portion that takes FOREVER (10 seconds in some cases). These very long requests are a real problem for my users. It is only Meteor code that is taking so long, specifically this observe on what should be a reusable selector (though it’s a three-field query, so that might be the hiccup).

Kadira shows me that I have low observer reuse in this area, but I think I’ve optimized that as far as I can: I’ve narrowed the query down to the minimum fields needed to select the right documents for this publication. At times it is publishing over 600 records, and I wonder if that is the hiccup. All of those records are relevant and need to be published, but I would think it should still be fairly quick, since a direct query against Mongo takes around 150 ms.
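For reference, the publication is shaped roughly like this (the collection and field names below are placeholders rather than my real schema); the selector is kept to the three fields that actually vary, so identical subscriptions produce identical cursors and can share an observer:

```js
// Rough sketch of the publication; collection and field names are
// placeholders, not the real schema.
var Products = new Mongo.Collection('products');

Meteor.publish('productsForGroup', function (groupId, category, active) {
  check(groupId, String);
  check(category, String);
  check(active, Boolean);

  // The selector is limited to the three fields that vary per client;
  // subscriptions that produce an identical cursor (same selector and
  // options) can share a single observer on the server.
  return Products.find(
    { groupId: groupId, category: category, active: active },
    { fields: { name: 1, price: 1, filters: 1 } }
  );
});
```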

I’ll look into that. Could you explain what is different between 1.2 and 1.1 with regard to observers, or at least point me to the relevant updates/tickets/blog posts? I’d really like to dig into this since it is running in production right now and only has this issue occasionally. This subscription averages 150–250 ms, so these 10+ second requests are definitely extreme outliers.

If those are just outliers, that’s okay. This happens when you have more observers (especially when a method does a DB write affecting this observer).
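As a sketch (the collection, method, and field names here are made up), a method like this that does a multi-document write to the same collection a publication observes means every observer on that collection has to process those changes:

```js
// Hypothetical example: a multi-document write hitting the same
// collection that the slow publication observes.
var Products = new Mongo.Collection('products');

Meteor.methods({
  refreshProductGroup: function (groupId) {
    check(groupId, String);

    // Every observer watching Products has to process each of these
    // changed documents before the related reads settle down.
    Products.update(
      { groupId: groupId },
      { $set: { updatedAt: new Date() } },
      { multi: true }
    );
  }
});
```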

See this PR: https://github.com/meteor/meteor/pull/4694

That’s interesting. We’re running in a clustered environment using a microservice setup where a different application entirely populates the database, and our user-facing application reads it and makes it available to users. That one collection holds all of our product information, complete with filter selections, and updating one product actually updates the entire group. That would suggest that we might have writes blocking (for various definitions of the word) our reads.

After digging through that PR and reading some of the code, I see where my bottleneck could be. I added some logging around a couple of places I thought might be interfering, and it turns out that in one spot I was writing some 500–600 records between the start of that publication and its actual return. That’s a lot of records no matter how you dice it. Knowing my problem, I set about optimizing away those 500 writes and now have sub-second response times nearly across the board, which I like.
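The change was roughly of this shape (names are simplified placeholders, not my actual code): instead of issuing a write per record while the publication was spinning up, skip the documents that don’t actually need a write and cover the rest with a single multi-document update:

```js
// Hypothetical sketch of reducing ~500 per-document writes to one batch.
var Products = new Mongo.Collection('products');

function syncGroup(groupId, batchTimestamp) {
  // Only touch documents that haven't been synced for this batch yet.
  var staleIds = Products.find(
    { groupId: groupId, syncedAt: { $lt: batchTimestamp } },
    { fields: { _id: 1 } }
  ).map(function (doc) { return doc._id; });

  if (staleIds.length === 0) return;

  // One multi-document update instead of one update per record, so the
  // observers have far fewer individual writes to chew through.
  Products.update(
    { _id: { $in: staleIds } },
    { $set: { syncedAt: batchTimestamp } },
    { multi: true }
  );
}
```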

I’m hopeful that the PR will help mitigate the cost of large database operations in the future, but for now I was able to be smarter about my database usage to improve performance, so thank you.
