Hitting 40% CPU usage on Standard Galaxy container [Reward]

My app hits 30-40% CPU usage on a Standard Galaxy container. This happens on full page refresh and login.

I’ve tried routing through a CDN (Cloudflare), enabling oplog tailing, and adding indexes, all to no avail.

I have a collection with around 150 items. Probably doing something stupid, but tearing my hair out here - would anyone be able to look at my code? Happy to pay for your time.

I can take a look. PM me with details.

Are you doing any Interval loops on your server?

I wrote an MMO in Meteor. Server side processing was a beast moving about 1000 records every 3 seconds. You will burn out your Galaxy server quickly.

I moved to Google Cloud Compute with a free $300 credit. Got set up (HARD), and still experienced this problem.

I optimized my server side code.

It’s probably a stuck loop.

Also, get yourself Kadira from NodeChef hooked up for $10 a month and it will probably show your problem in the first 2 minutes of using the product. Cheers,

Hi both,

I’ve set up Kadira/APM and attached a few screenshots:

From what I’ve read, publications should only take approx. 200ms, so I’ll paste the OrgExpenses and Expenses publications below in case I’m doing anything silly:

Meteor.publish('OrgExpenses', function() {
  var loggedinuser = Meteor.user();
  // Not logged in: publish nothing instead of throwing on a missing user
  if (!loggedinuser) {
    return this.ready();
  }
  // If the user is an admin, show them all expenses from their company
  if (loggedinuser.adminofcompanyid > 0) {
    return Expenses.find({
      companyid: loggedinuser.adminofcompanyid
    });
  }
  // If the user isn't an admin, just show them their own expenses
  console.log('User is not a superadmin so we will only show theirs');
  return Expenses.find({
    owner: loggedinuser._id
  });
});

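// Publishes absolutely all expenses for superadmins
Meteor.publish('Expenses', function() {
  var loggedinuser = Meteor.user();
  if (loggedinuser && loggedinuser.issuperadmin) {
    return Expenses.find({});
  }
  // Everyone else gets an empty subscription that is still marked ready
  return this.ready();
});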

Thanks,

Do you have indexes on owner and companyid?
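
In case it helps, here is a minimal sketch of creating those two indexes at server startup. It assumes Expenses is the Mongo.Collection used in the publications above. ensureExpenseIndexes is a made-up helper name, and the commented-out Meteor.startup wiring is an assumption about how the app is laid out. rawCollection() returns the underlying MongoDB driver collection, whose createIndex returns a promise.

```javascript
// Hypothetical helper: create single-field indexes on the two fields the
// publications filter by. `rawCollection` is anything exposing the MongoDB
// driver's createIndex(keys), e.g. Expenses.rawCollection() in Meteor.
function ensureExpenseIndexes(rawCollection) {
  return Promise.all([
    rawCollection.createIndex({ owner: 1 }),
    rawCollection.createIndex({ companyid: 1 }),
  ]);
}

// Assumed wiring in server code (not run here):
// Meteor.startup(() => ensureExpenseIndexes(Expenses.rawCollection()));
```

With only around 150 documents the indexes probably won't change much, but this is a cheap thing to rule out.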

I don’t think it’s an issue with the code you’ve provided.

Based on the subscription RATE, something looks a little wonky there. I have experience chasing down subscription rate problems from the MMO video game I built on Meteor. I pushed her to the limits.

So my subscription rate was insanely high for just a few users.

Turns out my IronRouter at the time was re-subscribing by re-rendering my template all the time due to some bad code. I swapped Iron for FlowRouter, which forced me to completely rebuild my render and subscription interface code.

This helped fix that problem. But it really just means you need to re-examine the client-side code that subscribes to your publications.

See what you can dig out. Keep exploring Kadira.

I think SkyRooms is on the right track here. You probably have a similar problem, where your subscribe code is unintentionally reactive: you are subscribing (or possibly calling mutating methods) in response to changes in data.
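
One rough way to check for this is to count how often each subscription is requested during a single page load. The sketch below is only a debugging aid, not anything built into Meteor: countSubscriptions is a hypothetical helper, and the commented-out wiring assumes you can route your client-side subscribe calls through the wrapper.

```javascript
// Hypothetical debugging helper: wrap a subscribe function so every call is
// counted per subscription name. On the client you would pass Meteor.subscribe.
function countSubscriptions(subscribe) {
  const counts = {};
  const wrapped = (name, ...args) => {
    counts[name] = (counts[name] || 0) + 1;
    // Forward the call unchanged so the app keeps working while you measure.
    return subscribe(name, ...args);
  };
  wrapped.counts = counts;
  return wrapped;
}

// Assumed client-side wiring (not run here):
// const subscribe = countSubscriptions(Meteor.subscribe);
// ...call subscribe('OrgExpenses') wherever the app subscribes, then inspect
// subscribe.counts in the browser console after a page load.
```

If a name shows up dozens of times after one login, the subscribe call is probably sitting inside something that reruns reactively.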

The actual slowness is a little surprising. Are your Expenses or Charges documents large? Do they contain image data, like an image of a receipt?

Try adding {fields: {_id: 1}} as the options argument in your find calls in the publish functions. This will almost certainly clear up the issue, regardless of whether or not your template code is strange. Then add back, one by one, the fields you actually need to render. If you need an image field, that’s probably your issue.
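
Roughly what that could look like, as a sketch rather than a drop-in fix. expensesQueryFor is a hypothetical helper that builds the selector and options for Expenses.find() from the user document, and amount and date are placeholder field names standing in for whatever your templates really render.

```javascript
// Build the (selector, options) pair for Expenses.find() from a user document.
// `fields` limits what the publication sends; it defaults to just _id so you
// can add back only the fields the UI actually renders.
function expensesQueryFor(user, fields = { _id: 1 }) {
  if (!user) return null; // not logged in: publish nothing
  const selector = user.adminofcompanyid > 0
    ? { companyid: user.adminofcompanyid } // company admin: whole company
    : { owner: user._id };                 // everyone else: own expenses
  return [selector, { fields }];
}

// Assumed wiring inside the publication (not run here; `amount` and `date`
// are hypothetical field names):
// Meteor.publish('OrgExpenses', function () {
//   const query = expensesQueryFor(Meteor.user(), { amount: 1, date: 1 });
//   return query ? Expenses.find(...query) : this.ready();
// });
```

Keeping the selector logic in a plain function also makes it easy to check outside of Meteor.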

Hmm, I don’t think it’s re-subscribing or smashing the database as it only spikes for a brief moment on login.

The Items document can be fairly large, but images are referenced instead of stored.

Here are some more updates:

  • I added {fields: {_id: 1}} as suggested by doctorpangloss. For some reason my Items code didn’t respect this parameter so I just commented out the publications. With no publications, the CPU load reduced to 0.2 ECU. But on the downside, I had no publications…
  • A consultant told me he thinks it could be my cheap mLab hosting blocking everything up. I moved the DB to super fast production grade Atlas hosting and the app used exactly the same CPU (although to be fair it was blindingly fast).

All this leads me to think…

Is it really that bad that my CPU hits 0.5 ECU when one person logs in? A quad core should handle 8 simultaneous logins… and maybe 5x that if I’m prepared to let users wait for 5-10 seconds…

Unless you’re running in a scale-out configuration (like using nginx to distribute connections to multiple node processes), the extra cores are almost worthless.
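
To put rough numbers on that, here is a back-of-the-envelope sketch. It takes the 0.5 figure from the question at face value as "half a core per login" and assumes a single Node process can use roughly one core; both are simplifications, and simultaneousLogins is just an illustrative name.

```javascript
// Back-of-the-envelope estimate: how many logins can be processed at once if
// each one costs `costPerLogin` cores and the app can use `usableCores` cores.
function simultaneousLogins(costPerLogin, usableCores) {
  return Math.floor(usableCores / costPerLogin);
}

const quadCoreAllUsed = simultaneousLogins(0.5, 4); // the estimate in the question
const singleProcess = simultaneousLogins(0.5, 1);   // one Node process, about one core
console.log({ quadCoreAllUsed, singleProcess });
```

Under those assumptions the quad-core estimate of 8 shrinks to 2 for a single process, which is why the extra cores only help once connections are spread across several processes.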