Performance problems on Galaxy-hosted app in Method that returns a lot of data

In my app I have a method that allows the admin to download a CSV of the users assigned to him. It consists of two database queries, three lodash loops to process the results, and an unparse of the resulting objects to CSV via Baby Parse. The method then returns that string, which is downloaded by the client.

On my local machine with a small test database this works without issue. On the production site this operation queries about 12 MB of data (about 14,000 records), but the Method takes about 4 minutes to run, with the CPU at 100%, and other connected users have to wait for the app to respond. It seems to have gotten this bad particularly after we moved from a mupx deployment on Azure to Galaxy (yes, even after migrating our database). Kadira logs are inconsistent with the overall time the method actually takes to respond, but they do claim about 7 seconds for one query and 3 seconds for the other.

Does anyone have an idea what could be causing these performance issues? The method is below. Thanks for the help!

const getCustomerCsv = new ValidatedMethod({
  name: 'getCustomerCsv',
  validate: null,
  run() {
    this.unblock();
    if (Meteor.isServer) {
      const query = { ... };
      const rawData = Meteor.users.find(query,
        { fields: { emails: 1, profile: 1 } }
      ).fetch();
      const childIds = _(rawData).map((obj) => obj.profile.children)
        .flatten()
        .uniq()
        .value();
      const children = Children.find({ _id: { $in: childIds } },
        { fields: { name: 1, dob: 1 } }).fetch();
      const maxChildren = (_.maxBy(rawData, (user) => user.profile.children.length))
        .profile.children.length;
      const data = _(rawData).map((user, rawI) => {
        const ret = {
          // build CSV object structure ...
        };
        // Assign children to data structure
        const len = user.profile.children.length;
        for (let i = 0; i < len; i++) {
          const childId = user.profile.children[i];
          const child = _.find(children, ['_id', childId]);
          ret[`child${i}`] = `${child.name.first} ${child.name.last}`;
          ret[`dob${i}`] = moment(child.dob).format('YYYY-MM-DD');
        }
        // create keys for remaining children structure so it will unparse to csv correctly
        for (let i = len; i < maxChildren; i++) {
          ret[`child${i}`] = '';
          ret[`dob${i}`] = '';
        }
        // delete the raw entry to free memory; all copied values are primitive types
        delete rawData[rawI];
        return ret;
      }).value();
      return Baby.unparse(data);
    }
    return false;
  },
});

If it were me, I’d locally benchmark every single step in the method while pointing my dev machine at the production db, and hopefully already learn more from that. It’s hard to tell just from looking at it…
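
For example, a rough sketch (the variable names match the method above; run locally with MONGO_URL pointed at the production db or a recent copy):

// Rough timing sketch: log how long each stage of the method takes.
let t = Date.now();
const rawData = Meteor.users.find(query, { fields: { emails: 1, profile: 1 } }).fetch();
console.log('users fetch:', Date.now() - t, 'ms,', rawData.length, 'docs');

t = Date.now();
// ... build childIds from rawData as in the method ...
const children = Children.find({ _id: { $in: childIds } }, { fields: { name: 1, dob: 1 } }).fetch();
console.log('children fetch:', Date.now() - t, 'ms,', children.length, 'docs');

t = Date.now();
// ... build the data array as in the method ...
const csv = Baby.unparse(data);
console.log('row building + unparse:', Date.now() - t, 'ms');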


I was working on exactly the same problem this week, and I finally feel that I’ve sort of resolved it, although my CPU still climbs to 100% for the ~20 seconds the query runs.

At first I had problems where memory would overload and restart the server. That, at least, is resolved with the following code:

import csv from 'fast-csv';
import fs from 'fs';
import moment from 'moment';
import Fiber from 'fibers';
import { Random } from 'meteor/random';
import { WebApp } from 'meteor/webapp';

export const largeAgDataToFile = (userId,queryObj) => {
    let skip = 0;
    let limit = 8000;
    let agData = [];
    let key = Random.hexString(16);
    let fullPath = `.user-downloads/${userId}/${key}/${queryObj.title}.csv`;
    let foldernk = `.user-downloads/${userId}`;
    let folder = `.user-downloads/${userId}/${key}`;
    let csvStream = csv.createWriteStream({headers: true}).transform(function (row) {
        for (let k in row) {
            if (row[k] instanceof Date) {
                row[k] = moment(row[k]).format('YYYY-MM-DD HH:mm:ss');
            }
        }
        return row;
    });
    if (!fs.existsSync('.user-downloads')) {
        fs.mkdirSync('.user-downloads');
    }
    if (!fs.existsSync(foldernk)) {
        fs.mkdirSync(foldernk);
    }
    if (!fs.existsSync(folder)) {
        fs.mkdirSync(folder);
    }
    let writableStream = fs.createWriteStream(fullPath);
    csvStream.pipe(writableStream);
    // Clean up the generated file and folders 10 minutes after the file is written.
    writableStream.on('finish', ()=>{
        Fiber(()=>{
            setTimeout(()=>{
                if (fs.existsSync(fullPath)) {
                    fs.unlinkSync(fullPath);
                    fs.rmdirSync(folder);
                    fs.rmdirSync(foldernk);
                }
                }
            },10*60*1000);
        }).run();
    });
    // Page through the aggregation in chunks of `limit`, writing each batch
    // straight to the CSV stream so the whole result set never sits in memory.
    do {
        let pipelineLim = queryObj.pipeline.concat();
        pipelineLim.push({ '$limit': skip + limit });
        pipelineLim.push({ '$skip': skip });
        agData = Collections[queryObj.collectionname].aggregate(pipelineLim, {allowDiskUse: true });
        skip = skip + limit;
        agData.forEach(obj=>{
            csvStream.write(obj);
        });
    } while (agData.length !== 0);
    csvStream.end();
    if (agData) {
        agData = null;
    }
    return `user-downloads/${userId}/${key}/${queryObj.title}.csv`;

};

WebApp.connectHandlers.use(function (req, res, next) {
    var re = /^\/user-downloads\/(.*)$/.exec(req.url);
    if (re !== null) {   // Only handle URLs that start with /user-downloads/
        let filePath = '.user-downloads/' + re[1];
        let filename = req.url.match(/\w+\.csv/i);
        if (fs.existsSync(filePath)) {
            let data = fs.readFileSync(filePath);
            res.writeHead(200, {
                'Content-Type': 'text/csv',
                'Content-Disposition': `attachment;filename=${filename}`
            });
            res.write(data);
            res.end();
            if (data) {
                data = null;
            }
            // Remove the file a minute after it has been served.
            Fiber(()=>{
                setTimeout(()=>{
                    if (fs.existsSync(filePath)) {
                        fs.unlinkSync(filePath);
                    }
                },60*1000);
            }).run();
        } else {
            res.writeHead(404);
            res.end();
            throw (new Meteor.Error(411, `Download Error: datafile not available - ${filePath} - ${req.url}`));
        }
    } else {  
        next();
    }
});

I call largeAgDataToFile from a server method and return the file URL, and when the method returns, the client triggers a request to that URL to download the file.
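
The method side is just a thin wrapper, something along these lines (the method name here is made up):

import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';

Meteor.methods({
    buildLargeCsv(queryObj) {
        check(queryObj, Object);
        if (!this.userId) {
            throw new Meteor.Error('not-authorized');
        }
        this.unblock();
        // Writes the CSV to disk and returns the relative URL that the
        // connect handler above will serve.
        return largeAgDataToFile(this.userId, queryObj);
    },
});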

I run my db on Atlas and a small Galaxy container. If the site had to serve this data at regular intervals, I would break this function out onto a separate server with some sort of cron schedule. But for me this works, as it will only be used about once a week, so users can deal with the 20-second slow-server window…

Hope this helps. I have tested this up to about 30 MB of CSV data (the file builds in less than a minute), which can be a lot of records! Note that I aggregate through the collection, which gives you some extra abilities, but the last stage in the aggregation must be a sort… I think it can be further optimized. Let me know what you think.
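
For reference, a queryObj for this looks roughly like the following (collection and field names are made up):

const queryObj = {
    title: 'orders-export',
    collectionname: 'Orders', // key into the Collections map used above
    pipeline: [
        { $match: { createdAt: { $gte: new Date('2017-01-01') } } },
        { $project: { orderNo: 1, total: 1, createdAt: 1 } },
        // The pipeline must end with a sort so that the $limit/$skip paging
        // added in largeAgDataToFile walks the results in a stable order.
        { $sort: { _id: 1 } },
    ],
};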

I discuss the concept of rate limiting in another post.

You can’t just export the whole database; you’ll lock her up.

You’re running on JavaScript, which means you can use setTimeout() calls to reduce your CPU consumption.

  • Pull back 100 rows
  • Wait 1 second with setTimeout
  • Pull back rows 100-200
  • Wait again with setTimeout

Rinse and repeat. Something like the sketch below:
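
(A sketch only; the collection, query, batch size and delay are all placeholders.)

// Pull the data back in small batches, pausing between batches so other
// work gets CPU time. Meant to run inside a method, i.e. in a fiber.
const pause = Meteor.wrapAsync((ms, cb) => setTimeout(cb, ms));
const batchSize = 100;
let skip = 0;
let batch;
const rows = [];
do {
  batch = SomeCollection.find(query, { skip, limit: batchSize }).fetch();
  rows.push(...batch);
  skip += batchSize;
  pause(1000); // wait a second before pulling the next batch
} while (batch.length === batchSize);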

Galaxy gives you 500 MB of temporary storage to use for this kind of thing. Any bigger and you’ll need S3. So write to a temp file, and when complete, email that sucker out.

As we discussed on that topic, I think this is working as you describe - see my previous comment. So I guess wrapping the aggregation in a setTimeout would result in capping the CPU consumption?

Thanks all for the input. Indeed, it does seem like we’re seeing the same kind of issue, mrmsupport. I had my 512 MB containers crashing, so I have them temporarily bumped up to 1 GB. And many thanks for your solution as a go-by. I will need to avoid even a 20-second lockup, though, so I will have to work out some kind of rate limiting in addition, as SkyRooms suggests. My initial thought is that it may work to just do a simple cursor.forEach on the .find cursor, which should give other processes a chance to get into the queue. Any thoughts? Maybe not much help for aggregation.
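
Roughly what I have in mind (just a sketch; buildCsvRow is a hypothetical helper, and whether this actually yields enough to other work is exactly what I need to test):

// Stream the documents instead of fetch()-ing all 14,000 into an array at once.
// The driver pulls results from Mongo in batches, so memory stays bounded.
const csvRows = [];
const cursor = Meteor.users.find(query, { fields: { emails: 1, profile: 1 } });
cursor.forEach((user) => {
  csvRows.push(buildCsvRow(user)); // buildCsvRow is hypothetical
});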

I’ll continue down the path and report back.

We get the same sort of problems for my day job, running PHP and MySQL.

Someone runs a report, and all the users crash out. The system we use has a massive database, hundreds of millions of rows.

I’ve had to manually employ this kind of process to slow things down. Some reports take 2 hours to generate, but eventually land in your inbox.

Lots of lead generation services also do this.

Heck, even Google Analytics uses ‘sample sizes’ to de-stress the servers.

Just wanted to follow up with my solution. I did implement a rate-limiting concept with the following code:

// Fetch one batch, append it to the accumulated data, and report progress.
// Returns true once a short (final) batch signals that we are done.
function step(data, params) {
  const d = params.coll.find(params.query, params.options).fetch();
  Array.prototype.push.apply(data, d);
  incDocCount(params.userId, d.length);
  if (d == null || d.length !== params.recordStep) return true;
  params.options.skip += params.recordStep;
  return false;
}

// Fetch batches on a timer until step() signals completion, then hand the
// accumulated data to the callback.
function rateQuery(data, params, callback) {
  const isDone = step(data, params);
  if (isDone) {
    callback(null, data);
    return;
  }
  // ensure the subscription hasn't been stopped before getting next batch
  if (csvPublish.hasOwnProperty(params.userId)) {
    Meteor.setTimeout(rateQuery.bind(null, data, params, callback), params.interval);
    return;
  }
  callback(new Meteor.Error('Operation cancelled'));
}


/**
 * Make a function to perform a rate limited query according to the passed parameters.
 * wraps the function in Meteor.wrapAsync to provide fiber-safe blocking
 * @param {String} userId logged in user's id
 * @param {Object} coll Meteor collection
 * @param {Object} query query object to use with collection.find
 * @param {Object} options options object to use with collection.find
 * @param {Number} recordStep number of records to get each interval
 * @param {Number} interval wait time between intervals, in ms
 * @returns {Function} async query function
 */
function rateQueryFactory(userId, coll, query, options = {}, recordStep, interval) {
  const data = [];
  options.limit = recordStep;
  options.skip = 0;
  const params = {
    options,
    recordStep,
    query,
    coll,
    userId,
    interval,
  };
  return Meteor.wrapAsync(rateQuery.bind(null, data, params));
}
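
For illustration, the users query ends up looking something like this (the step and interval values are just what I’m experimenting with; userId and query come from the surrounding routine):

// 500 users per batch, 1 second between batches. fetchUsers() blocks this
// fiber until done, but yields to other work between batches.
const fetchUsers = rateQueryFactory(
  userId,
  Meteor.users,
  query,
  { fields: { emails: 1, profile: 1 } },
  500,
  1000
);
const rawData = fetchUsers();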

I generate this rateQuery function for each large query in the routine. My approach for delivering the data was a little different as well. At first I wanted to just return the CSV from the method at the end, but of course I ran into a timeout. So instead I use a subscription to deliver the data to a client-side minimongo instance; it’s delivered directly to the subscriber without creating the collection on the backend. The routine that calls the rateQuery functions and builds the CSV is called from a Meteor.defer() in the method before it returns empty. The subscription is:

const csvPublish = {};
if (Meteor.isServer) {
  Meteor.publish('customerCSVListener', function customerCSVListener() {
    this.added(csvCollectionName, 'status', { messages: [] });
    this.ready();
    csvPublish[this.userId] = {
      status: [],
      documents: 0,
      added: this.added.bind(this),
      changed: this.changed.bind(this),
    };
    this.onStop(() => {
      delete csvPublish[this.userId];
    });
  });
}
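
The method itself just kicks the routine off in a Meteor.defer() and returns immediately, roughly like this (buildCustomerCsv stands in for the routine described above):

Meteor.methods({
  requestCustomerCsv() {
    if (!this.userId) {
      throw new Meteor.Error('not-authorized');
    }
    const userId = this.userId;
    // The heavy lifting happens outside the method call so it can't time out.
    Meteor.defer(() => buildCustomerCsv(userId));
    return '';
  },
});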

The csvPublish object serves as the interface between the two, contained in the closure of the same file and keyed by userId. It’s a little bit hacky, but it works. You’ll note I also deliver status messages via the same subscription so the client can see progress. The routine still puts a serious load on the CPU and drives up the memory; I’m working on tuning it for performance now.
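
Inside the routine, pushing a status update or the finished CSV through that interface looks roughly like this (count, csvString, and csvDocId are placeholders for values built by the routine):

const pub = csvPublish[userId];
if (pub) {
  // Progress message for the client to display.
  pub.status.push(`Fetched ${count} users`);
  pub.changed(csvCollectionName, 'status', { messages: pub.status });
}
// ...and once the CSV string has been built:
if (pub) {
  pub.added(csvCollectionName, csvDocId, { csv: csvString });
}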

On the client I just set up the minimongo instance with: const csvTransport = new Mongo.Collection(csvCollectionName);
Then I can do normal reactive queries on the client: csvTransport.find(csvDocId)

Let me know if anyone has thoughts on how to do this better.

move to performance category?
