Quite simple: how do I prevent search engines and other bots from crawling a Meteor app that is deployed on, say, staging.myapp.com? The robots.txt in the public dir is static and can't be touched by server code. NODE_ENV in this case has appropriate values, such as staging and testing.
Are you using something like mupx for deployments? If so, you could handle this in advance by wrapping your mupx call in a shell script that first sets up the appropriate robots.txt file based on the environment. For example:
your_app_root/deploy/robots.prod.txt
your_app_root/deploy/robots.staging.txt
your_app_root/deploy/deploy.sh
deploy.sh would look something like:
#!/bin/bash
# Default to the production robots file; switch to the staging one when asked.
robots="robots.prod.txt"
if [ "$1" = "staging" ]; then
  robots="robots.staging.txt"
fi

# Overwrite the app's robots.txt with the chosen file, then deploy.
cp "$robots" ../public/robots.txt
mupx deploy
When deploy.sh is run it will overwrite your app's existing public/robots.txt with the production one, then continue to deploy via mupx. If called as “deploy.sh staging” it will overwrite it with the staging one instead, then continue to deploy.
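To make the two modes concrete, the invocations from the deploy directory would look something like this (the paths match the layout suggested above):

# Production deploy: copies robots.prod.txt into public/, then runs mupx deploy
./deploy.sh

# Staging deploy: copies robots.staging.txt into public/ instead
./deploy.sh staging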
We’re sadly not using mup/mupx for deployments – for production we use Ansible scripts, and for staging/testing we use plain shell scripts.
But I reckon we can use the same thinking for those scripts when bundling and deploying. Thanks for the idea!
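Something along these lines, adapted to a plain staging deploy script, should do it. The paths and the meteor build flags below are just placeholders for illustration, not our actual scripts:

#!/bin/bash
# Hypothetical staging deploy script: same idea as deploy.sh above,
# just applied before bundling instead of before a mupx call.
set -e

# Swap in the staging robots.txt before the bundle is built.
cp deploy/robots.staging.txt public/robots.txt

# Build the bundle (output dir and architecture are placeholders).
meteor build ../output --architecture os.linux.x86_64

# ...then upload and extract the bundle with the existing staging script.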
Way late to the party here, but figured I’d share the solution I just implemented for the common good.
Install this package: https://atmospherejs.com/gadicohen/robots-txt
Then at startup, you can run:
// Server-side only. isStaging is however you detect the environment,
// e.g. const isStaging = process.env.NODE_ENV === 'staging'
Meteor.startup(() => {
  robots.addLine('User-agent: *');
  if (isStaging) {
    robots.addLine('Disallow: /'); // blocks all URLs
  } else {
    robots.addLine('Disallow: /specific/blocked/url/path/*');
  }
});
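Since the package's whole job is to serve robots.txt dynamically, you can sanity-check the result after deploying; on the staging host the disallow-all rules should come back (curl and the hostname here are just for illustration):

# Expect "User-agent: *" followed by "Disallow: /" on staging
curl https://staging.myapp.com/robots.txt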