Disabling crawlers for staging/testing environments

Quite simple: how do I prevent search engines and other bots from crawling a Meteor app deployed on, say, staging.myapp.com? The robots.txt in the public dir can't be modified by server code.

In this case, NODE_ENV is already set to appropriate values such as staging and testing.


Are you using something like mupx for deployments? If so, you could handle this in advance by wrapping your mupx call in a shell script that first sets up the appropriate robots.txt file based on the environment. For example:

your_app_root/deploy/robots.prod.txt
your_app_root/deploy/robots.staging.txt
your_app_root/deploy/deploy.sh

deploy.sh would look something like:

#!/bin/bash
# Default to the production robots.txt; switch to the staging one if requested.
robots="robots.prod.txt"
if [ "$1" = "staging" ]; then
  robots="robots.staging.txt"
fi
# Overwrite the app's robots.txt with the environment-specific version, then deploy.
cp "$robots" ../public/robots.txt
mupx deploy

When deploy.sh is run, it overwrites your app's existing /public/robots.txt with the production version and then continues the deploy via mupx. If called as “deploy.sh staging”, it overwrites it with the staging version instead before deploying.
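For instance, assuming the script is run from inside the deploy directory (so the relative paths resolve):

cd your_app_root/deploy
./deploy.sh            # copies robots.prod.txt to ../public/robots.txt, then deploys
./deploy.sh staging    # copies robots.staging.txt instead, then deploys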

We’re sadly not using mup/mupx for deployments – for production we use Ansible scripts, and for staging/testing we use plain shell scripts.

But I reckon we can use the same thinking for those scripts when bundling and deploying. Thanks for the idea!
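For reference, the same idea in one of our plain shell scripts would look roughly like this, run from the app root (the robots file names are the ones suggested above; the build step is just a sketch):

#!/bin/bash
# Sketch: pick the environment-specific robots.txt before bundling,
# so the deployed bundle serves the right one from /public.
ENVIRONMENT="${1:-production}"
if [ "$ENVIRONMENT" = "production" ]; then
  cp deploy/robots.prod.txt public/robots.txt
else
  cp deploy/robots.staging.txt public/robots.txt
fi
# Bundle as usual (output path and architecture are placeholders).
meteor build ../build --architecture os.linux.x86_64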

Way late to the party here, but figured I’d share the solution I just implemented for the common good.

Install this package: https://atmospherejs.com/gadicohen/robots-txt

Then at startup, you can run:

// Server-side, e.g. in server/robots.js; `robots` is provided by gadicohen:robots-txt
const isStaging = process.env.NODE_ENV === 'staging';

Meteor.startup(() => {
  robots.addLine('User-agent: *');
  if (isStaging) {
    robots.addLine('Disallow: /'); // blocks all URLs
  } else {
    robots.addLine('Disallow: /specific/blocked/url/path/*');
  }
});
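If I understand the package correctly, it then serves the assembled rules at /robots.txt, so requesting staging.myapp.com/robots.txt should return the Disallow: / rule — presumably there shouldn't also be a static robots.txt left in public/ to conflict with it.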