We're sorry… … but your computer or network may be sending automated queries. To protect our users, we can't process your request right now


#1

I’ve successfully deployed my Meteor app to Meteor Galaxy: http://perfilesgs.meteorapp.com/. Everything works fine except for one problem. In my app, a server-side route queries a Google URL in order to scrape data from it.

On my localhost (localhost:3000) this works fine: the query succeeds and I get the response from the server.

But once deployed to Meteor Galaxy, the logs show that Google does not allow automated queries.

The strange thing is that on my localhost I can make the query without problems and get the data, but on my web server I get the following message:

We’re sorry … … but your computer or network may be sending automated queries. To protect our users, we can not process your request right now

I would appreciate any advice or help to solve this problem. Thank you very much, Greetings.

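// Assumes the request and cheerio npm packages are installed in the app.
var request = require('request');
var cheerio = require('cheerio');

Router.route('/scraper/:id_investigador', function(){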
  this.response.setHeader( 'Access-Control-Allow-Origin', '*' );
  this.response.setHeader( 'Access-Control-Allow-Methods', 'GET, POST, PUT, DELETE' );
  this.response.setHeader( 'Access-Control-Allow-Headers', 'Content-Type, X-Requested-With, x-request-metadata' );
  this.response.setHeader( "Content-Type",  "text/html; charset=utf-8" );
  this.response.setHeader( 'Access-Control-Allow-Credentials', true );

  // Route parameters are exposed on this.params in Iron Router.
  var url = 'https://scholar.google.cl/citations?user=' + this.params.id_investigador;

  request(url, async function(error, response, html){

      var $ = cheerio.load(html);

.
.
.

  }); // closes the request() call
}, {where : "server"});

#2
  1. You could try setting the User-Agent header to that of a real browser. (You can also use the HTTP.get method.)
  2. Try the same request using curl and/or wget (from the server) - also try different user agents.
  3. Try the request from phantomjs.
  4. Try a different web host. Some are very cheap and charge by the hour, so a test might literally cost 1 penny (see DigitalOcean, Vultr, etc.). Google Cloud also has a free tier, but it takes a few steps to set up SSH access. Oh, I almost forgot: they have in-browser terminal access to a free server you could test from.
  5. Some sites, like Amazon, are pretty militant about blocking scrapers, and you might not be able to do anything.
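
Suggestion 1 could look roughly like the sketch below. It uses Node's built-in https module instead of the request package from the first post. The helper names buildScholarOptions and fetchProfile and the User-Agent string are only illustrative assumptions, and a browser-like User-Agent is not guaranteed to get past Google's check.

```javascript
const https = require('https');

// Example browser-like User-Agent string (an assumption; any real browser UA could be used).
const BROWSER_UA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36';

// Build the options object for https.get, including the User-Agent header.
function buildScholarOptions(userId, userAgent) {
  return {
    hostname: 'scholar.google.cl',
    path: '/citations?user=' + encodeURIComponent(userId),
    headers: { 'User-Agent': userAgent || BROWSER_UA }
  };
}

// Fetch the profile page and pass (error, statusCode, html) to the callback.
function fetchProfile(userId, callback) {
  https.get(buildScholarOptions(userId), function (res) {
    let html = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { html += chunk; });
    res.on('end', function () { callback(null, res.statusCode, html); });
  }).on('error', function (err) { callback(err); });
}
```

Calling fetchProfile with the researcher id from the route and logging the status code on both localhost and Galaxy should show whether the header makes any difference.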

Some IP ranges get blacklisted for doing too much scraping. Meteor Galaxy’s IP range may be on Google’s scraper blacklist (just a hypothesis).
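
To check that hypothesis from the server logs, it may help to flag blocked responses explicitly. A minimal sketch, assuming a block shows up either as an HTTP 429 or 503 status or as a page containing the phrase from the error message in the first post (the function name looksBlocked is hypothetical):

```javascript
// Phrase taken from the error message quoted in the first post.
const BLOCK_PHRASE = 'automated queries';

// Returns true when a response looks like Google's block page.
// Assumption: a block arrives as HTTP 429 or 503, or as a page containing the phrase above.
function looksBlocked(statusCode, html) {
  if (statusCode === 429 || statusCode === 503) {
    return true;
  }
  return typeof html === 'string' && html.indexOf(BLOCK_PHRASE) !== -1;
}
```

Logging the result of looksBlocked for the same request on localhost and on Galaxy would show whether only the Galaxy IPs are being rejected.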

PS: Google Scholar may have an API that allows you to do the same query.


#3

See: https://academia.stackexchange.com/questions/34970/how-to-get-permission-from-google-to-use-google-scholar-data-if-needed