Scraping a website - cron job?

linusj · January 7, 2016, 4:42pm

Hey!

Quick question!

There’s a website that gets updated once a month with new information that I find relevant, so I want to scrape it periodically. The idea is for the application to scrape the site and push a notification each time it finds something that matches a filter I’ll make. It would run once a week or once a month, depending on how often I want it to look for changes on the website.

Should I be looking into using cron jobs for this? More specifically: https://atmospherejs.com/percolatestudio/synced-cron

I’ve never scraped a website before, so I’m not entirely sure how to approach it.

peter1 · January 7, 2016, 4:48pm

Yes, you can do that with synced-cron, and it’s a pretty simple way to schedule a task to run at intervals. Keep in mind that, depending on how complex the scrape job is, it may put some load on your server, so running jobs like this on your application server probably isn’t a good idea in the long run.
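As a rough sketch of what the scheduling could look like: synced-cron takes a job definition with a `name`, a `schedule` function that receives a Later.js parser, and a `job` function. The names `TARGET_URL`, `scrapeAndNotify`, and `scrapeJob` are made up for illustration, the job body is only a placeholder, and the schedule text is assumed to be valid Later.js grammar, so check it against the Later.js docs before relying on it.

```javascript
// Hypothetical placeholder for the page to check.
const TARGET_URL = 'https://example.com/listing';

// Placeholder job body: a real job would fetch TARGET_URL, apply the filter,
// and send a notification. Here it only reports what it would check.
function scrapeAndNotify() {
  return 'checked ' + TARGET_URL;
}

// Job definition in the shape synced-cron expects: name, schedule, job.
const scrapeJob = {
  name: 'Check site for new items',
  schedule: function (parser) {
    // parser is a Later.js parser object; the text below is an assumed
    // schedule expression for a once-a-month run.
    return parser.text('on the first day of the month');
  },
  job: scrapeAndNotify
};

// Only register the job when running inside Meteor with the
// percolate:synced-cron package installed; elsewhere this is skipped.
if (typeof SyncedCron !== 'undefined') {
  SyncedCron.add(scrapeJob);
  SyncedCron.start();
}
```

Keeping the job body in its own function means the same code could later be moved off the application server without touching the scraping logic.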

A better way to do this would be to offload the job to AWS Lambda, an Iron.io worker, or something like that.

linusj · January 7, 2016, 4:58pm

Awesome! Thanks. The data is pretty much just a list of items and that’s about it. Would that really put a heavy load on the server?
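For a sense of scale, here is a minimal sketch of the filtering step for a job like this, assuming the page is a plain HTML list where each item is an `<li>` element. That markup is an assumption, as are the function names `extractItems` and `findNewMatches` and the sample data. It uses a simple regular expression rather than a real HTML parser, which is only reasonable for very simple pages.

```javascript
// Pull the text of each <li> out of an HTML string.
// Assumes simple, non-nested list markup (hypothetical page structure).
function extractItems(html) {
  const items = [];
  const pattern = /<li\b[^>]*>([\s\S]*?)<\/li>/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Drop any inline tags, then collapse whitespace.
    const text = match[1].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    if (text) items.push(text);
  }
  return items;
}

// Return items that contain at least one keyword (case-insensitive)
// and have not been seen on a previous run.
function findNewMatches(items, keywords, seenItems) {
  const lowered = keywords.map((k) => k.toLowerCase());
  return items.filter((item) => {
    const text = item.toLowerCase();
    return !seenItems.has(item) && lowered.some((k) => text.includes(k));
  });
}

// Sample data standing in for a fetched page and a stored "already seen" set.
const sampleHtml =
  '<ul><li>Blue widget</li><li><b>Red</b> gadget</li><li>Green widget</li></ul>';
const seenItems = new Set(['Blue widget']);

const freshMatches = findNewMatches(extractItems(sampleHtml), ['widget'], seenItems);
freshMatches.forEach((item) => seenItems.add(item)); // remember for the next run
console.log(freshMatches); // [ 'Green widget' ]
```

In a real job, each entry in `freshMatches` would trigger a notification, and the set of seen items would need to be stored somewhere persistent (a database collection, for example) so it survives between runs.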

miningsam · January 7, 2016, 5:01pm

http://blog.miguelgrinberg.com/post/easy-web-scraping-with-nodejs