At my current job, my boss took on a project that involves collecting data from several websites and posting it into another system. Unfortunately, one of those websites has no API, so we had to find another way to get the information out of it. The short answer is scraping. To make a long story short, after evaluating several projects, I concluded that the best option is CasperJS.

CasperJS is an open-source navigation scripting and testing utility written in JavaScript for the PhantomJS (WebKit) and SlimerJS (Gecko) headless browsers. It eases the process of defining a full navigation scenario and provides useful high-level functions and methods for common tasks such as:

  • defining and ordering browsing navigation steps,
  • filling and submitting forms,
  • clicking and following links,
  • capturing screenshots of a page (or part of it),
  • testing remote DOM,
  • logging events,
  • downloading resources, including binary ones,
  • writing functional test suites, saving results as JUnit XML,
  • scraping Web content.
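To give a feel for the step-based API behind several of these features (navigation steps, form filling, clicking links, screenshots), here is a minimal sketch against the CasperJS 1.1 API. The URL, form selector, field names, credentials, and link selector are hypothetical placeholders, not taken from our project. The final if only starts the script when it runs under casperjs, where the phantom global exists.

```javascript
// steps.js: a sketch of CasperJS 1.1 navigation steps.
// All of the values below are hypothetical placeholders.
var LOGIN_URL = 'http://example.com/login';
var FORM_SELECTOR = 'form#login';
var CREDENTIALS = { username: 'demo', password: 'secret' }; // keys are the inputs' name attributes
var REPORT_LINK = 'a.report';

function main() {
  var casper = require('casper').create();

  // First step: open the page and fill the form.
  casper.start(LOGIN_URL, function () {
    // fill(selector, values, submit): passing true as the third argument submits the form.
    this.fill(FORM_SELECTOR, CREDENTIALS, true);
  });

  // Next step: runs after the previous one has finished.
  casper.then(function () {
    this.echo('After login: ' + this.getCurrentUrl());
  });

  // Adds a step that clicks the link, then runs the callback as a further step.
  casper.thenClick(REPORT_LINK, function () {
    this.capture('report.png'); // screenshot of the whole page
  });

  // Execute all queued steps, then quit.
  casper.run(function () {
    this.exit();
  });
}

// The phantom global exists only when launched through casperjs
// (PhantomJS or SlimerJS), so loading this file elsewhere does nothing.
if (typeof phantom !== 'undefined') {
  main();
}
```

Each casper.then or casper.thenClick call only queues a step; nothing happens until casper.run() executes the queue in order.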

Installation

To install CasperJS, I did the following:

  • I set up OKay's RPM repository.
  • I installed PhantomJS 1.9.x by typing yum install phantomjs19. Note that the binary will be at /usr/bin/phantomjs19.
  • I downloaded CasperJS 1.1-beta3 (the latest release at the time of writing) from its GitHub page and unzipped it. I picked the 1.1 beta because CasperJS 1.0.x won't run under PhantomJS 1.9.x.

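We are done! Because the PhantomJS binary is installed as phantomjs19 rather than phantomjs, point CasperJS at it with the PHANTOMJS_EXECUTABLE environment variable when you run it: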

PHANTOMJS_EXECUTABLE=/usr/bin/phantomjs19 ~/casperjs-1.1-beta3/bin/casperjs
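As an illustration of the scraping side, here is a minimal sketch of a script to run this way, assuming the CasperJS 1.1 API. It opens a page, pulls every link's href out of the page's DOM with casper.evaluate(), and prints them. The target URL, the selector, and the filename scrape.js are hypothetical placeholders, not the site from our project.

```javascript
// scrape.js: a minimal CasperJS 1.1 scraping sketch.
var TARGET_URL = 'http://example.com/'; // hypothetical target page
var LINK_SELECTOR = 'a';                // hypothetical selector for the data we want

// Runs inside the remote page (via casper.evaluate), so it can only use the
// page's own DOM; the selector is passed in as an argument.
function getLinks(selector) {
  var nodes = document.querySelectorAll(selector);
  return Array.prototype.map.call(nodes, function (node) {
    return node.getAttribute('href');
  });
}

function main() {
  var casper = require('casper').create({ verbose: true, logLevel: 'info' });
  var links = [];

  // Step 1: open the page.
  casper.start(TARGET_URL, function () {
    this.echo('Loaded: ' + this.getTitle());
  });

  // Step 2: extract the data from the page's DOM.
  casper.then(function () {
    links = this.evaluate(getLinks, LINK_SELECTOR);
  });

  // Step 3: report and exit. In a real job, this is where the data would be
  // handed on to the target system; here it is only printed.
  casper.run(function () {
    this.echo(links.length + ' links found:');
    links.forEach(function (href) {
      this.echo(' - ' + href);
    }, this);
    this.exit();
  });
}

// Only start when launched through casperjs (the phantom global exists there).
if (typeof phantom !== 'undefined') {
  main();
}
```

With the script saved as scrape.js, it would be run by appending its path to the command above: PHANTOMJS_EXECUTABLE=/usr/bin/phantomjs19 ~/casperjs-1.1-beta3/bin/casperjs scrape.js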
