
At my current job, my boss took on a project that involves pulling data from websites and posting it into another system. Sadly, one of those websites lacks an API, so we needed another way to get the information out of it. The short answer is scraping. To keep a long story short, after evaluating multiple projects, I think the best option is CasperJS.

CasperJS is an open source navigation scripting and testing utility written in JavaScript for the PhantomJS (WebKit) headless browser and SlimerJS (Gecko). It eases the process of defining a full navigation scenario and provides useful high-level functions, methods, and syntactic sugar for common tasks such as:

  • defining and ordering browsing navigation steps
  • filling and submitting forms
  • clicking and following links
  • capturing screenshots of a page (or part of it)
  • testing remote DOM
  • logging events
  • downloading resources, including binary ones
  • writing functional test suites, saving results as JUnit XML
  • scraping Web contents

To install CasperJS, I did the following.

Installation

  • I set up OKay's RPM repository.
  • I installed PhantomJS 1.9.x by typing yum install phantom19. Note that the binary will be at /usr/bin/phantomjs19.
  • I downloaded CasperJS 1.1-beta3 (the latest version available at the time of writing) from its GitHub page and unzipped it. CasperJS 1.0.x won't run under PhantomJS 1.9.x.

We are done! To run CasperJS, use a command like this:

PHANTOMJS_EXECUTABLE=/usr/bin/phantomjs19 ~/casperjs-1.1-beta3/bin/casperjs
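Typing the environment variable every time gets old quickly, so here is a minimal sketch of a wrapper function you could drop into your shell profile. The function name run_casper and the PHANTOMJS_BIN and CASPERJS_BIN override variables are my own invention for illustration (they are not part of CasperJS); the default paths are simply the ones used in this post.

```shell
# Hypothetical helper: run CasperJS against the PhantomJS 1.9.x binary.
# PHANTOMJS_BIN and CASPERJS_BIN are made-up override variables; the
# defaults are the paths from the installation steps above.
run_casper() {
    phantom="${PHANTOMJS_BIN:-/usr/bin/phantomjs19}"
    casper="${CASPERJS_BIN:-$HOME/casperjs-1.1-beta3/bin/casperjs}"

    # Fail early with a readable message if either binary is missing.
    if [ ! -x "$phantom" ]; then
        echo "PhantomJS not found at $phantom" >&2
        return 1
    fi
    if [ ! -x "$casper" ]; then
        echo "CasperJS not found at $casper" >&2
        return 1
    fi

    # Same invocation as above, with any extra arguments passed through.
    PHANTOMJS_EXECUTABLE="$phantom" "$casper" "$@"
}
```

With that in place, something like run_casper myscript.js (where myscript.js is a placeholder for your own script) should behave the same as the full command above.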
