At my current job, my team was handed a project that involved extracting data from websites and posting it into another system. Unfortunately, one of those websites lacks an API, so we had to find another way to pull the information out. The short answer is scraping. To make a long story short, after evaluating multiple projects, I concluded that the best option is CasperJS. Among other things, CasperJS lets you do the following:
- defining & ordering browsing navigation steps
- filling & submitting forms
- clicking & following links
- capturing screenshots of a page (or part of it)
- testing remote DOM
- logging events
- downloading resources, including binary ones
- writing functional test suites, saving results as JUnit XML
- scraping Web contents
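To give a taste of the scraping workflow, here is a minimal sketch of a CasperJS script. The URL and the `h2.title` selector are hypothetical placeholders, not part of the actual project:

```javascript
// Sketch: scrape the text of every article title on a page.
// example.com and the 'h2.title' selector are made-up examples.
var casper = require('casper').create();
var titles = [];

casper.start('http://example.com/', function () {
    // evaluate() runs the function inside the remote page's DOM
    titles = this.evaluate(function () {
        return Array.prototype.map.call(
            document.querySelectorAll('h2.title'),
            function (el) { return el.textContent; }
        );
    });
});

casper.run(function () {
    this.echo(titles.join('\n')).exit();
});
```

The key idea is that the callback passed to `evaluate()` executes in the page's own context, so you can use regular DOM APIs to pick out the data you need.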
To install CasperJS, I did the following:
- I set up OKay's RPM repository.
- I installed PhantomJS 1.9.x by typing yum install phantom19. Note that the binary will be at /usr/bin/phantomjs19.
- I downloaded CasperJS 1.1-beta3 (the latest available at the time) from the GitHub page and unzipped it. CasperJS 1.0.x won't run under PhantomJS 1.9.x.
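Put together, the steps above look roughly like this. The GitHub archive URL is my assumption of where the 1.1-beta3 tag lives; check the project's releases page for the exact link:

```shell
# 1. Enable OKay's RPM repository first (see their site for the repo RPM).

# 2. Install PhantomJS 1.9.x from that repo.
yum install phantom19

# 3. Grab CasperJS 1.1-beta3 from GitHub and unpack it
#    (archive URL is an assumed example; verify against the releases page).
wget https://github.com/casperjs/casperjs/archive/1.1-beta3.zip
unzip 1.1-beta3.zip   # unpacks into casperjs-1.1-beta3/
```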
We are done! To invoke CasperJS, run a command like this:
PHANTOMJS_EXECUTABLE=/usr/bin/phantomjs19 ~/casperjs-1.1-beta3/bin/casperjs