Re: JavaScript and Screenscraping
On Wed, 30 Mar 2011, Michal Kleczek wrote:
Roedy Green wrote:
I am working on a screenscraping project that is turning out to be much
more time-consuming than I thought it would be. I am trying to gather
a database of information about all the motherboards sold by major
manufacturers. The idea is to eventually create a comparison shopper
to help you narrow down models that fit your needs.
Oddly, motherboard manufacturers don't use a database to generate
their specification pages. These are all hand-compiled, with a theme and
a dozen variations on every field. This I can handle.
However, Asus decided to obfuscate their web pages with JavaScript.
There are no data on them.
I wondered if there exists a tool that is like a browser in that it will
read a page and run the JavaScript, but unlike a browser, it would
not show the information on the screen, just dump the generated HTML
or raw text and accept a script of pages to analyse.
http://htmlunit.sourceforge.net/
Finally, someone else who knows about it!
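For what it's worth, here is a minimal sketch of the "read a page, run the JavaScript, dump the generated HTML" loop with HtmlUnit. It assumes an HtmlUnit 2.x jar on the classpath (the com.gargoylesoftware.htmlunit package) recent enough that WebClient has close(); older releases used closeAllWindows() instead. The class name, the placeholder URL and the 5-second wait are my own illustrative choices, not anything from the Asus site, and I have not tried it against those pages.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.util.Arrays;
import java.util.List;

public class DumpRenderedPages {
    public static void main(String[] args) throws Exception {
        // Hypothetical page list: pass real URLs on the command line,
        // otherwise fall back to a placeholder address.
        List<String> urls = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("http://www.example.com/");

        // WebClient is HtmlUnit's GUI-less browser; JavaScript is enabled by default.
        WebClient webClient = new WebClient();
        try {
            for (String url : urls) {
                // Fetch the page and let its scripts run.
                // Assumes the URL returns HTML; other content types would fail the cast.
                HtmlPage page = webClient.getPage(url);

                // Assumption: 5 seconds is enough for background scripts
                // (timers, XMLHttpRequest) to finish; tune per site.
                webClient.waitForBackgroundJavaScript(5000);

                // Dump the DOM as it stands after script execution.
                System.out.println(page.asXml());
            }
        } finally {
            // Release windows and stop any remaining JavaScript jobs.
            webClient.close();
        }
    }
}
```

Run it with the page URLs as arguments and redirect the output to a file. Whether a given site's scripts actually run cleanly under HtmlUnit's JavaScript engine is something you would have to check page by page.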
tom
--
For the first few years I ate lunch with the mathematicians. I soon found
that they were more interested in fun and games than in serious work,
so I shifted to eating with the physics table. There I stayed for a
number of years until the Nobel Prize, promotions, and offers from
other companies, removed most of the interesting people. So I shifted
to the corresponding chemistry table where I had a friend. At first I
asked what were the important problems in chemistry, then what important
problems they were working on, or problems that might lead to important
results. One day I asked, "if what they were working on was not important,
and was not likely to lead to important things, then why were they working
on them?" After that I had to eat with the engineers! -- R. W. Hamming