Re: Automating Searches
Daniel Pitts wrote:
> Although, I do wonder why you would want to store search results in a
> database.
> It's not that hard to make a data scraper and just use the website
> directly. But Google DOES give you an API to do it more easily.
Yeah, but using that API (at least heavily) is expensive. If you scrape
the results after submitting a normal query URL, and a) don't dive too
deeply into the results and b) don't run new queries too often, you can
probably fly under the radar; unless you're coming from a datacenter
somewhere, they won't know you from Adam doing manual searches in Firefox.
To top it off, Java makes transparently caching pages (and, as of 1.6,
handling cookies) easier too. Add in a deliberate request for the front
page before the search query, some random delays, and a spoofed
user-agent, and I'm guessing the only way Google could figure out you
weren't just a surfer using Mozilla/4.0 (compatible; MSIE 4.0) would be
to run a tool like EtherSniffer over your incoming requests and notice
that Java sends its HTTP headers in an idiosyncratic order. And they
won't do that unless your IP generates an eyebrow-raising amount of
traffic.
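The pieces above could be sketched like this, using only `java.net` as
of 1.6 — the VM-wide CookieManager, a spoofed User-Agent header, and a
random pause between requests. The header value and the 2-10 s delay
range are my own illustrative choices, and the actual network I/O is
omitted:

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Random;

public class PoliteFetcher {
    // Java 1.6: install a VM-wide cookie store so the front-page visit
    // and the later search request share cookies, like a browser session.
    static {
        CookieHandler.setDefault(new CookieManager());
    }

    private static final Random RANDOM = new Random();

    // Random pause between requests; 2-10 seconds is an illustrative range.
    static long randomDelayMillis() {
        return 2000 + RANDOM.nextInt(8000);
    }

    // Open a connection that claims to be an ordinary old browser.
    static HttpURLConnection openAsBrowser(URL url) throws java.io.IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 4.0)"); // spoofed, per the post
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // Visit the front page first, sleep a random interval, then query.
        // (conn.getInputStream() and the response parsing are omitted here.)
        System.out.println("waiting " + randomDelayMillis() + " ms");
    }
}
```

The static CookieHandler is what makes the front-page request useful:
any cookies Google sets there ride along on the search request
automatically.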
And for Google that "eyebrow-raising" threshold is set very high indeed:
"normal" traffic for Google is millions of searches per day, and there
are frequently dozens per day from each of many individual IP addresses,
as well as untold numbers of one-offs and the like.
And, of course, as long as you don't generate traffic any faster than
you could by typing in all those queries manually, I don't see any moral
qualms about this. At worst it's equivalent to adblocking the sponsored
links on the results page with a commonly available Firefox extension.
All you've done is automate some tedium at your end, with no discernible
effect at theirs versus not automating it. So unless you do believe in
victimless crimes, or don't believe in the identity of
indiscernibles ... :)