Just when I was planning to start off a rant about search engines and bots, an article (item?) from one of the Gillmor brothers caught my eye. He noticed the extraordinary number of directories on the White House site that its robots.txt excludes from search bots: most of those directories have something to do with Iraq.
“Perhaps the White House doesn’t want to make it easy for people to compare its older statements about Iraq with current realities — though that doesn’t explain why the pages are searchable on the White House site itself. Maybe, then, the White House wants to know who’s looking for these things (e.g. by tracking IP addresses of people who query the government site).”
Naturally, his comments aren’t appreciated by all of his readers. Should technologists steer clear of making remarks like these, I wonder?
Um. I block spiders and crawlers, including ia_archiver, from my site.
I guess I’m just oppressing the people too.
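For what it’s worth, keeping ia_archiver out takes only a couple of lines in robots.txt, assuming the crawler actually honours the file:

    # ask the Internet Archive / Alexa crawler (ia_archiver) to stay out entirely
    User-agent: ia_archiver
    Disallow: /

Whether it really stays out is another matter, as the next comment points out.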
What’s funny is that robots.txt isn’t exactly reliable, as many bots and crawlers simply ignore it. Most real blocking is done via .htaccess.
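A minimal sketch of the .htaccess approach, assuming Apache with mod_setenvif enabled; the user-agent strings here are just examples of bots you might want to keep out:

    # refuse requests from crawlers that ignore robots.txt, matched on User-Agent
    SetEnvIfNoCase User-Agent "ia_archiver" bad_bot
    SetEnvIfNoCase User-Agent "WebCopier"  bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot

Unlike robots.txt, this is enforced by the server itself, so it doesn’t rely on the crawler’s good manners.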
I’m aware of crawlers that ignore that file (I can tell from the Apache logs), but currently I can’t be bothered to do it the .htaccess way [mental note to Alfons]. Maybe I’m old-fashioned or stupid <g>
“I can’t be bothered by doing it the .htaccess way [mental note to Alfons].”
You can do this yourself…