Limit bot access frequency


softarea51
Nov 1st 2007, 5:04 pm
Hello,

Today I noticed my site was almost down because 7 bots (including Yahoo and Google) were crawling it at the same time. How can I instruct them to make fewer requests at once, lower their crawl frequency, or pause between two requests?

ajsa52
Nov 1st 2007, 5:21 pm
You need a file called robots.txt in your site's root directory, with the Crawl-delay directive in it.
Basically, it lets you specify how long (in seconds) bots should wait before retrieving another page from that host.
NOTE: The Yahoo bot usually crawls larger sites from several IPs simultaneously.

Example:


User-agent: *
Disallow:
Crawl-Delay: 10

User-agent: ia_archiver
Disallow: /

User-agent: Ask Jeeves
Crawl-Delay: 120

User-agent: Teoma
Disallow: /html/
Crawl-Delay: 120
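
One way to sanity-check rules like these before uploading them is Python's standard urllib.robotparser module (crawl_delay needs Python 3.6 or later). This is only a rough sketch: "SomeBot" and the page paths are made-up names for illustration, and each real crawler decides for itself whether to honor Crawl-delay.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above, embedded as a string for testing.
ROBOTS_TXT = """
User-agent: *
Disallow:
Crawl-Delay: 10

User-agent: ia_archiver
Disallow: /

User-agent: Ask Jeeves
Crawl-Delay: 120

User-agent: Teoma
Disallow: /html/
Crawl-Delay: 120
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "SomeBot" is a hypothetical crawler with no group of its own,
# so it falls back to the "User-agent: *" group.
default_delay = parser.crawl_delay("SomeBot")
teoma_delay = parser.crawl_delay("Teoma")

# Path checks: Teoma is kept out of /html/, ia_archiver out of everything.
teoma_html_allowed = parser.can_fetch("Teoma", "/html/page.html")
teoma_index_allowed = parser.can_fetch("Teoma", "/index.html")
archiver_allowed = parser.can_fetch("ia_archiver", "/index.html")

print(default_delay, teoma_delay)
print(teoma_html_allowed, teoma_index_allowed, archiver_allowed)
```

This only shows how a parser that follows the file reads it; it does not prove that any given bot will actually wait.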

Monty
Nov 1st 2007, 5:27 pm
Crawl-delay works for Yahoo, but Googlebot ignores it.

For Google, you can set the crawl rate from the Google Webmaster Tools panel, which may help.

softarea51
Nov 1st 2007, 5:37 pm
Thank you all.

softarea51
Nov 5th 2007, 7:24 am
Is there a tool to track bots on my site? Which one do you recommend?
I need to find the bad ones that open too many requests at once, and deny their IP addresses.

ajsa52
Nov 5th 2007, 7:52 am
I'm denying access to the following user agents, because they are usually used by people to steal site content:
"Wget"
"HTTrack"
"WebCopier"
"WebSauger"
"WebReaper"
"WebStripper"
"Web Downloader"
"libwww-perl"
"Python-urllib"
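
For spotting the clients that send too many requests, a small script over the web server's access log can be enough. The sketch below is a rough illustration, not a finished tool: it assumes the Apache "combined" log format, and the sample lines and addresses are invented.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# IP ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def tally_clients(lines):
    """Count requests per (IP, user agent) pair; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


# Invented sample lines; in practice, iterate over the real log file instead.
sample = [
    '192.0.2.10 - - [05/Nov/2007:07:24:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Wget/1.10"',
    '192.0.2.10 - - [05/Nov/2007:07:24:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Wget/1.10"',
    '198.51.100.7 - - [05/Nov/2007:07:24:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

counts = tally_clients(sample)
top = counts.most_common(1)
for (ip, agent), hits in counts.most_common():
    print(hits, ip, agent)
```

Sorting by hit count puts the heaviest clients first, which gives a short list of IPs and user agents to look at before denying anything.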

softarea51
Nov 5th 2007, 12:53 pm
How do you block a user agent?

ajsa52
Nov 5th 2007, 1:05 pm
You need to add rules to your .htaccess file.
Example, denying a few user agents and an IP range:


SetEnvIfNoCase User-Agent "WebCopier" dontlike
SetEnvIfNoCase User-Agent "WebSauger" dontlike
SetEnvIfNoCase User-Agent "WebReaper" dontlike

# RufusBot Address: 64.124.122.224 - 64.124.122.255
SetEnvIf Remote_Addr "^64\.124\.122\.2(2[4-9]|[34][0-9]|5[0-5])$" dontlike

Options -Indexes -Includes
Order allow,deny
Allow from all
Deny from env=dontlike
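
To check which requests rules like these would catch before putting them live, the same logic can be mirrored in a few lines of Python. This is only an approximation of the Apache behavior under stated assumptions: the function name is_denied is made up, the agent match is a plain case-insensitive substring test, and the IP range is written as the equivalent /27 network.

```python
import ipaddress

# User agents from the SetEnvIfNoCase lines above.
BLOCKED_AGENTS = ("WebCopier", "WebSauger", "WebReaper")

# 64.124.122.224 - 64.124.122.255 is exactly this /27 network (32 addresses).
BLOCKED_NETWORK = ipaddress.ip_network("64.124.122.224/27")


def is_denied(remote_addr, user_agent):
    """Return True if the request would be tagged 'dontlike' and denied."""
    agent = user_agent.lower()
    # Case-insensitive substring match, like SetEnvIfNoCase.
    if any(name.lower() in agent for name in BLOCKED_AGENTS):
        return True
    # Address-range match, like the Remote_Addr rule.
    return ipaddress.ip_address(remote_addr) in BLOCKED_NETWORK


print(is_denied("192.0.2.1", "webcopier v3"))      # blocked by user agent
print(is_denied("64.124.122.230", "Mozilla/5.0"))  # blocked by IP range
print(is_denied("64.124.122.223", "Mozilla/5.0"))  # just outside the range
```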