I found this in my AWStats report:

Unknown robot (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-)    Hits: 636,127+83    Bandwidth: 17.95 GB

This is what consumes the most bandwidth on my website, four times as much as Googlebot. I don't know what these bots are or how much traffic they send, but total bot hits on my site this month have passed 1M, which triggered a "bandwidth exceeded" error.

Is there a way to block all of them using PHP's preg functions, without blocking Googlebot, Yahoo's bot and Bingbot? Can someone please give me the preg_match rule in PHP to detect all bots? I want to use PHP so that I can log their hits to a log file before sending a 403 response.

Thanks
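Try the following with preg_match(). Leave out the ^ and $ anchors (with them, the pattern only matches a user agent that is exactly "bot" plus one character), and add the i flag so tokens like "AhrefsBot/" also match. Inside a single-quoted PHP string, a literal backslash in the regex has to be written as four backslashes:

```php
$pattern = '/bot[\s_+:,.;\/\\\\-]/i';
```

G r e e n h o r n j a m i e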
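To answer the original question directly (detect "unknown robots" with preg_match but let the big search engines through), here is a minimal sketch. The function names and the whitelist keywords are assumptions for illustration, not part of any library. The whitelist is checked first because real crawler tokens such as "googlebot/2.1" also match the unknown-bot pattern.

```php
<?php
// Hypothetical whitelist: substrings of user agents you want to allow. Adjust as needed.
const GOOD_BOT_KEYWORDS = ['googlebot', 'bingbot', 'slurp', 'duckduckbot'];

// True if the user agent contains any whitelisted keyword (case-insensitive).
function is_good_bot(string $userAgent): bool
{
    foreach (GOOD_BOT_KEYWORDS as $keyword) {
        if (stripos($userAgent, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// AWStats-style "unknown robot": 'bot' followed by whitespace or one of _+:,.;/\-
// Unanchored, so the token can appear anywhere; /i makes "Bot/" and "bot/" both match.
// In this single-quoted string, \\\\ becomes \\ in the regex, i.e. one literal backslash.
function is_unknown_bot(string $userAgent): bool
{
    return preg_match('/bot[\s_+:,.;\/\\\\-]/i', $userAgent) === 1;
}

// Block only bots that are not whitelisted.
function should_block(string $userAgent): bool
{
    return !is_good_bot($userAgent) && is_unknown_bot($userAgent);
}
```

For example, `should_block($_SERVER['HTTP_USER_AGENT'] ?? '')` could gate the logging/403 step. Keep in mind that the user agent is whatever the client chooses to send, so a keyword whitelist like this also lets through anything that merely claims to be Googlebot or Bingbot.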
I don't know if you've seen or used this before. It lets you whitelist good bots and block all other bots. Place it at the very top of your index page.

```php
<?php
// ---------------------------------------------------------------------------------------------------------------
// Banned IP Addresses and Bots - Redirects banned visitors who make it past the .htaccess and/or robots.txt files to a URL.
// The $banned_ip_addresses array can contain both full and partial IP addresses, e.g. Full = 123.456.789.101, Partial = 123.456.789. or 123.456. or 123.
// Use partial IP addresses to include all IP addresses that begin with a partial IP address. The partial IP addresses must end with a period.
// The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain keyword strings found within the User Agent string.
// The $banned_unknown_bots array is used to identify unknown robots (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-).
// The $good_bots array contains keyword strings used as exemptions when checking for $banned_unknown_bots. If you do not want to use the $good_bots array, i.e.
// $good_bots = array(), then you must remove the keyword strings 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots will also be banned.
$banned_ip_addresses = array('');
$banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
$banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
$good_bots = array('Google','Googlebot','MSN','bing','bingbot','Slurp','Yahoo','DuckDuck');
$banned_redirect_url = 'https://google.com';
// Visitor's IP address and Browser (User Agent)
$ip_address = $_SERVER['REMOTE_ADDR'];
$browser = $_SERVER['HTTP_USER_AGENT'] ?? ''; // some clients send no User-Agent header at all
// Declared Temporary Variables
$ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';
// Checks for Banned IP Addresses and Bots
if($banned_redirect_url != ''){
    // Checks for Banned IP Address
    if(!empty($banned_ip_addresses)){
        if(in_array($ip_address, $banned_ip_addresses)){$ipfound = 'found';}
        if($ipfound != 'found'){
            $ip_pieces = explode('.', $ip_address);
            foreach ($ip_pieces as $value){
                $piece = $piece.$value.'.';
                if(in_array($piece, $banned_ip_addresses)){$ipfound = 'found'; break;}
            }
        }
        if($ipfound == 'found'){header("location: $banned_redirect_url"); exit();}
    }
    // Checks for Banned Bots
    if(!empty($banned_bots)){
        foreach ($banned_bots as $bbvalue){
            $pos1 = stripos($browser, $bbvalue);
            if($pos1 !== false){$botfound = 'found'; break;}
        }
        if($botfound == 'found'){header("location: $banned_redirect_url"); exit();}
    }
    // Checks for Banned Unknown Bots
    if(!empty($good_bots)){
        foreach ($good_bots as $gbvalue){
            $pos2 = stripos($browser, $gbvalue);
            if($pos2 !== false){$gbotfound = 'found'; break;}
        }
    }
    if($gbotfound != 'found'){
        if(!empty($banned_unknown_bots)){
            foreach ($banned_unknown_bots as $bubvalue){
                $pos3 = stripos($browser, $bubvalue);
                if($pos3 !== false){$ubotfound = 'found'; break;}
            }
            if($ubotfound == 'found'){header("location: $banned_redirect_url"); exit();}
        }
    }
}
// ---------------------------------------------------------------------------------------------------------------
?>
```
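The IP check in that script treats an entry ending in a period as a prefix. A standalone sketch of the same rule (the name `ip_is_banned` is hypothetical, and like the script it only handles dotted IPv4 addresses) shows why the trailing period matters: `12.` matches `12.45.6.7` but not `123.45.6.7`.

```php
<?php
// Hypothetical helper mirroring the script's rule: an entry is either a full address
// (exact match) or a partial address ending in '.', which matches as a prefix.
function ip_is_banned(string $ip, array $bannedEntries): bool
{
    foreach ($bannedEntries as $entry) {
        if ($entry === '') {
            continue; // skip placeholders such as array('')
        }
        if ($entry === $ip) {
            return true; // full address
        }
        // Partial address: the trailing '.' forces the match to end on an octet boundary.
        if (substr($entry, -1) === '.' && strncmp($ip, $entry, strlen($entry)) === 0) {
            return true;
        }
    }
    return false;
}
```

Without the trailing period, a plain string-prefix test on `12` would also catch every `123.x.x.x` and `120.x.x.x` address, which is why the script's comments insist on it.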
That script already exits when the bad bot in question is detected on line #38:

```php
if($pos1 !== false){$botfound = 'found'; break;}
```
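Since the original goal was to log each blocked hit and return a 403 instead of redirecting, one option is to replace each `header("location: $banned_redirect_url"); exit();` pair in the script with a call to a small helper. A minimal sketch, with hypothetical function names and a placeholder log path:

```php
<?php
// Build one tab-separated log line: ISO 8601 timestamp, reason, client IP, user agent.
function format_block_log_line(string $reason, string $ip, string $userAgent): string
{
    // The user agent is client-supplied; flatten tabs/newlines so one hit stays one line.
    $userAgent = str_replace(["\t", "\r", "\n"], ' ', $userAgent);
    return sprintf("%s\t%s\t%s\t%s\n", date('c'), $reason, $ip, $userAgent);
}

// Append the line to $logFile. FILE_APPEND adds to the end of the file;
// LOCK_EX keeps lines from concurrent requests from interleaving.
function log_blocked_hit(string $logFile, string $reason, string $ip, string $userAgent): bool
{
    $line = format_block_log_line($reason, $ip, $userAgent);
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Log the current request, then answer 403 and stop the script.
// $logFile is a placeholder; the web server user must be able to write to it.
function block_and_log(string $logFile, string $reason): void
{
    $ip = $_SERVER['REMOTE_ADDR'] ?? '';
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    log_blocked_hit($logFile, $reason, $ip, $userAgent);
    http_response_code(403);
    exit('Forbidden');
}
```

In the script, the banned-bots branch would then become something like `if($botfound == 'found'){block_and_log('/path/to/blocked-bots.log', 'banned_bot');}`, with similar calls for the IP and unknown-bot branches.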
Hi,

Thanks for the scripts. One more question: which of these is the real Bingbot? The last one is Google. I have already blocked one fake Googlebot. Is this one real, or is it also fake?

(I had to remove "http" from the URLs because the forum won't let me post links that redirect somewhere.)

```
40.77.167.11    27 Jul 11:59:16m     mozilla/5.0 (compatible; bingbot/2.0; bing.com/bingbot.htm)
207.46.13.188   28 Jul 12:08:37:am   mozilla/5.0 (compatible; bingbot/2.0; bing.com/bingbot.htm)
66.249.65.126   27 Jul 11:59:20m     mozilla/5.0 (compatible; googlebot/2.1; google.com/bot.html)
```
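A user agent alone can't separate a real crawler from a fake one, since any client can send "bingbot/2.0". The usual check is forward-confirmed reverse DNS: resolve the IP to a hostname, check that the hostname is in the crawler's domain, then resolve that hostname back and confirm it returns the same IP. As far as I know, Google documents `googlebot.com` and `google.com` hostnames for Googlebot and Bing documents `search.msn.com` for Bingbot, but check their current documentation. The function name and the injectable resolvers are my own choices for this sketch; by default it uses PHP's built-in `gethostbyaddr()` and `gethostbynamel()`, and the latter only returns IPv4 addresses.

```php
<?php
// Forward-confirmed reverse DNS (IPv4 only, because gethostbynamel() returns IPv4 addresses).
// Resolvers can be swapped out, e.g. for testing without network access.
function is_verified_crawler(
    string $ip,
    array $allowedSuffixes,
    ?callable $reverseLookup = null,
    ?callable $forwardLookup = null
): bool {
    $reverseLookup = $reverseLookup ?? 'gethostbyaddr';
    $forwardLookup = $forwardLookup ?? 'gethostbynamel';

    // Step 1: IP -> hostname. gethostbyaddr() returns the IP unchanged on failure, or false on bad input.
    $host = $reverseLookup($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must be the allowed domain itself or end in ".<domain>".
    $domainOk = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = strtolower(ltrim($suffix, '.'));
        if ($host === $suffix || substr($host, -strlen($suffix) - 1) === '.' . $suffix) {
            $domainOk = true;
            break;
        }
    }
    if (!$domainOk) {
        return false;
    }

    // Step 3: hostname -> IPs; the original IP must be among them, or the PTR record was faked.
    $addresses = $forwardLookup($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

For example, `is_verified_crawler('40.77.167.11', ['search.msn.com'])` or `is_verified_crawler('66.249.65.126', ['googlebot.com', 'google.com'])` performs live DNS lookups, so it is best run once per IP and cached rather than on every request.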
Hi @JEET,

Those bots (bingbot, googlebot) are good. I wouldn't block them.