PHP preg rule to detect bots?

Discussion in 'PHP' started by JEET, Jul 27, 2017.

  1. #1
    I found this in my AWStats report:

    Unknown robot (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-)
    636,127+83 17.95 GB

    This is what consumes the most bandwidth on my website: four times as much as Googlebot.

    I don't know what these bots are or how much traffic they are sending, but total bot hits on my site this month have passed 1M, which resulted in a "bandwidth exceeded" error message.

    Is there a way to block all of these using PHP's preg functions?

    But I don't want to block Googlebot, the Yahoo bot, or Bingbot.

    Can someone please give me a preg_match rule in PHP to detect all bots?
    I want to use PHP so that I can log their hits to a log file before sending a 403 code.
    Thanks

    JEET, Jul 27, 2017
  2. j a m i e

    j a m i e Greenhorn

    #2
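    Try the following in preg_match. It matches 'bot' followed by whitespace or one of the listed characters anywhere in the user agent, ignoring case.

    $pattern = '/bot[\s_+:,.;\/\\\\-]/i';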

    j a m i e, Jul 27, 2017
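
The request in the first post (detect unknown bots with preg_match, spare Google, Yahoo and Bing, log the hit, then send a 403) could be sketched roughly as follows. This is only an illustration under stated assumptions: PHP 7.1 or later, and the function names `is_unwanted_bot`, `log_blocked_hit` and `block_unwanted_bots`, the allowlist keywords and the log file location are all hypothetical choices, not something given in the thread.

```php
<?php
// Hypothetical helpers; the names, allowlist and log format are assumptions.

/**
 * Returns true when the user agent contains "bot" followed by whitespace
 * or one of _+:,.;/\- and does not name an allowlisted crawler.
 */
function is_unwanted_bot(string $userAgent): bool
{
    // Crawlers to keep (Slurp is the keyword used for Yahoo's crawler).
    $allowed = '/googlebot|bingbot|slurp/i';
    // In a single-quoted PHP string, \\\\ becomes the regex \\ (a literal backslash).
    $unknown = '/bot[\s_+:,.;\/\\\\-]/i';

    if (preg_match($allowed, $userAgent) === 1) {
        return false;
    }
    return preg_match($unknown, $userAgent) === 1;
}

/** Appends one tab-separated line per blocked hit to the log file. */
function log_blocked_hit(string $logFile, string $ip, string $userAgent): void
{
    $line = sprintf("%s\t%s\t%s\n", date('c'), $ip, $userAgent);
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

/** Logs the hit and answers 403 when the request comes from an unwanted bot. */
function block_unwanted_bots(array $server, string $logFile): void
{
    $userAgent = $server['HTTP_USER_AGENT'] ?? '';
    if (is_unwanted_bot($userAgent)) {
        log_blocked_hit($logFile, $server['REMOTE_ADDR'] ?? '-', $userAgent);
        http_response_code(403);
        exit('Forbidden');
    }
}
```

At the top of the index page this would be called as `block_unwanted_bots($_SERVER, __DIR__ . '/blocked-bots.log');`. The user agent string is trivially spoofed, so the allowlist only spares requests that claim to be one of those crawlers.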
  3. qwikad.com

    qwikad.com Illustrious Member Affiliate Manager

    #3
    I don't know if you've ever seen or used this script. It lets you list the good bots and block all other bots. Place it right at the top of your index page.

    
    <?php
    // ---------------------------------------------------------------------------------------------------------------
    // Banned IP Addresses and Bots - Redirects banned visitors who make it past the .htaccess and/or robots.txt files to a URL.
    // The $banned_ip_addresses array can contain both full and partial IPv4 addresses, e.g. Full = 192.0.2.10, Partial = 192.0.2. or 192.0. or 192.
    // Use a partial IP address to match every IP address that begins with it. A partial IP address must end with a period.
    // The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain keyword strings found within the User Agent string.
    // The $banned_unknown_bots array is used to identify unknown robots (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-).
    // The $good_bots array contains keyword strings used as exemptions when checking for $banned_unknown_bots. If you do not want to use the $good_bots array, i.e.
    // $good_bots = array(), then you must remove the keyword strings 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots will also be banned.
       $banned_ip_addresses = array('');
       $banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
       $banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
       $good_bots = array('Google','Googlebot','MSN','bing','bingbot','Slurp','Yahoo','DuckDuck');
       $banned_redirect_url = 'https://google.com';
    // Visitor's IP address and Browser (User Agent)
       $ip_address = $_SERVER['REMOTE_ADDR'];
       $browser = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : ''; // empty when the header is missing
    // Declared Temporary Variables
       $ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';
    // Checks for Banned IP Addresses and Bots
       if($banned_redirect_url != ''){
         // Checks for Banned IP Address
            if(!empty($banned_ip_addresses)){
              if(in_array($ip_address, $banned_ip_addresses)){$ipfound = 'found';}
              if($ipfound != 'found'){
                $ip_pieces = explode('.', $ip_address);
                foreach ($ip_pieces as $value){
                  $piece = $piece.$value.'.';
                  if(in_array($piece, $banned_ip_addresses)){$ipfound = 'found'; break;}
                }
              }
              if($ipfound == 'found'){header("location: $banned_redirect_url"); exit();}
            }
         // Checks for Banned Bots
            if(!empty($banned_bots)){
              foreach ($banned_bots as $bbvalue){
                $pos1 = stripos($browser, $bbvalue);
                if($pos1 !== false){$botfound = 'found'; break;}
              }
              if($botfound == 'found'){header("location: $banned_redirect_url"); exit();}
            }
         // Checks for Banned Unknown Bots
            if(!empty($good_bots)){
              foreach ($good_bots as $gbvalue){
                $pos2 = stripos($browser, $gbvalue);
                if($pos2 !== false){$gbotfound = 'found'; break;}
              }
            }
            if($gbotfound != 'found'){
              if(!empty($banned_unknown_bots)){
                foreach ($banned_unknown_bots as $bubvalue){
                  $pos3 = stripos($browser, $bubvalue);
                  if($pos3 !== false){$ubotfound = 'found'; break;}
                }
                if($ubotfound == 'found'){header("location: $banned_redirect_url"); exit();}
              }
            }
       }
    // ---------------------------------------------------------------------------------------------------------------
    ?>
    
    qwikad.com, Jul 27, 2017
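
The partial IP rule described in the script's comments (a partial address must end with a period and matches every address that begins with it) can be isolated into a small function. This is a minimal sketch, assuming IPv4 addresses and PHP 7.1 or later; the function name `ip_is_banned` is hypothetical, and the addresses in the usage note come from the ranges reserved for documentation.

```php
<?php
/**
 * Returns true when $ip equals a banned full address or begins with a
 * banned partial address such as "203.0.113." (trailing period required).
 */
function ip_is_banned(string $ip, array $banned): bool
{
    // Exact match against a full address.
    if (in_array($ip, $banned, true)) {
        return true;
    }
    // Build "a.", "a.b.", "a.b.c." and test each prefix in turn.
    $prefix = '';
    foreach (explode('.', $ip) as $octet) {
        $prefix .= $octet . '.';
        if (in_array($prefix, $banned, true)) {
            return true;
        }
    }
    return false;
}
```

For example, `ip_is_banned('203.0.113.7', ['203.0.113.'])` is true, while an entry without the trailing period, such as `'203.0.11'`, matches nothing. An array holding only an empty string, as in the script's default, bans no one.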
  4. j a m i e

    #4
    That script already stops at the first banned bot it detects, on line #38; the next line then redirects and exits:
    if($pos1 !== false){$botfound = 'found'; break;}
    j a m i e, Jul 27, 2017
  5. JEET

    JEET Notable Member

    #5
    Hi, thanks for the scripts.
    One more question: which one of these is the real Bingbot?

    The last one is Google. I have already blocked one fake Googlebot.
    Is this one real, or is it also fake?
    I had to remove "http" because the forum won't let me post URLs that redirect somewhere...

    40.77.167.11 27 Jul 11:59:16:pm mozilla/5.0 (compatible; bingbot/2.0; bing.com/bingbot.htm)

    207.46.13.188 28 Jul 12:08:37:am mozilla/5.0 (compatible; bingbot/2.0; bing.com/bingbot.htm)

    66.249.65.126 27 Jul 11:59:20:pm mozilla/5.0 (compatible; googlebot/2.1; google.com/bot.html)
     
    JEET, Jul 30, 2017
  6. j a m i e

    #6
    Hi @JEET,

    Those bots (bingbot, googlebot) are good. I wouldn't block them.

    j a m i e, Jul 30, 2017
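
A user agent string alone cannot show whether a hit really comes from Googlebot or Bingbot, because anyone can send that string. One commonly recommended check is a reverse DNS lookup on the IP followed by a forward lookup on the returned host name. The sketch below illustrates that idea under stated assumptions: PHP 7.1 or later, IPv4 only, a hypothetical function name `is_verified_crawler`, and host name suffixes (`googlebot.com`, `google.com`, `search.msn.com`) that should be confirmed against each search engine's current documentation before use.

```php
<?php
/**
 * Returns true when $ip reverse-resolves to a host under one of
 * $allowedSuffixes and that host resolves forward to the same IP.
 * $reverse and $forward can be replaced with stubs for offline testing.
 */
function is_verified_crawler(
    string $ip,
    array $allowedSuffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    // gethostbyaddr() returns the unmodified IP on failure, false on malformed input.
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    $suffixOk = false;
    foreach ($allowedSuffixes as $suffix) {
        // Require a dot boundary so "evilgooglebot.com" does not pass for "googlebot.com".
        $needle = '.' . strtolower($suffix);
        if (substr($host, -strlen($needle)) === $needle) {
            $suffixOk = true;
            break;
        }
    }
    if (!$suffixOk) {
        return false;
    }

    // The forward lookup must lead back to the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

A call such as `is_verified_crawler($_SERVER['REMOTE_ADDR'], ['googlebot.com', 'google.com'])` performs live DNS lookups, which are slow, so the result is worth caching per IP rather than repeating on every request.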
  7. JEET

    #7
    Thanks jamie! :) Really appreciate the help :)
     
    JEET, Aug 1, 2017