Bots tracking

Discussion in 'Site & Server Administration' started by phrozen_ra, Jan 14, 2005.

  1. #1
    Does anyone have a good PHP script for bot tracking... especially for SE (search engine) bots?
     
    phrozen_ra, Jan 14, 2005 IP
  2. yoook

    yoook Peon

    #2
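    <?php
    error_reporting(0);
    // Address that should receive the report
    $email = "receivereport@yourdomain.com";
    // eregi() was removed in PHP 7; stripos() does the same case-insensitive check
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if (stripos($agent, "googlebot") !== false)
    {
    mail($email, "Googlebot at yourdomain.com",
    "Google has indexed : yourdomain.com");
    }
    ?>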
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
    <head>
     
    yoook, Jan 14, 2005 IP
  3. Owlcroft

    Owlcroft Peon

    #3
    This is commented in a way I hope is self-explanatory. I haven't tested it much--just cooked it up ad hoc and got the typos out--but it should do. Anyway, it will doubtless suggest other answers.

    It will work for shtml files; for php scripts, it can just be include'd in, with PHP_SELF used instead of the variable $who.

    <?php
    
    //   tracker.php - Bot Logger 
    
    /*
    
    Call this script from an shtml file with--
    
    <!--#include virtual="/path_to_file_from_your_root/tracker.php?who=filename.shtml" -->
    
    --where  path_to_file_from_your_root  is to be set to what it says, and  filename.shtml  is the name of the file holding the include.
    
    Name the $logfile below as you please, but--if it isn't in the same directory as the php script, provide a pathspec.
    
    The file Bots.List is to be a list of the User Agents that you want reported in the log; in that list, use only the barest minimum necessary to identify the bot (like google, or mediabot--case is immaterial).  If that file is not in the same directory as the PHP script, include its relative path from the script's directory.
    
    */
    
    
      //   "Constants":
    
      //     General:
      $crlf=chr(13).chr(10);
    
      //     Particular:
      $logfile='Bots.Log';
    
    
      //   Bot List:
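      $list=file('Bots.List', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);  // drop newlines and blank lines
      if ($list===FALSE || count($list)==0) exit;  // don't waste time on a missing or empty list!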
    
    
      //   Log Call:
    
      //     Get data:
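      $address=isset($_SERVER['REMOTE_ADDR']) ? trim($_SERVER['REMOTE_ADDR']) : '';
      if ($address=='') $address='<unspecified address>';
      $agent=isset($_SERVER['HTTP_USER_AGENT']) ? trim($_SERVER['HTTP_USER_AGENT']) : '';
      //     Page name: the "who" parameter from the shtml include, else this script:
      $who=isset($_GET['who']) ? $_GET['who'] : $_SERVER['PHP_SELF'];
      foreach ($list as $bot)
      {
        $bot=trim($bot);  // drop any stray whitespace or newline around the entry
        if ($bot!='' && stristr($agent,$bot)!==FALSE)
        {
          $msg=$agent.' from '.$address.' visited '.$who.' on '.date("D, d M Y, H:i:s").$crlf;
          $lhandle=@fopen($logfile,'a');
          if ($lhandle!==FALSE)  // only write if the log could be opened
          {
            @fwrite($lhandle,$msg.$crlf);
            @fclose($lhandle);
          }
          break;
        }
      }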
    
    
    ?>
     
    Owlcroft, Jan 14, 2005 IP
  4. phrozen_ra

    phrozen_ra Peon

    #4
    I will test both and then post here which one gave the best results overall...

    For the second one, I guess Bots.List will have something like this in it:
    googlebot
    etc., right?

    If not... then what?
     
    phrozen_ra, Jan 14, 2005 IP
  5. Owlcroft

    Owlcroft Peon

    #5
    Yup, that's it.
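
    To make the matching concrete, here is a rough sketch of how entries like that get compared against a user agent: each entry is treated as a case-insensitive substring. The function name match_bot and the sample entries are made up for illustration; they are not part of the script above.

```php
<?php
// Hypothetical helper (not part of tracker.php): returns the first list
// entry found inside the user agent string, or null when nothing matches.
function match_bot($agent, array $bots)
{
    foreach ($bots as $bot) {
        $bot = trim($bot);  // entries read from a file may carry a newline
        if ($bot !== '' && stripos($agent, $bot) !== false) {
            return $bot;
        }
    }
    return null;
}

// Sample entries, one per line in Bots.List (these names are examples only).
$bots = array('googlebot', 'slurp', 'msnbot');

$agent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(match_bot($agent, $bots));                           // the 'googlebot' entry matches
var_dump(match_bot('Mozilla/5.0 (Windows NT 10.0)', $bots));  // nothing matches
```

    Keeping entries short (googlebot rather than a full agent string) means the match survives version changes in the agent.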

    I don't think you'll find one "better" than the other, as they do different things: one sends you an email, the other logs the occurrence. But I suspect that unless you are only concerned with your front page, or have a very, very small site, the email method will get tiresome rather quickly.
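
    If you do go the email route, one way to keep the volume down is to send at most one message per time window. This is only a sketch under my own assumptions: the helper name should_notify, the stamp-file location, and the one-day window are all invented for the example.

```php
<?php
// Hypothetical helper: returns true at most once per $interval seconds.
// The modification time of a small "stamp" file records the last send.
function should_notify($stampfile, $interval, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    clearstatcache();  // make sure filemtime() is not served from cache
    if (file_exists($stampfile) && ($now - filemtime($stampfile)) < $interval) {
        return false;  // a notice already went out inside this window
    }
    return touch($stampfile, $now);  // record this send (creates the file if needed)
}

// Assumed stamp location; any writable path would do.
$stamp = sys_get_temp_dir() . '/googlebot.stamp';
if (should_notify($stamp, 86400)) {
    // The mail(...) call from the email script would go here.
}
```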

    Even the logging, if you have a good-size site, will fill up a file pretty quick.
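
    One simple way to keep the log from growing without bound is to roll it over once it passes a size limit. A minimal sketch, assuming a single ".old" backup is enough; the helper name append_capped and the 1 MB limit are invented for the example.

```php
<?php
// Hypothetical helper: appends $msg to $logfile, first renaming the log
// to "$logfile.old" once it has reached $maxbytes.
function append_capped($logfile, $msg, $maxbytes)
{
    clearstatcache();  // make sure filesize() is current
    if (file_exists($logfile) && filesize($logfile) >= $maxbytes) {
        // On most systems this replaces any earlier .old file.
        rename($logfile, $logfile . '.old');
    }
    return file_put_contents($logfile, $msg, FILE_APPEND | LOCK_EX) !== false;
}

// Example call: keep the log at roughly 1 MB before rolling over.
append_capped(sys_get_temp_dir() . '/Bots.Log', "example entry\r\n", 1048576);
```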
     
    Owlcroft, Jan 15, 2005 IP