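<?php
error_reporting(0);
$email = "receivereport@yourdomain.com";
// eregi() was removed in PHP 7; stripos() gives the same case-insensitive match.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($agent, "googlebot") !== false) {
    mail($email, "Googlebot at yourdomain.com", "Google has indexed : yourdomain.com");
}
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>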
This is commented in a way I hope is self-explanatory. I haven't tested it much (just cooked it up ad hoc and got the typos out), but it should do. Anyway, it will doubtless suggest other answers. It will work for shtml files; for PHP scripts, it can just be include'd in, in which case PHP_SELF is used instead of the variable $who.

<?php
// tracker.php - Bot Logger

/*
Call this script from an shtml file with:

<!--#include virtual="/path_to_file_from_your_root/tracker.php?who=filename.shtml" -->

where path_to_file_from_your_root is to be set to what it says, and
filename.shtml is the name of the file holding the include.

Name the $logfile below as you please, but if it isn't in the same
directory as the PHP script, provide a pathspec.

The file Bots.List is to be a list of the User Agents that you want
reported in the log, one per line; in that list, use only the barest
minimum necessary to identify the bot (like google, or mediabot; case is
immaterial). If that file is not in the same directory as the PHP script,
include its relative path from the script's directory.
*/

// "Constants":
// General:
$crlf = chr(13).chr(10);
// Particular:
$logfile = 'Bots.Log';

// Bot List (file() would otherwise keep the newline on every entry):
$list = @file('Bots.List', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (!$list) return; // don't waste time on an empty or missing list!
                    // (return, not exit, so an including page still renders)

// Log Call:
// Get data:
$address = isset($_SERVER['REMOTE_ADDR']) ? trim($_SERVER['REMOTE_ADDR']) : '';
if ($address == '') $address = '<unspecified address>';
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? trim($_SERVER['HTTP_USER_AGENT']) : '';
// Page name: ?who=... from the shtml include, else the including PHP script.
$who = isset($_GET['who']) ? $_GET['who'] : $_SERVER['PHP_SELF'];

foreach ($list as $bot) {
    $bot = trim($bot);
    if ($bot != '' && stristr($agent, $bot) !== FALSE) {
        $msg = $agent.' from '.$address.' visited '.$who.' on '.date("D, d M Y, H:i:s").$crlf;
        $lhandle = @fopen($logfile, 'a');
        if ($lhandle) { // skip the write if the log can't be opened
            @fwrite($lhandle, $msg.$crlf);
            @fclose($lhandle);
        }
        break;
    }
}
?>
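One detail in the logger is easier to see in isolation: file() keeps the trailing newline on each line it reads, so an entry from Bots.List has to be trimmed before it can match anything in the User-Agent string. Here is a minimal sketch of just that matching step. The helper name find_bot() is made up for illustration and is not part of the script above.

```php
<?php
// find_bot() is a hypothetical helper for illustration only.
// Returns the first list entry found (case-insensitively) in the
// User-Agent string, or null when nothing matches.
function find_bot($agent, array $list)
{
    foreach ($list as $bot) {
        $bot = trim($bot);          // file() leaves "\n" on each line
        if ($bot === '') {
            continue;               // skip blank lines in the list
        }
        if (stripos($agent, $bot) !== false) {
            return $bot;
        }
    }
    return null;
}

// Entries as a plain file('Bots.List') call would return them.
$list = array("google\n", "\n", "mediabot\n");
$hit = find_bot('Mozilla/5.0 (compatible; Googlebot/2.1)', $list);
echo $hit === null ? "no match\n" : "matched: $hit\n";
```

Without the trim() call the first entry would be "google\n", which never appears inside a User-Agent string, so nothing would ever be logged.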
I will test both and then post here which gave the best overall results. For the second, I guess Bots.List will have something like this in it: googlebot etc., right? If not, then what?
Yup, that's it. I don't think you'll find one "better" than the other, as they do different things: one sends you an email, the other logs the occurrence. But I suspect that unless you are only concerned with your front page, or have a very, very small site, the email method will get tiresome rather quickly. Even the log file, on a good-size site, will fill up pretty quickly.
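If the log growing without limit is the worry, one option is to start a fresh file once it passes a size cap. This is a rough sketch, not part of the script posted above: the function name log_with_cap(), the 1 MB default, and the ".old" suffix are all assumptions picked for illustration.

```php
<?php
// log_with_cap() is a hypothetical helper; the cap and the ".old"
// suffix are illustrative choices, not something from the thread.
function log_with_cap($logfile, $line, $maxBytes = 1048576)
{
    clearstatcache(true, $logfile);   // make sure filesize() is fresh
    if (is_file($logfile) && filesize($logfile) >= $maxBytes) {
        // Keep one previous generation, replacing any older one.
        rename($logfile, $logfile . '.old');
    }
    // FILE_APPEND creates the file if needed; LOCK_EX guards against
    // two requests writing at the same moment.
    return file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Usage: a tiny cap so the rollover is easy to see.
$log = sys_get_temp_dir() . '/Bots.Log.demo';
@unlink($log);
@unlink($log . '.old');
log_with_cap($log, "first entry\r\n", 10);   // creates the file (13 bytes)
log_with_cap($log, "second entry\r\n", 10);  // 13 >= 10, so it rolls over
```

After the two calls, the demo log holds only the second entry and the ".old" file holds the first. In the logger, a call like this would stand in for the fopen/fwrite/fclose trio.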