If you've ever wanted to know when search engines like Google crawl your site without getting e-mail bombed or resorting to grep, try this Perl add-on if you use SSI on your sites. Add this code to your Perl script:

```perl
# Full path to the log file (it must exist and be writable by the web server)
$database = "/complete_path/site90/html/logs/logs.txt";
# Prepended to each logged URL; set to "" if you only log one domain
$domain = "http://www.domain.com";
# Date, time and time zone from the system date command
$shortdate = `date +"%D %T %Z"`;
chomp($shortdate);    # strip the trailing newline

# Only log requests whose User-Agent mentions Google, MSN or Yahoo
if ($ENV{'HTTP_USER_AGENT'} =~ /google|msn|yahoo/i) {
    if (open(my $log, '>>', $database)) {    # append; skip quietly if the file can't be opened
        print $log "$ENV{'REMOTE_ADDR'} - $ENV{'HTTP_USER_AGENT'} - $domain$ENV{'REQUEST_URI'} - $shortdate\n";
        close($log);
    }
}
```

Then create logs/logs.txt at the location $database points to. This code will log Google, MSN, and Yahoo visits on any domain on the same server that uses it. Change http://www.domain.com to the domain the script is on, or set $domain to "" if you only want to log one domain and don't want the http://www.domain.com part to show up in the log.

To log only a certain search engine, use one of these lines instead:

```perl
if ($ENV{'HTTP_USER_AGENT'} =~ /google/i) {
if ($ENV{'HTTP_USER_AGENT'} =~ /msn/i) {
if ($ENV{'HTTP_USER_AGENT'} =~ /yahoo/i) {
```

Or, for two search engines, put both names in the pattern (for example /google|yahoo/i):

```perl
if ($ENV{'HTTP_USER_AGENT'} =~ /name|name/i) {
```

[Example of log]

If you make a new site, submit it to Yahoo and MSN. I submitted five new sites to them just two days ago, and as the log shows, Yahoo's already doing some nice crawling on one of them, while the other two bots looked and then left!

If you don't get anything and want to make sure it's working, replace

```perl
if ($ENV{'HTTP_USER_AGENT'} =~ /google|msn|yahoo/i) {
    if (open(my $log, '>>', $database)) {
        print $log "$ENV{'REMOTE_ADDR'} - $ENV{'HTTP_USER_AGENT'} - $domain$ENV{'REQUEST_URI'} - $shortdate\n";
        close($log);
    }
}
```

with

```perl
# No User-Agent check: every request gets logged
if (open(my $log, '>>', $database)) {
    print $log "$ENV{'REMOTE_ADDR'} - $ENV{'HTTP_USER_AGENT'} - $domain$ENV{'REQUEST_URI'} - $shortdate\n";
    close($log);
}
```

That will log every request. Change it back once you see entries show up in the log!
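For anyone who wants the same logger in a form that's easier to reuse and test, here is a minimal sketch under `use strict`. It is not the original poster's code: the sub names is_search_bot, format_entry and log_bot_visit are hypothetical, and it swaps the shelled-out date command for POSIX::strftime, which produces the same month/day/year, time and time-zone fields. The path and domain are the same placeholders as in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Placeholders from the post; change them for your own site.
my $database = "/complete_path/site90/html/logs/logs.txt";
my $domain   = "http://www.domain.com";

# Hypothetical helper: 1 if the User-Agent mentions Google, MSN or Yahoo, else 0.
sub is_search_bot {
    my ($agent) = @_;
    return (defined $agent && $agent =~ /google|msn|yahoo/i) ? 1 : 0;
}

# Hypothetical helper: one log line in the post's "IP - agent - URL - date" layout.
sub format_entry {
    my ($ip, $agent, $url, $when) = @_;
    return "$ip - $agent - $url - $when\n";
}

# Hypothetical helper: append an entry for this request if it came from a bot.
# $env is a hash reference (normally \%ENV); returns 1 if a line was written.
sub log_bot_visit {
    my ($file, $prefix, $env) = @_;
    my $agent = $env->{HTTP_USER_AGENT};
    return 0 unless is_search_bot($agent);
    # Same fields as `date +"%D %T %Z"`, without running an external command.
    my $when = strftime("%m/%d/%y %H:%M:%S %Z", localtime);
    open(my $fh, '>>', $file) or return 0;    # skip quietly if the log isn't writable
    print {$fh} format_entry($env->{REMOTE_ADDR} // '', $agent,
                             $prefix . ($env->{REQUEST_URI} // ''), $when);
    close($fh);
    return 1;
}

log_bot_visit($database, $domain, \%ENV);
```

Passing \%ENV in as an argument, rather than reading it inside the sub, lets you try the logger from the command line with a made-up request hash before putting it on a live page.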
Interesting post. I feel like there must be an easier way to do this, though. Maybe an entry in the httpd.conf file or something.
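On the httpd.conf idea: if you have access to the server config, Apache can keep a separate log by itself, using SetEnvIfNoCase from mod_setenvif to tag requests by User-Agent and a CustomLog directive with an env= condition. If you only have the normal access log, a short Perl filter can pull out the bot visits afterwards. This is a minimal sketch, assuming the log uses Apache's standard "combined" format, where the User-Agent is the last double-quoted field; the sub names user_agent_of and bot_lines and the botfilter.pl file name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the User-Agent (last double-quoted field) of one
# "combined" format log line, or undef if the line doesn't look like one.
sub user_agent_of {
    my ($line) = @_;
    return $line =~ /"([^"]*)"\s*$/ ? $1 : undef;
}

# Hypothetical helper: keep only the lines whose User-Agent mentions a bot.
sub bot_lines {
    my @lines = @_;
    return grep {
        my $ua = user_agent_of($_);
        defined $ua && $ua =~ /google|msn|yahoo/i;
    } @lines;
}

# Usage: perl botfilter.pl /path/to/access_log
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    print bot_lines(<$fh>);
    close($fh);
}
```

Matching only the User-Agent field, rather than grepping the whole line, avoids counting ordinary visitors who arrived from a Google or Yahoo search, since the search engine's address shows up in their Referer field.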
If placing some code in a CGI file, getting the two lines right, and making the text file is not easy, then nothing is! To empty the logs.txt file, create a log.php file in the same directory as logs.txt containing:

```php
<?php
// Opening the file in "w" mode truncates it to zero length
$file = fopen("logs.txt", "w");
if ($file === false) {
    echo "Could not open logs.txt";
} else {
    fclose($file);
    echo "File empty";
}
?>
```

Load that page in your browser and the log file will be emptied.
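If you'd rather stay in Perl than add a PHP file, the same truncation can be done with a write-mode open. A minimal sketch; the sub name empty_log is hypothetical, and the path in the commented example is the placeholder from the first post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: empty a log file in place, like the PHP fopen(..., "w") trick.
# Returns 1 on success, 0 if the file couldn't be opened for writing.
sub empty_log {
    my ($file) = @_;
    open(my $fh, '>', $file) or return 0;    # '>' truncates the file to zero length
    close($fh);
    return 1;
}

# Example: empty_log("/complete_path/site90/html/logs/logs.txt");
```

Like the PHP version, opening with '>' also creates the file if it doesn't exist yet.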