Hi, I just wanted to know how you can tell whether Googlebot has been to your website. Which log do I check? Also, if I do a mod_rewrite just for Googlebot so it doesn't pick up session IDs, how do I check that it is redirected to the correct link?
What web server software are you running? Under Apache, you will check your access log. Googlebot will show up in your access log as "Googlebot/2.1". This will look something like this:
66.249.64.18 - - [04/Apr/2005:14:45:18 -0600] "GET /html-special-characters.shtml HTTP/1.0" 200 23909 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
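If you would rather not scroll through the log by hand, a small script can pick out the Googlebot entries for you. This is only a sketch: the function name `googlebot_hits` is made up for illustration, the second sample line is an invented browser visit, and matching on the word "Googlebot" only tells you what the visitor claims to be.

```php
<?php
// Return the log lines whose text contains "Googlebot" (case-insensitive).
// $lines can be an array or any other iterable of access-log lines.
function googlebot_hits(iterable $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        if (stripos($line, 'Googlebot') !== false) {
            $hits[] = rtrim($line, "\r\n");
        }
    }
    return $hits;
}

// Sample data: the first line is the entry quoted above,
// the second is an invented browser request for contrast.
$sample = [
    '66.249.64.18 - - [04/Apr/2005:14:45:18 -0600] "GET /html-special-characters.shtml HTTP/1.0" 200 23909 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [04/Apr/2005:14:46:02 -0600] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
];

$hits = googlebot_hits($sample);
echo count($hits), " Googlebot request(s)\n";
```

To run it against a real log, you could pass `new SplFileObject('/path/to/access.log')` instead of `$sample` (the path is a placeholder). That reads the file line by line, which should be fine even for a large log.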
I am running Apache. I thought so; I checked the access log. There were two logs: one is a small file for recent access, and the other may be the full one, which is 120 MB, so I didn't bother with that. Anything on the redirect thing?
I am not quite understanding the question. Plus, doing a mod_rewrite for Googlebot is so close to cloaking that it makes me nervous.
I use this PHP code in a file called passthru.php that emails me whenever Googlebot has been to my site. Since the passthru code is added to every page, I can track where it's been. Here is the code (change the email address to the one you want the log sent to):
<?php
// Email a notice whenever the visitor's user agent contains "googlebot".
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($agent, 'googlebot') !== false) {
    $ip = $_SERVER['REMOTE_ADDR'];
    $crawl = gethostbyaddr($ip); // reverse DNS lookup of the visitor's IP
    // Label the crawler by the start of its IP address.
    if (strpos($ip, '64.') === 0) {
        $crawler = 'Refresh GoogleBot';
    } elseif (strpos($ip, '216.') === 0) {
        $crawler = 'Google Deep Crawler';
    } else {
        $crawler = 'Unknown Crawler';
    }
    $host = $_SERVER['SERVER_NAME'];
    $url = 'http://' . $host . $_SERVER['PHP_SELF'];
    if (!empty($_SERVER['QUERY_STRING'])) {
        $url .= '?' . $_SERVER['QUERY_STRING'];
    }
    $today = date('F j, Y, g:i a');
    mail('youremail@youraddress.com', "Googlebot detected on $host",
        " $today \n Googlebot IP Address: $ip \n Googlebot Domain: $crawl \n Crawler Type: $crawler \n Url Visited: $url");
}
?>
Then add this to the .htaccess in your root directory. (If you don't have a .htaccess file, create one in Notepad and put it on your server.) Here is the .htaccess code:
AddHandler application/x-httpd-php .htm .html
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} ^(.*).htm [NC,OR]
RewriteCond %{REQUEST_FILENAME} ^(.*).html [NC]
RewriteRule ^(.*) /passthru.php?file=$1
</IfModule>
If you are not on an Apache server, I don't think this solution will work for you. PM me if you have any problems.
Good coding, though on a good day that must really hammer your email server... I just have a script running that tallies visits for my forums: it increments a hit counter for each bot and stores it in the database. I also found a cool plugin for my blogs that tracks Googlebot: date/time, frequency, and which pages it visited.
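For anyone who wants the tally approach without the email traffic, here is a rough sketch of a per-bot hit counter. It is not the script described above: it swaps the database for a JSON text file, and the function names, the file layout, and the list of user-agent substrings are all assumptions made for illustration.

```php
<?php
// Map a user-agent string to a short bot name, or null if unrecognised.
// The substrings below are illustrative, not an exhaustive list.
function bot_name(string $userAgent): ?string
{
    $bots = [
        'Googlebot'    => 'googlebot',
        'Yahoo! Slurp' => 'slurp',
        'msnbot'       => 'msnbot',
    ];
    foreach ($bots as $needle => $name) {
        if (stripos($userAgent, $needle) !== false) {
            return $name;
        }
    }
    return null;
}

// Add one to the counter for $name in a JSON file and return the new count.
// Note: the read-then-write is not atomic, so counts can be lost
// if many requests arrive at the same moment.
function tally_bot(string $counterFile, string $name): int
{
    $counts = [];
    if (is_file($counterFile)) {
        $counts = json_decode(file_get_contents($counterFile), true) ?: [];
    }
    $counts[$name] = ($counts[$name] ?? 0) + 1;
    file_put_contents($counterFile, json_encode($counts), LOCK_EX);
    return $counts[$name];
}
```

On each page you would call `bot_name($_SERVER['HTTP_USER_AGENT'] ?? '')` and, if it returns a name, pass that name to `tally_bot()` along with a counter file path of your choosing.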
Zak, is your site a forum? If so, what software are you using? If not, do you have PHP enabled on your server?
If you can use PHP, there is a free PHP Counter which is pretty good at http://ekstreme.com/phplabs/phpcounter.php Tracks referring URLs, bots, etc. No database required -- uses text files.
I had a look in my access log! I get a line saying something like:
68.142.249.118 - - [05/Apr/2005:09:20:07 +0000] "GET /index.php?cPath=30&osCsid=aac9484079fa4cc0f6f4164e9793e422 HTTP/1.0" 301 329 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
What does this mean?
What I am trying to do is redirect Googlebot, Slurp and msnbot so they don't read the session IDs. I did the rewrites but don't know whether they return the correct URL!
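One way to check is to work out what the session-free URL should be and compare it with where the server actually sends the bot. Here is a minimal sketch of the first half. The function name `strip_query_param` is hypothetical, and it assumes the session ID travels in a query parameter such as `osCsid`, as in the log line quoted earlier.

```php
<?php
// Return $url with one query parameter removed.
// Fragments (#...) and login credentials in the URL are ignored.
function strip_query_param(string $url, string $param): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // nothing to strip
    }

    parse_str($parts['query'], $query);
    unset($query[$param]);

    // Rebuild scheme://host[:port] only if the input had a host.
    $prefix = '';
    if (isset($parts['host'])) {
        $prefix = ($parts['scheme'] ?? 'http') . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    }

    $rebuilt = $prefix . ($parts['path'] ?? '');
    if ($query) {
        $rebuilt .= '?' . http_build_query($query);
    }
    return $rebuilt;
}

echo strip_query_param(
    '/index.php?cPath=30&osCsid=aac9484079fa4cc0f6f4164e9793e422',
    'osCsid'
), "\n";
```

For the second half, you can request the page while pretending to be the bot, for example with `curl -I -A "Googlebot/2.1 (+http://www.google.com/bot.html)"` followed by your URL, and compare the `Location:` header in the response with the URL the function produces.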
I always leave some milk and cookies at the doorstep. When they are gone, I know the Googlebot was here.
I know it means Yahoo Slurp has been, but what does the 301 329 "-" mean? Does this mean the redirect was good?
OK, so there were some problems with this script. It looks like it was written by two different people; there was both old and new PHP code in it. I cleaned that up and turned on mod_rewrite, and it works fine. However, I have a new problem: .html pages no longer show up. I think the problem is in:
AddHandler application/x-httpd-php .htm .html
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} ^(.*).htm [NC,OR]
RewriteCond %{REQUEST_FILENAME} ^(.*).html [NC]
RewriteRule ^(.*) /googlebot.php?file=$1
</IfModule>
Any ideas?
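One possible cause, assuming googlebot.php only sends the email: the RewriteRule hands every .htm and .html request to the script, so the visitor sees whatever the script outputs, and if it outputs nothing the page is blank. A sketch of one way the script could hand back the requested page follows. The helper `resolve_page` is hypothetical, and the checks are there because `?file=` comes straight from the URL.

```php
<?php
// Turn the ?file=... value into a real path inside $docRoot.
// Returns null if the file is missing, is not .htm/.html,
// or resolves to somewhere outside the document root.
function resolve_page(string $docRoot, string $file): ?string
{
    $root = realpath($docRoot);
    $path = realpath($docRoot . '/' . ltrim($file, '/'));
    if ($root === false || $path === false || !is_file($path)) {
        return null;
    }
    // Only serve the kinds of files the rewrite rule was meant for.
    if (!preg_match('/\.html?$/i', $path)) {
        return null;
    }
    // Reject anything that escapes the document root (e.g. "../").
    if (strncmp($path, $root . DIRECTORY_SEPARATOR, strlen($root) + 1) !== 0) {
        return null;
    }
    return $path;
}

// At the end of the tracking script, after the mail() call, something like:
//   $page = resolve_page($_SERVER['DOCUMENT_ROOT'], $_GET['file'] ?? '');
//   if ($page === null) { http_response_code(404); } else { readfile($page); }
// readfile() suits static HTML; pages that contain PHP would need include.
```

If your cleaned-up script already outputs the page, then this is not the cause and the problem lies elsewhere.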
301 is the result code. In this case, a permanent redirect. "-" is the referrer, that is, the page the visitor came from. Since Yahoo! requested the URL directly, there is no referring page. Don't know what the "329" means.
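The log lines quoted in this thread look like Apache's "combined" log format, where the number right after the status code is the size of the response in bytes, not counting headers. So "329" would be the size of the redirect response. Below is a sketch that splits such a line into named fields. The function name is made up, and it assumes the combined format with no escaped quotes inside the fields.

```php
<?php
// Parse one line of Apache's "combined" log format into named fields.
// Returns null if the line does not match the expected layout.
function parse_combined_log(string $line): ?array
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return [
        'ip'         => $m[1],
        'time'       => $m[4],
        'request'    => $m[5],
        'status'     => (int) $m[6],                     // e.g. 301 = permanent redirect
        'bytes'      => $m[7] === '-' ? 0 : (int) $m[7], // response size, headers excluded
        'referrer'   => $m[8],                           // "-" when there is none
        'user_agent' => $m[9],
    ];
}

// The Yahoo! Slurp line quoted earlier in the thread.
$entry = parse_combined_log('68.142.249.118 - - [05/Apr/2005:09:20:07 +0000] "GET /index.php?cPath=30&osCsid=aac9484079fa4cc0f6f4164e9793e422 HTTP/1.0" 301 329 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"');

echo $entry['status'], ' ', $entry['bytes'], ' ', $entry['referrer'], "\n";
```

A 301 in the status field confirms that a redirect was issued, but the access log does not record the target, so it cannot tell you on its own whether the redirect pointed at the right URL.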