How do I know the googlebot has been?

Discussion in 'Google' started by zak, Apr 4, 2005.

  1. #1
    Hi,

    I just wanted to know how you can tell that Googlebot has been to your website. Which log do I check?

    Also, if I set up a mod_rewrite rule just for Googlebot so it doesn't pick up session IDs, how do I check that it is redirected to the correct link?
     
    zak, Apr 4, 2005 IP
  2. Will.Spencer

    Will.Spencer NetBuilder

    #2
    What web server software are you running?

    Under Apache, you will check your access log.

    Googlebot will show up in your access log as "Googlebot/2.1".

    This will look something like this:

    66.249.64.18 - - [04/Apr/2005:14:45:18 -0600] "GET /html-special-characters.shtml HTTP/1.0" 200 23909 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
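If you would rather not read the log by eye, a small script can pull out the matching lines. This is only a minimal sketch: it assumes the log is plain text with one request per line, as in the example above, and the file name `access.log` in the usage note is a placeholder, not your server's actual path.

```python
def googlebot_lines(lines):
    """Yield log lines that mention Googlebot (case-insensitive)."""
    for line in lines:
        if "googlebot" in line.lower():
            yield line.rstrip("\n")

# First line is the example from this post; second is a hypothetical browser hit.
sample = [
    '66.249.64.18 - - [04/Apr/2005:14:45:18 -0600] "GET /html-special-characters.shtml HTTP/1.0" 200 23909 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"\n',
    '192.0.2.10 - - [04/Apr/2005:14:46:02 -0600] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"\n',
]

matches = list(googlebot_lines(sample))
for line in matches:
    print(line)
```

To run it against a real log, pass an open file instead of the sample list, for example `with open("access.log") as f: ...`. Because the function reads one line at a time, a large log file should not need to fit in memory.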
     
    Will.Spencer, Apr 4, 2005 IP
  3. zak

    zak Peon

    #3
    I am running on Apache...

    I thought so,

    I checked the access log

    There were two logs: a small file with the recent access, and what may be the full one, which is 120 MB, so I didn't bother with that...

    Anything on the redirect thing?
     
    zak, Apr 4, 2005 IP
  4. Will.Spencer

    Will.Spencer NetBuilder

    #4
    I am not quite understanding the question.

    Plus, doing a mod_rewrite for Googlebot is so close to cloaking that it makes me nervous.
     
    Will.Spencer, Apr 4, 2005 IP
  5. kyle422

    kyle422 Peon

    #5
    I use this PHP code in a file called passthru.php that emails me whenever Googlebot has been to my site. Since the passthru code is added to every page, I can track where it's been.
    Here is the code (change the email address to the one you want the log sent to):
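    <?php
    // Email a notice whenever the request's user agent mentions "googlebot".
    // All request data is read from $_SERVER, so register_globals is not needed.
    $userAgent  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $remoteAddr = $_SERVER['REMOTE_ADDR'];

    if (stripos($userAgent, 'googlebot') !== false) {
        // Reverse DNS lookup of the visiting IP address.
        $crawl = gethostbyaddr($remoteAddr);

        // Label the crawler by the start of its IP address.
        // elseif keeps a "64." match from being overwritten by the final else.
        if (strpos($remoteAddr, '64.') === 0) {
            $crawler = "Refresh GoogleBot";
        } elseif (strpos($remoteAddr, '216.') === 0) {
            $crawler = "Google Deep Crawler";
        } else {
            $crawler = "Unknown Crawler";
        }

        // Rebuild the URL that was requested, with the query string if there is one.
        $url = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['PHP_SELF'];
        if (!empty($_SERVER['QUERY_STRING'])) {
            $url .= '?' . $_SERVER['QUERY_STRING'];
        }

        $today = date("F j, Y, g:i a");

        mail("youremail@youraddress.com", "Googlebot detected on " . $_SERVER['SERVER_NAME'],
            " $today \n Googlebot IP Address: $remoteAddr \n Googlebot Domain: $crawl \n Crawler Type: $crawler \n Url Visited: $url");
    }
    ?>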
    
    Then add this to the .htaccess file in your root directory. (If you don't have a .htaccess file, create one in Notepad and put it on your server.)
    Here is the .htaccess code:
    AddHandler application/x-httpd-php .htm .html
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    # Match only filenames that end in .htm or .html (dot escaped, end anchored).
    RewriteCond %{REQUEST_FILENAME} \.html?$ [NC]
    RewriteRule ^(.*)$ /passthru.php?file=$1 [L]
    </IfModule>
    If you are not on an Apache server, I don't think this solution will work for you. PM me if you have any problems.
     
    kyle422, Apr 4, 2005 IP
  6. dakar

    dakar Active Member

    #6
    Good coding, but on a good day that must really hammer your email server...

    I just have a script running that tallies visits for my forums, increments a hit counter for each bot, and stores it in the database. I also found a cool plugin for my blogs that tracks Googlebot: date/time, frequency, and which pages it visited.
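A rough sketch of the tallying idea, for anyone who wants counts instead of emails. It is not dakar's actual script: the signature table and function names are made up for illustration, and the counts are kept in memory rather than in a database.

```python
from collections import Counter

# Lowercase substrings to look for in the user agent. Illustrative, not exhaustive.
BOT_SIGNATURES = {
    "googlebot": "Googlebot",
    "slurp": "Yahoo! Slurp",
    "msnbot": "msnbot",
}

def identify_bot(user_agent):
    """Return a bot label if the user agent contains a known signature, else None."""
    agent = user_agent.lower()
    for needle, label in BOT_SIGNATURES.items():
        if needle in agent:
            return label
    return None

def tally_bots(user_agents):
    """Count visits per bot; user agents that match no signature are skipped."""
    counts = Counter()
    for user_agent in user_agents:
        label = identify_bot(user_agent)
        if label is not None:
            counts[label] += 1
    return counts

visits = [
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0",  # hypothetical ordinary browser, not counted
]

counts = tally_bots(visits)
print(counts)
```

Keeping a counter per bot avoids sending one email per page view; writing the totals to a database or a file would be a separate step.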
     
    dakar, Apr 5, 2005 IP
  7. minstrel

    minstrel Illustrious Member

    #7
    Zak, is your site a forum? If so, what software are you using? If not, do you have PHP enabled on your server?
     
    minstrel, Apr 5, 2005 IP
  8. kyle422

    kyle422 Peon

    #8
    I have it going to my gmail account, so no worries :)
     
    kyle422, Apr 5, 2005 IP
  9. minstrel

    minstrel Illustrious Member

    #9
    minstrel, Apr 5, 2005 IP
  10. zak

    zak Peon

    #10
    I had a look in my access log!

    I get a line saying something like

    68.142.249.118 - - [05/Apr/2005:09:20:07 +0000] "GET /index.php?cPath=30&osCsid=aac9484079fa4cc0f6f4164e9793e422 HTTP/1.0" 301 329 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

    What does this mean?
     
    zak, Apr 5, 2005 IP
  11. zak

    zak Peon

    #11
    What I am trying to do is redirect Googlebot, Slurp and msnbot so they don't read the session IDs.

    I did the rewrites but don't know if they return the correct URL!
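One way to sanity-check a rewrite like this is to work out the URL you expect once the session ID is dropped, then compare it with where the bot actually gets sent. The sketch below only does the first half. It assumes the session parameter is `osCsid`, as in the Slurp log line earlier in the thread; the function name is made up, and the actual rewrite rules are not shown in the thread, so this may not match what they do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_id(url, param="osCsid"):
    """Return url with the session parameter removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# Request path taken from the log line quoted earlier in the thread.
requested = "/index.php?cPath=30&osCsid=aac9484079fa4cc0f6f4164e9793e422"
expected = strip_session_id(requested)
print(expected)
```

If the redirect works as intended, the target of the 301 should match the expected URL. You would still need to request the page with a bot user agent and look at the response to confirm that.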
     
    zak, Apr 5, 2005 IP
  12. zak

    zak Peon

    #12
    It's OK, I think. I've tested it!
     
    zak, Apr 5, 2005 IP
  13. minstrel

    minstrel Illustrious Member

    #13
    It means you've been visited by Yahoo! Slurp...
     
    minstrel, Apr 5, 2005 IP
  14. LiGhTen

    LiGhTen Peon

    #14
    Just wanted to say thanks for the script. I am also looking for an ASP/.NET version.
     
    LiGhTen, Apr 6, 2005 IP
  15. crazyhorse

    crazyhorse Peon

    #15
    I always leave some milk and cookies at the doorstep... when they are gone, I know the Googlebot was here. ;)
     
    crazyhorse, Apr 6, 2005 IP
  16. zak

    zak Peon

    #16
    I know it means Yahoo! Slurp has been, but what does the 301 329 "-" mean? Does this mean the redirect was good?
     
    zak, Apr 6, 2005 IP
  17. Mia

    Mia R.I.P. STEVE JOBS

    #17
    OK, so there were some problems with this script. It looks like it was written by two different people. Anyway, there was old and new PHP code in it. I cleaned that up and turned on mod_rewrite, and it works fine... However, I have a new problem.

    Any .html pages no longer show up. I think the problem is in:

    AddHandler application/x-httpd-php .htm .html
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} ^(.*).htm [NC,OR]
    RewriteCond %{REQUEST_FILENAME} ^(.*).html [NC]
    RewriteRule ^(.*) /googlebot.php?file=$1
    </IfModule>
    
    
    Any ideas?
     
    Mia, Apr 8, 2005 IP
  18. jeremymgp

    jeremymgp Active Member

    #18
    Check your log files and look for "Googlebot"
     
    jeremymgp, Apr 11, 2005 IP
  19. MyPages

    MyPages Active Member

    #19
    301 is the result code. In this case, a permanent redirect.

    "-" is the referrer: the page the visitor came from. Since Yahoo! requested the URL directly, there is no referring page.

    Don't know what the "329" means.
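For reference, in Apache's combined log format the field right after the status code is the size of the response in bytes, so the 329 would be the byte count of the redirect response. Here is a small sketch that splits the quoted Slurp line into named fields. It assumes the server uses the standard combined format, and the regular expression is an illustration rather than a full parser.

```python
import re

# Apache combined log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# The log line quoted earlier in the thread, split across string literals for readability.
line = (
    '68.142.249.118 - - [05/Apr/2005:09:20:07 +0000] '
    '"GET /index.php?cPath=30&osCsid=aac9484079fa4cc0f6f4164e9793e422 HTTP/1.0" '
    '301 329 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"'
)

fields = COMBINED.match(line).groupdict()
for name in ("status", "bytes", "referrer", "agent"):
    print(f"{name}: {fields[name]}")
```

Note that the 301 only shows that a redirect was sent. This log format does not record where the redirect pointed, so the log line alone cannot confirm that the target URL is the right one.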
     
    MyPages, Apr 11, 2005 IP
  20. inverse

    inverse Banned

    #20
    Check the stats provided by your hosting.
     
    inverse, Apr 13, 2005 IP