A Few Observations on Googlebot 2

Discussion in 'Google' started by Netizen, Mar 26, 2006.

  1. #1
    I analyzed a few months of logs, filtered to include only requests whose user agent identified as Googlebot/2. All other visitors were excluded.
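
    For anyone who wants to repeat the filtering step, here is a minimal sketch. It assumes the server writes the Apache/Nginx "combined" log format; the regular expression, the `googlebot_hits` name, and the sample lines are illustrative and not taken from the original logs. User-agent strings can be spoofed, so this only picks out requests that claim to be Googlebot.

```python
import re

# Assumed layout: Apache/Nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, marker="Googlebot/2"):
    """Yield parsed log entries whose user agent contains `marker`."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and marker in match.group("agent"):
            yield match.groupdict()


# Made-up sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [26/Mar/2006:04:12:01 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [26/Mar/2006:04:13:09 +0000] "GET /about.html HTTP/1.1" 200 2048 '
    '"http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

hits = list(googlebot_hits(sample))
print(len(hits), hits[0]["path"], hits[0]["referrer"])
```

    In practice you would pass an open log file instead of `sample`, since the function only needs an iterable of lines.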

    This is a clean HTML site with no JavaScript (other than Urchin), one CSS file, and no 301s.

    These are some basic observations:

    Gbot hits the domain root / one to four times every day
    Gbot hits about 10% of the inside pages every day
    On a 10-day cycle, Gbot hits about half of the inside pages (semi-deep crawl)
    The inside page that Gbot hits the most gets no referrals from Google search
    Most paths are one file only

    Gbot comes at all times of day, spread evenly
    Gbot comes more on Sundays than on other days, but not a lot more

    Gbot never reads image files
    Gbot reads robots.txt at least once per day, but not every visit
    Gbot reads pdf, dmg, and reg files, but not css or exe files

    Gbot does not use a referrer
    HTTP errors are limited to 404
    Gbot keeps looking for a file called googlesyndication.com in several directories

    No magic revelations, but maybe someone can put this together with some other observation to learn something.
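
    The tallies above (time of day, day of week, file types, HTTP errors) can be reproduced with a short script. This is only a sketch under assumptions: each entry is a dict with `time`, `path`, and `status` keys in combined-log style, and the `summarize` helper and the sample entries are hypothetical, not the actual data.

```python
from collections import Counter
from datetime import datetime
import posixpath

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def summarize(entries):
    """Tally hits by weekday, hour of day, file extension, and HTTP status."""
    weekdays, hours = Counter(), Counter()
    extensions, statuses = Counter(), Counter()
    for entry in entries:
        # Timestamp format used by combined logs, e.g. 26/Mar/2006:04:12:01 +0000
        when = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
        weekdays[WEEKDAYS[when.weekday()]] += 1
        hours[when.hour] += 1
        path = entry["path"].split("?", 1)[0]  # drop any query string
        ext = posixpath.splitext(path)[1].lower() or "(none)"
        extensions[ext] += 1
        statuses[entry["status"]] += 1
    return {"weekday": weekdays, "hour": hours,
            "extension": extensions, "status": statuses}


# Made-up entries, for illustration only.
entries = [
    {"time": "26/Mar/2006:04:12:01 +0000", "path": "/", "status": "200"},
    {"time": "26/Mar/2006:09:30:45 +0000", "path": "/docs/manual.pdf", "status": "200"},
    {"time": "27/Mar/2006:04:50:10 +0000", "path": "/old/page.html", "status": "404"},
]

stats = summarize(entries)
for name, counter in stats.items():
    print(name, counter.most_common())
```

    Counting by extension is what shows which file types the bot does and does not request, and the status tally shows whether anything other than 404 turns up.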
     
    Netizen, Mar 26, 2006 IP
  2. Paz

    Paz Well-Known Member

    #2
    Interesting data... I'll definitely have to do the same experiment!!

    Cheers,
    Paz.
     
    Paz, Mar 27, 2006 IP
  3. jimkarter

    jimkarter Notable Member

    #3
    Really interesting data. Thanks, Netizen.
     
    jimkarter, Mar 27, 2006 IP
  4. FireStorM

    FireStorM Well-Known Member

    #4
    Very interesting, I do not know much about it.
     
    FireStorM, Mar 27, 2006 IP
  5. ozami

    ozami Peon

    #5
    Well-collected data. It would be good to see someone take this further and maybe create a data site detailing the bots and their activities, where people can submit their own data as well.
     
    ozami, Mar 27, 2006 IP
  6. ozami

    ozami Peon

    #6

    Yeah, that would indeed be nice actually!
     
    ozami, Mar 27, 2006 IP
  7. Aok

    Aok Peon

    #7
    Aok, Mar 28, 2006 IP
  8. mika

    mika Active Member

    #8
    In what way should I understand that? Is it really looking for a file that is called "googlesyndication.com"? Or is it looking for files from googlesyndication.com that are included by JavaScript (meaning it's looking for AdSense)?
     
    mika, Mar 28, 2006 IP
  9. Nikolas

    Nikolas Well-Known Member

    #9
    I found out something else regarding Gbot 2.

    There are times when it grabs the same page (I observed this on my home page) several times within about 5 minutes. I think this is something like an 'update frequency' test.

    When this happened there were many members on the site (the site is a forum) and they were all posting like crazy. So the bot probably took this as a site that updates very frequently.

    From the next day on, Gbot started visiting the whole site, grabbed the sitemap, and in general has been very active on my site. I think this is no coincidence.
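
    To check your own logs for the same kind of repeated fetching, something like the following could work. It is a rough sketch: the 5-minute window comes from the observation above, but the threshold of 3 hits, the `find_bursts` name, and the sample entries are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_bursts(entries, window=timedelta(minutes=5), min_hits=3):
    """Return {path: most hits seen inside any `window`} for paths reaching `min_hits`."""
    times_by_path = defaultdict(list)
    for entry in entries:
        when = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
        times_by_path[entry["path"]].append(when)

    bursts = {}
    for path, times in times_by_path.items():
        times.sort()
        start = 0
        best = 0
        # Sliding window: advance `start` until the span fits inside `window`.
        for end, current in enumerate(times):
            while current - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_hits:
            bursts[path] = best
    return bursts


# Made-up entries, for illustration only.
entries = [
    {"time": "28/Mar/2006:14:00:05 +0000", "path": "/"},
    {"time": "28/Mar/2006:14:01:40 +0000", "path": "/"},
    {"time": "28/Mar/2006:14:02:00 +0000", "path": "/faq.html"},
    {"time": "28/Mar/2006:14:04:30 +0000", "path": "/"},
    {"time": "28/Mar/2006:18:20:00 +0000", "path": "/"},
]

bursts = find_bursts(entries)
print(bursts)
```

    A burst found this way only shows that the repeated fetches happened. Whether they are an update-frequency test is still a guess.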
     
    Nikolas, Mar 28, 2006 IP