Googlebot is requesting pages that do not exist? Is this new?

Discussion in 'Google' started by eyezshine, Dec 5, 2009.

  1. #1
    I built a few new sites 3-4 days ago, and while checking my stats today I saw Googlebot request a totally off-the-wall URL from one of them:

    ---------------------------------------------------------
    Host: 66.249.71.206
    /tdqxojmvvyjgj.html
    Http Code: 404 Date: Dec 05 16:57:16 Http Version: HTTP/1.1 Size in Bytes: -
    Referer: -
    Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    ------------------------------------------------------------

    I personally have never seen this before.

    This is happening on a couple of the new sites I just built.

    The sites I built use mod_rewrite for pretty urls and use MySQL for their database. They are coded in PHP.

    Is this a way for Googlebot to check for dynamically generated websites? I searched Google to see whether others have experienced this, and they have, but nobody seems to know why.

    Scott
     
    eyezshine, Dec 5, 2009 IP
  2. shofstetter

    #2
    There are several possible reasons for this:

    The visitor may not really have been Googlebot. There are browser plugins that let you change your user agent to Googlebot's.

    Your domain names may have been used previously, and Google may be looking for pages that existed before.

    Someone may have a link pointing to that page.

    I occasionally get entries like that in my server log.

    You can use a 301 redirect to tell Google the page has moved, or just let it return a 404 and Google should stop looking for it after a while.
     
    shofstetter, Dec 5, 2009 IP
  3. eyezshine

    #3
    Well, I checked the IP and it resolves to:

    -----------------------------------------------------
    OrgName: Google Inc.
    OrgID: GOGL
    Address: 1600 Amphitheatre Parkway
    City: Mountain View
    StateProv: CA
    PostalCode: 94043
    Country: US

    NetRange: 66.249.64.0 - 66.249.95.255
    CIDR: 66.249.64.0/19
    NetName: GOOGLE
    NetHandle: NET-66-249-64-0-1
    Parent: NET-66-0-0-0-0
    NetType: Direct Allocation
    NameServer: NS1.GOOGLE.COM
    NameServer: NS2.GOOGLE.COM
    NameServer: NS3.GOOGLE.COM
    NameServer: NS4.GOOGLE.COM
    Comment:
    RegDate: 2004-03-05
    Updated: 2007-04-10

    OrgTechHandle: ZG39-ARIN
    OrgTechName: Google Inc.
    OrgTechPhone: +1-650-318-0200
    ----------------------------------------------------

    So, it's definitely a Googlebot spider.
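For anyone who wants to repeat this check on their own logs, here is a minimal Python sketch. The first function only tests whether the logged IP falls inside the CIDR block from the WHOIS output above. The second is a stricter reverse-then-forward DNS check of the kind Google recommends for confirming Googlebot; the function names and the injectable `reverse`/`forward` parameters are my own assumptions, added so the logic can be tried without network access.

```python
import ipaddress
import socket

# CIDR block copied from the WHOIS output above.
GOOGLE_BLOCK = ipaddress.ip_network("66.249.64.0/19")


def in_whois_block(ip, block=GOOGLE_BLOCK):
    """Return True if the IP lies inside the given network block."""
    return ipaddress.ip_address(ip) in block


def dns_confirms_googlebot(ip,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the host name's domain, then
    forward-resolve that name and confirm it maps back to the same IP."""
    try:
        host = reverse(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in forward(host)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False


print(in_whois_block("66.249.71.206"))  # True
```

The range check alone only shows the address belongs to Google's allocation; the DNS round trip is the stronger test, because a spoofed user agent cannot fake the reverse record for Google's address space.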

    Also, this is a brand new 5 day old "subdomain" I just created so there shouldn't be any links pointing to any pages at all except for the couple I pointed to the home page from my own other websites.

    I seriously think googlebot is testing my site for something like auto-generated MFA type scripts that automatically create pages for any keyword.

    You think google is doing this now?
     
    eyezshine, Dec 5, 2009 IP
  4. shofstetter

    #4
    It is possible they are testing your site. Did you submit a sitemap?
    What is the URL of the site where Google is looking for that page?
     
    shofstetter, Dec 5, 2009 IP
  5. Canonical

    #5
    It could be the result of your verifying your sites for Google Webmaster Tools. If you performed the Google site verification process for your new sites using the "Upload an HTML file" method (not the Meta Tag method), when you press the Verify button Google does 2 things:

    1) They request the HTML file with the name they gave you to determine whether it exists in your site's root folder. If they get back a 200 OK status (which normally indicates that a page was successfully found), then they
    2) Request a random filename on your site that they know should NOT exist.

    If the request for the random filename that should NOT exist also returns a 200 status, then they know that they cannot trust the result received when they requested the verification file. Your web server in this case would be known to NOT be configured correctly to handle 404 Not Found errors. So your site fails verification.

    But if the request for the random filename that should NOT exist returns a 404 status then they know your web server is configured correctly to return a 404 when a page does not exist. So they will trust the fact that the verification file really DOES exist on your server and your site will pass verification.
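The two-step logic described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not Google's actual code: `fetch_status` is a hypothetical callable that takes a path and returns the HTTP status code, and the 13-letter `.html` filename simply mimics the request seen in the log at the top of the thread.

```python
import secrets
import string


def random_missing_path(length=13):
    """Build a random lowercase .html path that should not exist on the site."""
    letters = "".join(secrets.choice(string.ascii_lowercase)
                      for _ in range(length))
    return f"/{letters}.html"


def verification_passes(fetch_status, verification_path):
    """Trust a 200 for the verification file only if a random,
    nonexistent path on the same site returns a 404."""
    if fetch_status(verification_path) != 200:
        return False  # verification file not found at all
    return fetch_status(random_missing_path()) == 404


# Two fake servers standing in for real HTTP requests (hypothetical paths).
pages = {"/google-verify-example.html"}
well_configured = lambda path: 200 if path in pages else 404
soft_404_site = lambda path: 200  # answers 200 for everything

print(verification_passes(well_configured, "/google-verify-example.html"))
print(verification_passes(soft_404_site, "/google-verify-example.html"))
```

A site that answers 200 for every URL fails the second step, which is why a random nonexistent path is a useful probe for misconfigured 404 handling.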
     
    Canonical, Dec 5, 2009 IP
  6. eyezshine

    #6
    I do not use webmaster tools or sitemaps.

    I really don't know why Googlebot would request some random page like that for no reason, unless they are testing for autogenerated pages or something like that. Where's Matt Cutts when you need a solid answer? LOL, that's too funny...
     
    eyezshine, Dec 5, 2009 IP
  7. Bohra

    #7
    It's possible that someone somewhere on the web is linking to that page.
     
    Bohra, Dec 5, 2009 IP
  8. eyezshine

    #8
    It is possible, but I'm not so sure, because the site is a subdomain that is only 5 days old and it has never been indexed in any search engine. It's like Googlebot just made up a string of letters and added .html at the end, like /ahjasdkasdgasd.html

    It doesn't make any sense.
     
    eyezshine, Dec 5, 2009 IP