Google Crawls Non-Hyperlinked URLs

Discussion in 'Google' started by T0PS3O, May 23, 2005.

  1. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #21
    Didn't they become a fully-fledged domain registrar last year? I'm not sure what that means in terms of getting hold of Verisign/Nominet databases etc., but I can see the benefit to them if they can query the global DB of domain registrations.
     
    T0PS3O, May 26, 2005
  2. Old Welsh Guy

    Old Welsh Guy Notable Member

    Messages:
    2,699
    Likes Received:
    291
    Best Answers:
    0
    Trophy Points:
    205
    #22
    Yes Tops, at the time some of us said that this was probably the only reason they became a domain registrar. As you say, it makes sense for them. I guess it also allows them to keep track of known spammers' websites. OMG, they wouldn't be THAT dirty, would they? lol
     
    Old Welsh Guy, May 26, 2005
  3. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #23
    And even when you block them in robots.txt, I guess they will still drop by and have a look for internal use (document age, document change etc.).

    (OT: your sig link says Scrum V but the domain is scumv... Just mentioning it in case either is wrong...)
     
    T0PS3O, May 26, 2005
  4. NetMidWest

    NetMidWest Peon

    Messages:
    1,677
    Likes Received:
    151
    Best Answers:
    0
    Trophy Points:
    0
    #24
    I registered a domain about 10 days ago and waited about 5 days to attach it to a site. I put dummy nameservers in for 4 of those days and the correct ones for the last 1.
    I threw up a page with a "no robots" meta tag and a robots.txt blocking all. Within an hour I had Googlebot hitting the robots.txt file. I thought it was due to the Google toolbar.

    But now I have to wonder...

    I went ahead and unblocked the page, and it now ranks #8 for a keyword I expect to be competitive before long. The page is nothing more than a headline, logo, and subline.
    Gotta get that site up... I am getting a few error hits from a similar domain, and curiosity seekers.
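
The "robots.txt blocking all" setup mentioned in this post can be sketched with Python's standard `urllib.robotparser`. This is only an illustration: the file contents, the `can_crawl` helper, and the example.com URL are assumptions, not the poster's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "block everything" robots.txt (assumed, not the poster's file).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler would skip every page on the site.
    print(can_crawl(BLOCK_ALL, "Googlebot", "http://www.example.com/"))
```

A crawler has to request robots.txt to learn the rules in the first place, so a hit on that file alone is consistent with the block being honored.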
     
    NetMidWest, May 26, 2005
  5. mcdar

    mcdar Peon

    Messages:
    1,831
    Likes Received:
    110
    Best Answers:
    0
    Trophy Points:
    0
    #25
    I also have an example of a site that Google has indexed even though it has never had an external link pointing to it.

    This site has been in "construction" mode for a year or so (customer has not worked on it to get it off the ground). It does have content as well as outbound links.

    Oddly enough, when I just checked for this site in Google, Google reports one link to the page!

    The link is from Google itself !!!!!

    The link Google reports looks like this - F2xxlETYqFsJ:www.thesite.com. This code, F2xxlETYqFsJ, is the Google checksum for the cached version of the page. However, when I attempt to look at that page, it is blank. (???)

    Very odd indeed!

    Caryl

    ps - I do not believe anyone with a Google toolbar has ever viewed the site.
     
    mcdar, May 26, 2005
  6. wrkalot

    wrkalot Well-Known Member

    Messages:
    285
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    108
    #26
    I have a one-page site that has been up for about three months. It has NO links pointing to it. I HAVE been to the site with a browser that has the toolbar installed AND it does show up using the site: command. I guess it could be from a whois also. Do we know if Google actually uses whois info at this time?
     
    wrkalot, May 26, 2005
  7. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #27
    I remember this (although I'd forgotten who said it - thanks!). My recollection was that he specifically said Google could find "orphaned" pages but he wouldn't say how.

    To me the message was, if you don't want a page indexed, use a robots.txt file to specifically disallow it or use the "noindex" meta tag to do so.
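
The "noindex" meta tag option can be checked with a short script. The sketch below uses Python's standard `html.parser`; the `RobotsMetaParser` class, the `has_noindex` helper, and the sample page are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical placeholder page, assumed for this example only.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Under construction</body></html>"
)

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```

One caveat: a crawler can only see the meta tag if it is allowed to fetch the page, so a page that is also disallowed in robots.txt may never have its "noindex" read.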
     
    minstrel, May 26, 2005
  8. Perrow

    Perrow Well-Known Member

    Messages:
    1,306
    Likes Received:
    78
    Best Answers:
    0
    Trophy Points:
    140
    #28
    So, not only does G put in a lot of work sandboxing sites, it works hard to find sites to sandbox :eek:

    They must really like the sandbox idea :D
     
    Perrow, May 26, 2005
  9. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #29
    Yes.

    ©2005 Google - Searching 8,058,044,651 web pages

    ©2005 Google - Sandboxing 32,058,044,651 web pages
     
    T0PS3O, May 26, 2005