Odd. I've got a new site with zero backlinks that I know of, and I can't find any. It was mentioned in a posting on a mailing list (but not hyperlinked). Somehow, Googlebot has found the site. They've only cached the front page, but I don't understand how they found it in the first place. Anyone have any ideas how the heck they could find it? Surely Googlebot can't follow URLs unless they're hyperlinked, and I can't believe that they check through whois data. Even Yahoo has the front page cached. Looks like I'd better get the rest of the site finished (SEO'd) before one of these damn bots decides to crawl it properly.
I've had this happen before, sometimes within 24 hours, with no backlinks or references anywhere. It started happening so often that I created a junk domain with subdomains where I develop client sites and password-protect them. This lets me finish them without being indexed too early, then transfer them over to the real domain.
I have offline development servers that I use for the initial development work. I upload, then add the finishing touches live before starting to promote the site. Looks like I'm going to have to use the dev servers more, or else I'm going to end up with far too many 301s while I get the page names correct (or at least better). If I got into the habit of creating pages with the correct names in the first place, it would make life so much easier.
Two possibilities... 1) I don't think Google checks whois data, but there are other companies that do and sometimes post links based on that data. I've heard of other people on other web dev boards seeing links from such sites. My guess is that those links can disappear as quickly as they come. 2) Are you using the Google Toolbar? If so, and you have the PR tool turned on, the toolbar is sending every URL you visit back to Google's servers. I have had pages that I was completely certain had no backlinks get indexed and cached because of this.
Yeah, the Toolbar crossed my mind instantly as well. Whenever I put a site 'semi'-live for production, I normally block all access by IP AND put a noindex / disallow in the robots.txt.
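For anyone wanting to sanity-check the robots.txt half of that, here is a rough Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `is_crawlable` helper and the `staging.example.com` URL are all made up for illustration; it only shows what a rule-abiding crawler should conclude from a blanket disallow, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a half-finished site:
# ask every crawler to stay out of everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # staging.example.com is a placeholder, not a real staging host.
    page = "http://staging.example.com/index.html"
    for bot in ("Googlebot", "Slurp"):
        print(bot, "allowed:", is_crawlable(ROBOTS_TXT, bot, page))
```

Bear in mind robots.txt is only a request to well-behaved bots, which is why pairing it with the IP block (or a password) is the safer habit.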
Well, what a coincidence. Here it comes: GOOGLE CRAWLS NON-HYPERLINKED URLS! Check this out: http://www.google.co.uk/search?hl=en&lr=&sa=G&q=site:buy-a-mattress.co.uk Especially the 3rd link. It's a link I left here on DP back when I was building the site and asking for ideas. It's broken with an * (w*w.....) so VB doesn't hyperlink it, and still Google decided to go have a look. Assumption (probably jumping to conclusions): realizing full well how the link voting system is abused nowadays, Google has decided to factor in in-text mentions of URLs as well. Which would be odd, since the mention could have been in an article about how crap the page was, and Google would still think of it as a vote...
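To show how little it takes to pick a URL out of plain text with no anchor tag around it, here is a small Python sketch. This is purely a guess at the general idea, not anything Google has documented: the `URL_LIKE` pattern, the `find_url_mentions` helper and the `www.example.co.uk` domain are all invented for the example, and it makes no attempt to undo deliberate mangling like the asterisk trick.

```python
import re
from typing import List

# Deliberately loose, hypothetical pattern: an optional http(s) scheme,
# one or more dotted labels, then a final label of two or more letters.
URL_LIKE = re.compile(
    r"\b(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def find_url_mentions(text: str) -> List[str]:
    """Return hostname-like strings found in plain, unlinked text."""
    return [match.group(0).lower() for match in URL_LIKE.finditer(text)]


if __name__ == "__main__":
    # A forum-style post that mentions a site without hyperlinking it.
    post = "My new site is at www.example.co.uk, any ideas?"
    print(find_url_mentions(post))
```

A pattern this loose will throw up false positives on things like file names, so treat it as an illustration that unlinked mentions are easy to harvest rather than as a usable crawler component.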