Hey everyone, I was checking some work done by an SEO company for a friend of mine. He was wondering why the PR on one of his sites went from 4 to 2 after they had been working on it for 6 months. I discovered that there was another site out there that was an exact duplicate. I asked him what this was, and he said it was that company's "test site". They left it wide open to be crawled, but assumed it wouldn't be crawled because nothing linked to it. Google shows no backlinks for it, but has a ton of pages indexed, all but 1 of them supplemental results. Yahoo and MSN have not indexed it, nor do they show any backlinks for it. I know this company screwed up and this was easy to prevent, but I am curious whether anyone knows exactly how Google found this duplicate site. Educated speculation is also welcome.
I heard that whether Google indexes your website can depend on which hosting company you use... now I'm not sure if this is 100% true.
It may be due to Google's crawl caching proxy servers. Running Google services such as AdSense on a page will get it indexed.
The Google Toolbar will not get new pages indexed, at least according to Matt Cutts: http://www.mattcutts.com/blog/debunking-toolbar-doesnt-lead-to-page-being-indexed/
I registered my site and didn't tell anyone about it for a long time... then I found a public website called whois dot ws (I can't post links yet because my account is so new), which is currently indexed by Google with a PageRank of 1. This is probably how your site got on there; I know it is how mine did. It sucks, because I didn't want my site on there until I was ready to launch.
I love your work on the Buffalo Sabres, Daniel Briere. It could be a lot of things. There could even be backlinks to it, though that seems unlikely (remember that Google doesn't necessarily display all the backlinks it knows about). I'm guessing the whois source someone else posted is the most likely culprit, though. I've had some dusty domains show up, get indexed, attain PageRank(!), and rank high in the SERPs(!!), all from that.
It could also be from a referer log. For example: you're on the test page, then you navigate to another page on the web. That other site receives your test page URL in its logs as the referer of the hit or visit. If that site then publishes its referer hits to show where its traffic is coming from, bingo, you have a link. This is the exact principle the tool "PR Storm" used: it spammed thousands of known referer logs that make this data publicly available, resulting in thousands of backlinks almost instantly and with zero work. It's highly unethical, and I don't advise using such tools.
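To make the referer-log route concrete, here is a minimal sketch of the kind of public "top referers" page described above. It assumes the other site writes Apache/Nginx "combined" format access logs; the function names `top_referers` and `referer_page` and the sample hostnames are hypothetical and only for illustration. This is not how PR Storm or any particular stats package works, just the general mechanism: the browser sends the test page's URL in the Referer header, the other site logs it, and the published page turns it into a crawlable link.

```python
import html
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed log format: Apache/Nginx "combined", where the Referer is the
# second-to-last quoted field on each line.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def top_referers(log_lines, own_host):
    """Count external Referer URLs, skipping empty ("-") and self-referrals."""
    counts = Counter()
    for line in log_lines:
        m = COMBINED.match(line.strip())
        if not m:
            continue  # skip lines that are not in combined format
        ref = m.group("referer")
        if ref in ("", "-") or urlparse(ref).hostname == own_host:
            continue
        counts[ref] += 1
    return counts


def referer_page(counts, limit=10):
    """Render the most common referers as plain HTML links a crawler can follow."""
    items = [
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(url)}</a> ({n})</li>'
        for url, n in counts.most_common(limit)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical log lines: one visitor arriving from the unlinked test site.
    sample = [
        '203.0.113.7 - - [10/Oct/2007:13:55:36 -0700] "GET /widgets.html HTTP/1.1" '
        '200 2326 "http://test.example.com/page1.html" "Mozilla/5.0"',
        '203.0.113.8 - - [10/Oct/2007:13:56:01 -0700] "GET / HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0"',
    ]
    print(referer_page(top_referers(sample, "www.example.org")))
```

Once a page like this is public and crawled, the "unlinked" test site effectively has a backlink, even though nobody linked to it on purpose.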
Sometimes Google picks the URL up from the domain logs or from the whois record. I have seen some sites whose URL was listed on whois.something.com, and other sites where the domain registration database got crawled and the domain ended up in the SERPs.