I want to find websites whose robots.txt disallows Googlebot and other search engine spiders, so that Google does not index their pages. For a site like this, searching site:Domain.com on google.com returns 0 results. Its robots.txt looks like this:

User-agent: *
Disallow: /

Could you tell me which websites are like this, and how I can find more of them?
I'm not sure why you want to find them, but you could always write your own spider that ignores robots.txt, program it to index only the home page of each site and scrape the links to other sites, and run it until you have the desired number of sites. What will you be doing with the info?
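For the checking part, here is a rough sketch of how a spider could decide whether a site's robots.txt shuts out every crawler. It uses Python's standard urllib.robotparser; the function names blocks_all_crawlers and fetch_robots_txt and the default agent list are my own assumptions for illustration, and it only reads robots.txt itself, which is always fair game. Note that a site-wide Disallow does not strictly guarantee 0 results in Google, so treat this as a first filter.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt, agents=("*", "Googlebot")):
    """Return True if none of the given agents may fetch the site root.

    `agents` is an illustrative default, not an exhaustive list of spiders.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The site counts as "blocked" only if every agent is denied "/".
    return not any(parser.can_fetch(agent, "/") for agent in agents)


def fetch_robots_txt(domain, timeout=10):
    """Download robots.txt for a domain; return "" if it cannot be read.

    Hypothetical helper: assumes the site answers over plain HTTP.
    """
    url = "http://" + domain + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers HTTP errors (e.g. 404), DNS failures, and timeouts.
        return ""
```

You would call blocks_all_crawlers(fetch_robots_txt(domain)) for each domain your spider discovers and keep the ones that return True.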