How can I find websites whose robots.txt disallows the Google spider?

jiangnancun Active Member

Messages:

1

Likes Received:

0

Best Answers:

0

Trophy Points:

86

#1

I want to find websites whose robots.txt disallows the Google spider or other search engine spiders, so that Google does not index their pages.
For such a site, searching site:Domain.com on google.com returns 0 results.

Its robots.txt is below:

User-agent: *
Disallow: /
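For reference, a rule like the one above can be checked programmatically. This is only a minimal sketch using Python's standard `urllib.robotparser`; the helper name `blocks_agent` and the two sample strings are made up for illustration and are not part of any real site.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text forbids `agent` from the site root."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


# The rule quoted above: every crawler is shut out of the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical variant: only Googlebot is shut out, everyone else is allowed.
GOOGLE_ONLY = "User-agent: Googlebot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"

print(blocks_agent(BLOCK_ALL))               # every agent is blocked
print(blocks_agent(GOOGLE_ONLY))             # Googlebot is blocked
print(blocks_agent(GOOGLE_ONLY, "Bingbot"))  # other agents are not
```

An empty `Disallow:` line means "nothing is disallowed", which is why the second sample still lets other crawlers in.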

Could you tell me which websites are like this, and how I can find more websites like this?

jiangnancun, May 13, 2014 IP
sarahk iTamer Staff

Messages:

28,901

Likes Received:

4,555

Best Answers:

123

Trophy Points:

665

#2

I'm not sure why you want to find them, but you could always write your own spider that ignores robots.txt. Program it to index just the home page of each site, scrape the links to other sites, and keep running until you have the desired number of sites.
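A rough sketch of that kind of spider, in Python with only the standard library, might look like the following. Everything here is an assumption for illustration: the names `LinkCollector`, `linked_hosts` and `find_blocking_sites` are invented, and `fetch` is a hypothetical callable you would supply yourself (it takes a URL and returns the page text, or `None` on failure). It only ever reads each site's robots.txt and home page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_hosts(html, base_url):
    """Return the set of other hosts that a page links to."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(base_url).netloc
    hosts = set()
    for href in collector.hrefs:
        parsed = urlparse(urljoin(base_url, href))
        if parsed.scheme in ("http", "https") and parsed.netloc:
            if parsed.netloc != own_host:
                hosts.add(parsed.netloc)
    return hosts


def find_blocking_sites(seeds, fetch, wanted, max_sites=1000, agent="Googlebot"):
    """Breadth-first crawl over home pages only.

    Stops once `wanted` blocking sites are found, `max_sites` hosts have
    been visited, or there are no more hosts to visit.
    """
    queue = deque(seeds)
    seen = set(seeds)
    blocking = []
    visited = 0
    while queue and len(blocking) < wanted and visited < max_sites:
        host = queue.popleft()
        visited += 1
        robots = fetch(f"http://{host}/robots.txt")
        if robots is not None:
            parser = RobotFileParser()
            parser.parse(robots.splitlines())
            if not parser.can_fetch(agent, "/"):
                blocking.append(host)
        home = fetch(f"http://{host}/")
        if home is None:
            continue
        # Queue every newly discovered host, in a stable order
        for other in sorted(linked_hosts(home, f"http://{host}/")):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return blocking
```

For a dry run you can pass the `get` method of a dict that maps URLs to page text as `fetch`. A real `fetch` would probably wrap `urllib.request.urlopen` with a timeout and return `None` on errors.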

What will you be doing with the info?


sarahk, May 13, 2014 IP
neroux Active Member

Messages:

566

Likes Received:

8

Best Answers:

0

Trophy Points:

60

#3

sarahk said:

What will you be doing with the info?

Same question.

neroux, May 13, 2014 IP