View Full Version : robots.txt help... trying to disallow file only from Google.


vprp
Jul 16th 2005, 5:25 pm
There are a lot of files on my forum that I don't want any search engine spiders to visit, so I have them listed as:

User-agent: *
Disallow: /admincp/
Disallow: /attachments/
Disallow: /clientscript/

Additionally, there is one file that I want every spider except Googlebot to crawl. So I have added something like:

User-agent: googlebot
Disallow: /arcade.php

So my robots.txt file may look something like this:

User-agent: *
Disallow: /admincp/
Disallow: /attachments/
Disallow: /clientscript/

User-agent: googlebot
Disallow: /arcade.php

Does that mean that googlebot will not spider anything in /admincp, /attachments, /clientscript and arcade.php or will it only listen to what is directly specified for Googlebot? Meaning, will it only choose not to index arcade.php?

ResaleBroker
Jul 16th 2005, 5:54 pm
"Does that mean that googlebot will not spider anything in /admincp, /attachments, /clientscript and arcade.php ..."

That is my understanding.

minstrel
Jul 18th 2005, 6:57 pm
Yes, that's correct (theoretically). All spiders which obey the robots.txt directives (the reputable ones do but not all of them do) will obey the first set of directives, including googlebot, and additionally googlebot will obey the second googlebot-specific directive.
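
The question can also be checked against a parser. Below is a minimal sketch using Python's standard-library urllib.robotparser; the helper name allowed and the bot name OtherBot are made up for illustration. One caveat worth knowing: this parser, like the original robots.txt convention and Google's documented behavior, applies only the most specific matching record to a crawler. Under that reading Googlebot follows the googlebot record alone and is not bound by the rules under `*`. The second file in the sketch repeats the shared rules inside the googlebot record, which is one way to make sure Googlebot stays out of all four locations.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the original post.
ORIGINAL = """\
User-agent: *
Disallow: /admincp/
Disallow: /attachments/
Disallow: /clientscript/

User-agent: googlebot
Disallow: /arcade.php
"""

# Variant that repeats the shared rules inside the googlebot record.
REPEATED = """\
User-agent: *
Disallow: /admincp/
Disallow: /attachments/
Disallow: /clientscript/

User-agent: googlebot
Disallow: /admincp/
Disallow: /attachments/
Disallow: /clientscript/
Disallow: /arcade.php
"""


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Print the verdict for each file, crawler, and path combination.
    for name, text in (("original", ORIGINAL), ("repeated", REPEATED)):
        for agent in ("Googlebot", "OtherBot"):
            for path in ("/admincp/index.php", "/arcade.php"):
                print(name, agent, path, allowed(text, agent, path))
```

With the original file, this parser lets Googlebot fetch /admincp/index.php and blocks it only from /arcade.php; with the repeated rules, Googlebot is blocked from both, while other bots can still fetch /arcade.php.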

gatordun
Jul 28th 2005, 12:58 pm
User-agent: *
User-agent: Googlebot
User-agent: Googlebot-Image
Disallow: /

minstrel
Jul 28th 2005, 6:41 pm
User-agent: *
User-agent: Googlebot
User-agent: Googlebot-Image
Disallow: /

No, that's incorrect. What that does is tell ALL spiders not to index anything (disallow everything). That wasn't what was requested, and frankly I can't imagine anyone wanting a robots.txt file like that except perhaps for a private, members-only site.

gatordun
Jul 29th 2005, 8:44 am
I thought he was asking about Google, so I added Googlebot-Image too so he could add it to his list.
The best thing to do is list all the spiders that you don't want by name.
Some are good, some are not.
I posted a full list of bots on one of these forums.

minstrel
Jul 29th 2005, 9:08 am
He was asking about Google.

Look at the first line of the robots.txt file you posted:

User-agent: *
followed by

Disallow: /
That tells all spiders that everything is disallowed.
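
The effect of stacking several User-agent lines over one Disallow can be checked the same way. This is a rough sketch with Python's standard-library urllib.robotparser; the helper name allowed and the bot name OtherBot are illustrative assumptions. Because `*` is one of the names in the record, the single `Disallow: /` applies to every crawler, not just the two Google ones.

```python
from urllib.robotparser import RobotFileParser

# The file posted above: three User-agent lines sharing one rule.
GROUPED = """\
User-agent: *
User-agent: Googlebot
User-agent: Googlebot-Image
Disallow: /
"""


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under GROUPED."""
    parser = RobotFileParser()
    parser.parse(GROUPED.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Every crawler is refused every path, named or not.
    for agent in ("Googlebot", "Googlebot-Image", "OtherBot"):
        print(agent, allowed(agent, "/index.html"))
```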