Robots.txt Question

Lpe04 Peon

Messages:: 579

Likes Received:: 15

Best Answers:: 0

Trophy Points:: 0

#1

Hey there, if I want to block Google (and all other search engines) from a particular directory, can I do this?

Let's say I want the www.example.com/widgets directory to be indexed, and also www.example.com/blue, but not www.example.com/blue/widgets. Is this acceptable?

User-agent: *
Disallow: /blue/widgets/

The reason I ask is that I can't find an example that has two directories together, and I don't want to block www.example.com/widgets or www.example.com/blue at all, just the combination.

Will this work?
Thanks.

Note: Rep to whoever helps me

Lpe04, Feb 28, 2009

GeorgR. Peon

Messages:: 2,831

Likes Received:: 78

Best Answers:: 0

Trophy Points:: 0

#2

As far as I can tell, this should work. It should allow anything BUT the /blue/widgets/ folder.

As a double check, you could always run a sitemap generator, e.g. the one from auditmypc.com, and check what it reads and what it indexes.
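Another way to double check, sketched here with Python's standard `urllib.robotparser` module rather than any search engine's own parser, is to feed the two-line file to a parser and ask it about each URL. The helper name `check_paths`, the "Googlebot" user-agent string, and the sample URLs are illustrative assumptions. Real crawlers may differ in details, so treat this as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blue/widgets/
"""


def check_paths(robots_txt, urls, user_agent="Googlebot"):
    """Return {url: bool} saying whether user_agent may fetch each URL.

    check_paths is a hypothetical helper name, not part of the library.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_paths(ROBOTS_TXT, [
    "http://www.example.com/widgets/",
    "http://www.example.com/blue/",
    "http://www.example.com/blue/widgets/",
    "http://www.example.com/blue/widgets/page.html",
    "http://www.example.com/blue/widgets",  # no trailing slash
])

for url, allowed in results.items():
    print("allowed   " if allowed else "disallowed", url)
```

In this parser the rule is a plain path-prefix match, so /widgets/ and /blue/ stay allowed while everything under /blue/widgets/ is disallowed. The last URL, without the trailing slash, does not start with the rule's path and is therefore reported as allowed, which is worth keeping in mind if the folder is also reachable at that address.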

GeorgR., Feb 28, 2009

Lpe04 likes this.

longcall911 Peon

Messages:: 1,672

Likes Received:: 87

Best Answers:: 0

Trophy Points:: 0

#3

User-agent: *
Disallow: /blue/widgets/

Is correct. This rule will not affect the /blue directory. It tells bots "do not crawl whatever is in the /blue/widgets/ folder".

But you seem to misunderstand the robots file. It cannot "block" a crawler. It simply asks crawlers not to fetch these pages, and only well-behaved crawlers honor it. A crawler that ignores the file can still access the page and analyze its content.

If you have stuff in the folder that you don't want the crawler to even see, you need to protect the folder.

/*tom*/

longcall911, Feb 28, 2009

Lpe04 likes this.

Lpe04 Peon

Messages:: 579

Likes Received:: 15

Best Answers:: 0

Trophy Points:: 0

#4

Thanks tom,

I don't mind them accessing the folder; I just don't want it indexed. It's a virtual folder anyway, so there's no way to protect it (if I needed to). I still want the example.com/widgets folder to be indexed, just not example.com/blue/widgets.

thanks.

Lpe04, Feb 28, 2009

rainborick Well-Known Member

Messages:: 424

Likes Received:: 33

Best Answers:: 0

Trophy Points:: 120

#5

Your example code looks fine. In the future, you might want to check out the robots.txt tools in Google's Webmaster Tools. They let you test robots.txt code to see if it works the way you want.

Just in case you weren't aware of this, note that blocking URLs in your robots.txt will not remove any URLs that are already in the index. It just prevents crawling. If this situation arises for you again, the best course is to add a robots <meta> tag set to "noindex" to any page that you don't want indexed. If the page is already indexed, and you use this <meta> tag and allow the page to be crawled in your robots.txt file, it will be removed from the index once it is crawled again.
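For reference, the tag in question is usually written as `<meta name="robots" content="noindex">` inside the page's `<head>`. As a rough sketch of how you might confirm that a page actually carries it, the snippet below uses Python's standard `html.parser` module. The class name `RobotsMetaParser`, the function `has_noindex`, and the sample page are made up for illustration, and the check only looks at the HTML, not at any HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag seen."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page standing in for something under /blue/widgets/.
SAMPLE_PAGE = """<html><head>
<title>Blue widgets</title>
<meta name="robots" content="noindex">
</head><body>...</body></html>"""

print(has_noindex(SAMPLE_PAGE))
```

Keep in mind the point above: a crawler can only see this tag if robots.txt lets it fetch the page, so the tag and a Disallow rule for the same URL work against each other.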

rainborick, Feb 28, 2009

Lpe04 likes this.

Lpe04 Peon

Messages:: 579

Likes Received:: 15

Best Answers:: 0

Trophy Points:: 0

#6

Thanks rainborick, that was very useful. Everyone has been repped.

Lpe04, Feb 28, 2009

Lpe04 Peon

Messages:: 579

Likes Received:: 15

Best Answers:: 0

Trophy Points:: 0

#7

Sorry, I just found a separate robots.txt subforum. My apologies for posting here!

Lpe04, Feb 28, 2009
