Robots.txt Question

Discussion in 'Google' started by Lpe04, Feb 28, 2009.

  1. #1
    Hey there, if I want to block Google (and all other search engines) from a particular directory, can I do this?

    Let's say I want the www.example.com/widgets directory to be indexed, and also www.example.com/blue, but not www.example.com/blue/widgets. Is this acceptable?

    User-agent: *
    Disallow: /blue/widgets/

    The reason I ask is that I can't find an example that has two directories together, and I don't want to block www.example.com/widgets or www.example.com/blue at all, just the combination.

    Will this work?
    Thanks.

    Note: Rep to whoever helps me ;)
     
    Lpe04, Feb 28, 2009 IP
  2. GeorgR.

    #2
    As far as I can tell, this should work. It should allow anything BUT the /blue/widgets/ folder.

    As a double check, you could always generate a sitemap, e.g. with auditmypc.com, and check what the sitemap lists and what gets indexed.
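
    Another way to double check is to test the rule locally. Below is a minimal sketch using Python's standard urllib.robotparser module. The build_parser helper, the "Googlebot" user-agent string and the test paths are only assumptions for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule from post #1, held in a string so nothing is fetched over the network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blue/widgets/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths chosen for illustration; "Googlebot" falls under the "*" group here.
for path in ("/widgets/", "/blue/", "/blue/widgets/", "/blue/widgets/page.html"):
    url = "http://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(path, verdict)
```

    One thing to watch: the rule is a plain prefix match, so /blue/widgets without the trailing slash is not covered by "Disallow: /blue/widgets/". If that URL also resolves on your site, you may want "Disallow: /blue/widgets" instead.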
     
    GeorgR., Feb 28, 2009 IP
  3. longcall911

    #3
    User-agent: *
    Disallow: /blue/widgets/

    Is correct. This rule will not affect the /blue directory or the top-level /widgets directory. It tells bots "do not crawl whatever is in the /blue/widgets/ folder".

    But you seem to misunderstand the robots file. It cannot "block" a crawler. It simply asks the crawler not to crawl these pages. Well-behaved bots comply, but nothing stops a bot that ignores the file from accessing the page and analyzing its content.

    If you have stuff in the folder that you don't want the crawler to even see, you need to protect the folder.

    /*tom*/
     
    longcall911, Feb 28, 2009 IP
  4. Lpe04

    #4
    Thanks tom,

    I don't mind them accessing the folder, I just don't want it indexed. It's a virtual folder anyway, so there is no way to protect it (if I needed to). I still want the example.com/widgets folder to be indexed, just not example.com/blue/widgets.

    Thanks.
     
    Lpe04, Feb 28, 2009 IP
  5. rainborick

    #5
    Your example code looks fine. In the future, you might want to check out the robots.txt tools in Google's Webmaster Tools. They let you test robots.txt code to see if it works the way you want.

    Just in case you weren't aware of this, note that blocking URLs in your robots.txt will not remove any URLs that are already in the index. It just prevents crawling. If this situation arises for you again, the best course is to add a robots <meta> tag set to "noindex" on any page that you don't want indexed. If the page is already indexed, AND you use this <meta> tag AND you allow the page to be crawled in your robots.txt file, it will be removed from the index once it is crawled again.
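
    For reference, the tag in question is <meta name="robots" content="noindex"> inside the page's <head>. If you want to confirm that a page actually serves it, here is a rough sketch using Python's standard html.parser module. The RobotsMetaParser class, the has_noindex function and the sample PAGE markup are made up for this example, and the sketch only looks at the <meta> tag, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without "=value".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the markup carries a robots meta tag with "noindex"."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page under /blue/widgets/ that should stay out of the index.
PAGE = """<html><head>
<title>Blue widgets</title>
<meta name="robots" content="noindex">
</head><body>...</body></html>"""

print(has_noindex(PAGE))
```

    Remember that the crawler has to be able to fetch the page to see the tag, so the page must not be disallowed in robots.txt at the same time.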
     
    rainborick, Feb 28, 2009 IP
  6. Lpe04

    #6
    Thanks rainborick, that was very useful. Everyone has been repped.
     
    Lpe04, Feb 28, 2009 IP
  7. Lpe04

    #7
    Sorry, I just found a separate robots.txt subforum. Apologies for posting here!
     
    Lpe04, Feb 28, 2009 IP