Use Robots.txt to block spiders from pages with certain parameters?

Discussion in 'Google' started by adzeds, Jan 4, 2011.

  1. #1
    Is it possible to use a robots.txt file to block search engine spiders from crawling pages with certain parameters?

    For example if my site had a page:

    http://mysite.com?custom_pages=1

    Could I make sure the robots don't crawl any pages with the ?custom_pages parameter?

    Any help much appreciated!
     
    adzeds, Jan 4, 2011 IP
  2. v3t0

    v3t0 Member

    #2
    here is the basic format:

    User-agent: *
    Disallow: /folder/
    Code (markup):
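    You can add one Disallow line per folder or page you want kept out. As a slightly fuller sketch, with placeholder paths rather than anything from your actual site:

    User-agent: *
    Disallow: /private-folder/
    Disallow: /some-page.html
    Code (markup):

    The file itself only works from the site root, e.g. http://mysite.com/robots.txt.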
     
    v3t0, Jan 4, 2011 IP
  3. adzeds

    adzeds Well-Known Member

    #3
    but can I use that to block URL parameters?

    Example:
    User-agent: *
    Disallow: ?custom_pages
     
    adzeds, Jan 4, 2011 IP
  4. longcall911

    longcall911 Peon

    #4
    strictly speaking, each value is a different URL:

    custom_pages=1
    custom_pages=2
    custom_pages=3

    but robots.txt rules are prefix matches, and Googlebot also supports the * wildcard, so a single rule covering the parameter name catches all of them. You don't need to list every value.

    Also, robots.txt does not actually block content. It asks compliant crawlers not to fetch the page; it does not guarantee the URL stays out of the index if other sites keep linking to it.

    BTW: you could put your custom_pages in a separate folder and disallow the folder.
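    Otherwise, here is a rough sketch of what a wildcard rule might look like, assuming the parameter is always called custom_pages (swap in your real parameter names):

    User-agent: *
    # block URLs where custom_pages is the first query parameter
    Disallow: /*?custom_pages=
    # block it when it comes after other parameters too
    Disallow: /*&custom_pages=
    Code (markup):

    You can check exactly what a rule matches with the robots.txt tester in Google Webmaster Tools before relying on it.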
     
    longcall911, Jan 4, 2011 IP
  5. adzeds

    adzeds Well-Known Member

    #5
    @longcall911
    Yeah, that is what I am after. Got some pages that will appear to be duplicate content because they have a different URL structure, so I want to control what gets indexed.

    Just have to find an easy way to find all the parameters that I want blocked across our 1,950,000 pages! Ouch!
     
    adzeds, Jan 4, 2011 IP
  6. polars

    polars Peon

    #6
    Whatever pages or folders you want to block, you can do it with the robots.txt file.
     
    polars, Jan 4, 2011 IP
  7. Andrew Robertson

    Andrew Robertson Peon

    #7
    Do you need the file in your site root only, or in every directory?
     
    Andrew Robertson, Jan 4, 2011 IP