Robots.txt

Discussion in 'Search Engine Optimization' started by Dudibob, Mar 9, 2007.

  1. #1
    I'm trying to exclude some pages that a funky website generates from the search engines, and I just need a bit of clarification to sort it out.

    I can't use robots meta tags to stop the SEs, as the site seems to make two versions of every page: the first is page-name.html and the second is page-name.html?id=kwio[egfio[wnensaojfablah, or something funky like that.

    So if I wanted to ban both pages (or all versions of the page), say for a search page (search-page.html and search-page.html?id=etc), the robots.txt would look like this:

    User-Agent: *
    Disallow: /search-page.html

    That will ban all versions of the above page, won't it?

    And the second one, if I want to ban just the ?id= version of a page:

    User-Agent: *
    Disallow: /page-name.html?id=

    Would that be it? I've been trying to use the robots.txt checker in Google's sitemap tools, but it gives me the all-clear even on Disallow: /, so I'm not trusting it :s
     
    Dudibob, Mar 9, 2007 IP
  2. mad4

    #2
    I think that if you disallow /file.html, that will disallow /file.html?id=123 as well; Disallow rules are simple prefix matches against the URL (path plus query string).
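
    Just to sanity-check that prefix behaviour, Python's built-in robots.txt parser follows the same idea, so a quick sketch makes it visible. The domain and page names here are made up for illustration:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.modified()  # mark the rules as fetched so can_fetch() will trust them
    rp.parse([
        "User-agent: *",
        "Disallow: /search-page.html",
    ])

    # Both versions of the page fall under the one prefix rule...
    print(rp.can_fetch("*", "http://example.com/search-page.html"))         # False
    print(rp.can_fetch("*", "http://example.com/search-page.html?id=abc"))  # False
    # ...while unrelated pages stay crawlable.
    print(rp.can_fetch("*", "http://example.com/another-page.html"))        # True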

    You would be best off using PHP or .htaccess to 301 the wrong version to the right one, though; that consolidates the duplicates instead of just hiding them from crawlers.
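
    The site in question is PHP, but here is the redirect logic sketched in self-contained Python (standard library only; the id= parameter name and the port are assumptions for illustration): if a request arrives carrying the duplicate ?id=... query string, answer with a 301 pointing at the clean URL.

    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        query = environ.get("QUERY_STRING", "")
        if query.startswith("id="):
            # Duplicate version: permanent redirect to the same
            # path with the query string stripped off.
            start_response("301 Moved Permanently", [("Location", path)])
            return [b""]
        # Canonical version: serve the page as normal.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b"<html><body>canonical page</body></html>"]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()

    In PHP the same idea is a header('Location: ...') call with a 301 status sent before any page output.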
     
    mad4, Mar 9, 2007 IP
  3. Dudibob

    #3
    Oh yeah, 301s! I almost forgot. That's the same thought I had for the robots.txt. Cheers very much, mad4 :)
     
    Dudibob, Mar 9, 2007 IP