I'm trying to exclude some pages on a funky website from the search engines and just need a bit of clarification to sort it. I can't use robots meta tags to stop the SEs, as the site seems to make two versions of every page: the first is page-name.html and the second is page-name.html?id=kwio[egfio[wnensaojfablah or something funky like that.

So if I wanted to ban both versions of a page, say a search page (search-page.html and search-page.html?id=etc), the robots.txt would look like this:

User-agent: *
Disallow: /search-page.html

That will ban all versions of the above page, won't it?

And the second one, if I want to ban just the ?id= versions of pages:

User-agent: *
Disallow: /page-name.html?id=

Would that be it? I've been trying to use Google's sitemap robots checker, but I get the all clear even on Disallow: /, so I'm not trusting it :s
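If I've got the syntax right (I gather the paths need a leading slash, and Disallow values are matched as prefixes), the whole file would be something like this:

User-agent: *
Disallow: /search-page.html
Disallow: /page-name.html?id=

Does that look right?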
I think that if you disallow /file.html, that will disallow /file.html?id=123 as well, since robots.txt Disallow values are prefix matches. That said, you'd be best off using PHP or .htaccess to 301 the wrong version to the right one, so the duplicate never gets indexed in the first place.
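Something like this in .htaccess should do it (a rough sketch, assuming Apache with mod_rewrite enabled; the id= pattern is just a guess at your URLs, so adjust as needed):

RewriteEngine On
# Match any request whose query string starts with id=
RewriteCond %{QUERY_STRING} ^id=
# 301 to the same path; the trailing ? strips the query string
RewriteRule ^(.*)$ /$1? [R=301,L]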
Oh yeah, 301s! I almost forgot; that's the same thought I had for the robots.txt. Cheers very much, mad4.