I am using phpBB for one site. The problem is that it generates multiple URLs for the same topic. One of the junk URLs is mortgagesaver.org/forum/refi-w-540-fico-vt1539.html?start=0&postdays=0&postorder=asc&highlight= Can I use robots.txt to prevent URLs that contain ? or highlight from being indexed?
You can use robots.txt to prevent URLs starting with some text from being indexed. Example:

User-agent: *
Disallow: /forum/refi-w-540-fico-vt1539.html?

This will prevent URLs starting with /forum/refi-w-540-fico-vt1539.html? from being indexed. This is probably not practical in your case if you have hundreds of these URLs. Jean-Luc
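A possible shortcut for Google specifically: Googlebot honors a non-standard * wildcard in Disallow rules, so a couple of patterns can cover every query-string URL at once. This is a Google extension, not part of the original robots.txt standard, so other crawlers may ignore these lines. A minimal sketch:

```
# Wildcard rules honored by Googlebot (Google extension; other
# crawlers may not support the * wildcard in Disallow paths)
User-agent: Googlebot
# block any URL in the forum that contains a query string
Disallow: /forum/*?
# block any URL that contains "highlight" anywhere in its path or query
Disallow: /*highlight
```

Keep a separate User-agent: * section for crawlers that only understand plain prefix matching.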
The standard robots.txt syntax has no way to exclude files based on URL parameters. To ban those you will have to change the forum code and insert the robots meta tag into the output when the highlight parameter is given. Something like this at the right place should do the job (note it is the content attribute, not value):

if (isset($_GET['highlight']) && $_GET['highlight'] != '') {
    echo '<meta name="robots" content="noindex,follow,noarchive">';
}
I have a similar problem in my new forum, but I am experimenting with something else. I have the bot-indexing mod installed (I still don't know whether it is working properly). For the URLs that are already indexed with session IDs, I have site-wide noarchive tags; this should keep those pages out of the supplemental results. I also have a sitemap with all the major URLs without session IDs, so I hope that over time the site will get crawled normally.