Can I use robots.txt to block *.html?
briandunning
Nov 2nd 2005, 10:59 am
Can I use robots.txt to block *.html? I know I can use it to block certain folders, but I also want to block certain file types.
WhatiFind
Nov 2nd 2005, 11:04 am
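It's done like this:
Block *.html (wildcards are an extension honored by major crawlers such as Googlebot; they were not part of the original robots.txt standard, so some robots may ignore this rule):
Disallow: /*.html$
Block a certain folder:
Disallow: /cgi-bin/
Both rules go under a User-agent line, for example "User-agent: *".
For more info, see http://www.searchengineworld.com/misc/robots_txt_crawl.htm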
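To check which paths a wildcard rule would cover before relying on it, here is a rough Python sketch. It assumes Googlebot-style matching, where "*" stands for any run of characters and a trailing "$" anchors the end of the URL. The function names and sample paths are made up for illustration, and real crawlers may differ in the details. (Python's built-in urllib.robotparser does not interpret these wildcards, hence the hand-rolled matcher.)

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a wildcard Disallow pattern into a compiled regex.

    Assumed semantics (Googlebot-style extension, not the original standard):
    '*' matches any run of characters, a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the path from its start."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical rules and paths, for illustration only.
rules = ["/*.html$", "/cgi-bin/"]
for path in ["/spam/page.html", "/cgi-bin/form.pl", "/index.php", "/page.html?x=1"]:
    print(path, is_blocked(path, rules))
```

Note that with the trailing "$", a URL such as /page.html?x=1 is not matched, because it no longer ends in .html. Dropping the "$" would block those too.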
minstrel
Nov 5th 2005, 1:07 pm
Why do you want to do this, Brian? I'm curious...
briandunning
Nov 5th 2005, 4:47 pm
Curious, as in psychoanalytically? :)
I had a bunch of spam content that's gone now, but I'm trying to get the robots to know it's gone. It was all *.html, and nothing legitimate on the site uses *.html. I'm just letting it all 404 for now, but I was looking for an additional way to shout at the robots to stop indexing it. It's been gone for months, yet I still get thousands of daily requests for it.
minstrel
Nov 5th 2005, 10:51 pm
No. Curious as in wanting to find the best way to solve your problem. If you have deleted all the .html pages and replaced them with, say, .php pages, you could set up a series of redirects to re-route both spider requests and human visitors. At the very least, redirect all requests for .html pages to your new home page.
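As a rough illustration of the catch-all redirect, here is a minimal Python WSGI sketch. It assumes the site is served by a Python application, which the thread does not say, and the names redirect_target and app plus the "/" home URL are hypothetical. On a typical Apache setup the same idea would normally be done in the server configuration instead.

```python
def redirect_target(path, home="/"):
    """Return the URL to redirect to, or None if the path should be served normally.

    Assumption: every request ending in .html is old content and goes to `home`.
    """
    if path.lower().endswith(".html"):
        return home
    return None

def app(environ, start_response):
    """Tiny WSGI app: permanently redirect old .html requests, serve the rest."""
    target = redirect_target(environ.get("PATH_INFO", "/"))
    if target is not None:
        # 301 tells spiders and browsers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"regular page\n"]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server from the standard library.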