Can I use robots.txt to block *.html? I know I can use it to block certain folders, but I also want to block certain file types.
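It's done like this:

Block *.html:

    User-agent: *
    Disallow: /*.html

Block a certain folder:

    User-agent: *
    Disallow: /cgi-bin/

Each rule has to sit under a User-agent line, and the path should start with /. The * wildcard is honored by the major crawlers such as Googlebot and Bingbot and is part of RFC 9309. So is a trailing $ to anchor the end of the URL, e.g. Disallow: /*.html$. Some older crawlers only do plain prefix matching and will ignore the wildcard.

See for more info http://www.searchengineworld.com/misc/robots_txt_crawl.htm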
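The wildcard rules above are easier to reason about once you see what they match. Below is a simplified Python sketch of the matching that RFC 9309 describes. A * matches any run of characters and a trailing $ anchors the end. Everything else is a prefix match against the URL path, including any query string. The function names are made up for illustration, and the sketch checks a single rule only. A real crawler also weighs Allow rules against Disallow rules, and under RFC 9309 the longest matching rule wins.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one robots.txt path rule into a compiled regex (simplified)."""
    anchored = rule.endswith("$")  # a trailing '$' pins the end of the URL
    if anchored:
        rule = rule[:-1]
    # Escape the literal text and turn each '*' into '.*' (any run of characters)
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """True if the rule matches the URL path (including any query string)."""
    # re.match anchors only at the start, which gives prefix-match semantics
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/*.html", "/page.html?x=1"))   # True
print(rule_matches("/*.html$", "/page.html?x=1"))  # False
```

Under this matching, Disallow: /*.html also covers /page.html?x=1, while Disallow: /*.html$ does not. The version without $ is therefore the broader choice when the old URLs carry query strings.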
Curious, as in psychoanalytically? I had a bunch of spam content that's gone now, but I'm trying to get the robots to understand that it's gone. It was all *.html, and nothing legitimate on the site uses .html. I'm just letting it all 404 for now; I was looking for an additional way to shout at the robots to stop indexing it. It's been gone for months, but I still get thousands of requests for it every day.
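One way to shout louder than a 404 is to answer with 410 Gone. A 410 states that the removal is permanent, while a 404 only says nothing is there right now. How much faster a given crawler drops 410 URLs varies by crawler. Blocking the same URLs in robots.txt has a side effect: a crawler that obeys the Disallow stops fetching them and so never sees the 404 or 410 at all. What follows is a minimal sketch using Python's standard http.server. It assumes, as stated above, that no legitimate page ends in .html. The names status_for, GoneHandler and serve are hypothetical.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import urlsplit


def status_for(request_path: str) -> Optional[HTTPStatus]:
    """Return 410 for any *.html URL, or None to serve the request normally."""
    # Assumption: nothing legitimate on the site ends in .html
    if urlsplit(request_path).path.endswith(".html"):
        return HTTPStatus.GONE
    return None


class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        if status is not None:
            # Tell the crawler the old spam page is permanently gone
            self.send_error(status)
            return
        # Placeholder for the real site's content
        body = b"normal page\n"
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), GoneHandler).serve_forever()
```

On a production Apache or nginx server the same rule would usually go in the server configuration rather than in a separate Python process.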
No. Curious as in: what's the best way to solve your problem? If you have deleted all the .html pages and replaced them with, say, .php pages, you could set up a series of redirects to reroute both spider requests and human visitors. At the very least, redirect all requests for .html pages to your new home page.
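The redirect suggestion can be sketched the same way. Each old .html URL maps to its .php counterpart when one exists and falls back to the home page otherwise. The response is a 301 Moved Permanently. This is a hypothetical Python sketch: EXISTING_PAGES, redirect_target and RedirectHandler are illustrative names, and on a real Apache or nginx server the mapping would normally live in the rewrite configuration instead.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import AbstractSet, Optional
from urllib.parse import urlsplit

# Hypothetical list of the pages that exist after the move to PHP
EXISTING_PAGES = {"/index.php", "/about.php", "/contact.php"}


def redirect_target(request_path: str, existing: AbstractSet[str]) -> Optional[str]:
    """Map an old *.html URL to its redirect target, or None if it isn't one."""
    path = urlsplit(request_path).path
    if not path.endswith(".html"):
        return None
    candidate = path[: -len(".html")] + ".php"
    # Prefer the matching .php page; otherwise fall back to the home page
    return candidate if candidate in existing else "/"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path, EXISTING_PAGES)
        if target is None:
            # Not an old .html URL; a real site would serve content here
            self.send_error(HTTPStatus.NOT_FOUND)
            return
        self.send_response(HTTPStatus.MOVED_PERMANENTLY)
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Search engines may treat a blanket redirect of unrelated URLs to the home page as a soft 404. For deleted spam pages, the home-page fallback mostly helps human visitors rather than crawlers.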