Can I use robots.txt to block *.html?

Discussion in 'robots.txt' started by briandunning, Nov 2, 2005.

  1. #1
    Can I use robots.txt to block *.html? I know I can use it to block certain folders, but I also want to block certain file types.
     
    briandunning, Nov 2, 2005
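
For reference, a file-type rule of the kind asked about can be sketched as below. This is a minimal illustration and not a full robots.txt parser. It assumes the crawler honors the `*` wildcard and the `$` end-of-path anchor, which the original robots.txt convention did not define (Googlebot supported them as an extension, and RFC 9309 later standardized them). The `ROBOTS_TXT` body and the helper names are hypothetical, and the matcher ignores `Allow` lines and per-crawler `User-agent` groups.

```python
import re

# Hypothetical robots.txt body: block every URL path that ends in ".html".
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.html$
"""


def parse_disallows(text):
    """Collect the non-empty Disallow patterns from a robots.txt body."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def rule_to_regex(pattern):
    """Translate one path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the start of the path."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


rules = parse_disallows(ROBOTS_TXT)
```

With these rules, `is_blocked("/spam/page.html", rules)` is true while `is_blocked("/index.php", rules)` is false. A crawler that does not understand wildcards would treat `/*.html$` as a literal prefix and block nothing useful.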
  2. WhatiFind

    #2
    WhatiFind, Nov 2, 2005
  3. minstrel

    minstrel Illustrious Member

    #3
    Why do you want to do this, Brian? I'm curious...
     
    minstrel, Nov 5, 2005
  4. briandunning

    briandunning Active Member

    #4
    Curious, as in psychoanalytically? :)

    I had a bunch of spam content that's gone now, and I'm trying to get the robots to realize it's gone. It was all *.html, and nothing legitimate on the site uses *.html. I'm just letting it all 404 for now, but I was looking for an additional way to shout at the robots to stop indexing it. It's been gone for months, yet I still get thousands of daily requests for it.
     
    briandunning, Nov 5, 2005
  5. minstrel

    minstrel Illustrious Member

    #5
    No, curious as in wanting to find the best way to solve your problem. If you have deleted all the html pages and replaced them with, say, php pages, you could set up a series of redirects to re-route both spider requests and human visitors. At the very least, redirect all requests for html pages to your new home page.
     
    minstrel, Nov 5, 2005
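
The catch-all redirect suggested above could look roughly like the following sketch. The thread does not say what server software the site runs, so this assumes a Python WSGI application purely for illustration. The names `redirect_html`, `site`, and the target `/` are hypothetical, and on a typical web server the same effect would more likely come from a redirect rule in the server configuration.

```python
def redirect_html(app, target="/"):
    """Wrap a WSGI app so any request whose path ends in .html
    receives a permanent (301) redirect to `target`."""

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.lower().endswith(".html"):
            # Tell spiders and browsers alike that the page has moved for good.
            start_response(
                "301 Moved Permanently",
                [("Location", target), ("Content-Length", "0")],
            )
            return [b""]
        # Everything else passes through to the real site untouched.
        return app(environ, start_response)

    return wrapper


def site(environ, start_response):
    """Stand-in for the real site (assumption: it serves no .html paths)."""
    body = b"home"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


application = redirect_html(site)
```

A 301 tells crawlers that the move is permanent, which is the signal that lets them drop the old URLs over time instead of retrying them.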