block all except in robots.txt

Discussion in 'robots.txt' started by visualscope, May 26, 2007.

  1. #1
    hi all,
    I have a site with more webpages (duplicate content issues) to block than to allow.
    Is there a way in robots.txt to achieve this?

    I do know how to block pages from being crawled, but since I have more to block than to allow, I was thinking it is probably easier to do the opposite.

    thanks in advance
     
    visualscope, May 26, 2007 IP
  2. tinkerbox
    #2
    You cannot really block anything with robots.txt.
    robots.txt is just a plain text file that tells spiders visiting your website which directories or pages you don't want them to crawl.
    Please remember that not all spiders will obey robots.txt, especially spam and harvester bots.

    You can try NiceStat.com
    It not only tracks bots visiting your sites, it also lets you ban them with rules you set. You can try the demo and look under Website Rules.
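
    As for the original question: one way is to disallow everything and then use Allow lines for the pages you do want crawled. Keep in mind that Allow is not part of the original robots.txt standard, but major crawlers such as Googlebot and Bingbot honor it. The paths below are just made-up examples, so substitute your own:

```
User-agent: *
Disallow: /
Allow: /index.html
Allow: /keep/
```

    For Googlebot, the most specific (longest) matching rule wins, so these Allow lines override the Disallow: / for those paths. Other crawlers may interpret the rules differently, so test with each search engine's tools before relying on it.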
     
    tinkerbox, May 27, 2007 IP