Hi all, I have a site with more pages to block (duplicate content issues) than to allow. Is there a way to achieve this in robots.txt? I know how to block individual pages from being crawled, but since I have more to block than to allow, I was thinking it would probably be easier to do the opposite: block everything and then allow only the pages I want crawled. Thanks in advance.
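For what it's worth, here is a minimal sketch of the inverted pattern you describe, assuming hypothetical paths /products/ and /about.html are the pages you want crawled. The Allow directive is not part of the original 1994 robots.txt standard, but major crawlers such as Googlebot and Bingbot honor it, with the most specific (longest) matching rule taking precedence over Disallow: /.

User-agent: *
Disallow: /
Allow: /products/
Allow: /about.html

Simpler crawlers that only implement the original standard may ignore the Allow lines and skip the whole site, so test the file (for example with Google Search Console's robots.txt tester) before relying on it.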
You cannot truly block anything with robots.txt. robots.txt is just a plain text file that tells spiders visiting your website which directories or pages you don't want them to index. Keep in mind that not all spiders obey robots.txt, especially spam and harvester bots. You could try NiceStat.com: it not only tracks the bots visiting your sites, it also lets you ban them with rules you set. You can try the demo and look under Website Rules.