My robots.txt is:

```
User-agent: *
Allow: /
Disallow: /help/
Disallow: /search/
Disallow: /stats/
Disallow: /calendar/
Disallow: /reminder/
Disallow: /login/

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
```

It looks like Baidu is ignoring robots.txt and still coming to my forum. I have 10-20 visitors and over 50 Baidu IPs. I don't want it; I don't want visitors from China anyway. How do I stop the Baidu spider? Is my robots.txt right? Thanks
I think your syntax is wrong (you don't normally have to use "Allow") - try:

```
User-agent: *
Disallow: /help/
Disallow: /search/
Disallow: /stats/
Disallow: /calendar/
Disallow: /reminder/
Disallow: /login/

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /
```

BTW, Baidu actually uses more spiders/bots than that to crawl different types of content (and you'd have to block them all, if required):

- Baiduspider-image crawls images
- Baiduspider-mobile crawls mobile search content
- Baiduspider-video crawls videos
- Baiduspider-news crawls news content
- Baiduspider-favo crawls bookmarks
- Baiduspider-sfkr crawls Baidu PPC/ads
- Baiduspider-cpro crawls Baidu's contextual advertising network

If that doesn't work, you'll have to block Baidu via .htaccess, or at server level if you have admin privileges. (Just Google it.) Here's a good resource: http://www.robotstxt.org
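For the .htaccess route, a minimal sketch might look something like this. It assumes Apache with mod_rewrite enabled; the `Baiduspider` pattern is just an example substring match, so adjust it to whatever user-agents show up in your logs:

```apache
# Untested sketch: return 403 Forbidden to any request whose User-Agent
# contains "Baiduspider" (case-insensitive). Requires mod_rewrite.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule ^ - [F,L]
```

Keep in mind the User-Agent header can be faked, so a bot that ignores robots.txt may also lie about its name; blocking by IP range at the firewall is more robust if the crawler keeps coming.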
Or just "don't sweat the small stuff." Does it really matter? Your site will be crawled weekly by hundreds of spiders from sites you won't be able to identify easily (I once had a site documenting them). You will get a better result for your business if you focus on productive tasks and ignore the rogue spiders.
Actually, the bot you're facing a problem from may not be Baidu at all - it's just fooling people by saying it's Baidu. It could be a forum crawler that is collecting all your data.
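One way to check whether those 50 IPs are really Baidu is a reverse-DNS lookup followed by a forward confirmation - the same trick used to verify Googlebot. Baidu's webmaster docs say genuine Baiduspider hosts reverse-resolve to names under `*.baidu.com` or `*.baidu.jp` (hedge: verify that against Baidu's current documentation). A rough sketch:

```python
import socket

def is_genuine_baidu_host(hostname: str) -> bool:
    """Check whether a reverse-DNS name matches Baidu's published crawler domains."""
    return hostname.endswith(".baidu.com") or hostname.endswith(".baidu.jp")

def verify_crawler_ip(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve
    the hostname and confirm it maps back to the same IP (to defeat
    attacker-controlled reverse DNS)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False  # no reverse DNS at all -> not a verified crawler
    if not is_genuine_baidu_host(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

If the IPs in your logs fail this check, the visitor is only pretending to be Baiduspider, and robots.txt rules aimed at Baidu will never stop it - you'd need an IP-level block instead.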