Can you check for me if this robots.txt file is right or wrong, and if possible explain why it is right or wrong? http://www.tripontop.com/robots.txt Thanks in advance.
A little over the top, I think. See if you can keep it fairly simple.
-----------------------------------------------
# Begin block Bad-Robots from robots.txt
User-agent: asterias
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Cegbfeieh
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: moget
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: RMA
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: spanner
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: Titan
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: turingos
Disallow: /
User-agent: VCI
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: Wget
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /

# Begin Exclusion From Directories from robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php

Sitemap: http://www.tripontop.com/sitemap.xml.gz
----------------------------------------
Why people insist on adding version and build numbers ("4.01" or "/1.0") is beyond me .. but it's been done for years, and bots will blow on by without giving version numbers a second glance. Fact of the matter is that most of the bots listed above, whatever the people running them are up to, won't pay any attention to your robots.txt file anyway.

Google? Well, truth be known .. if you have any Google ads on your site, then blocking Mediapartners-Google or related Google ad bots won't work .. they'll come on in anyway.

Oh, and unzip your sitemap.xml ... it'll give the search engines one less hoop to jump through while indexing your site.

If you are really, really serious about blocking site scrapers, spam-bots, and nosey-nates, then I'd suggest you do all of the blocking through your .htaccess file .. stops them cold .. guaranteed (see the sketch at the end of this post). Things change fast on the net ... and I try to keep my tools as up-to-date as possible.
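For the .htaccess route, something like this is one way to do it .. a minimal sketch, assuming Apache 2.4 with mod_setenvif enabled, and the bot names here are just illustrative placeholders, not a vetted list:
-----------------------------------------------
# Flag any request whose User-Agent matches a known bad bot
# (case-insensitive regex; extend the alternation as needed)
BrowserMatchNoCase "SiteSnagger|EmailCollector|WebStripper|WebZip" bad_bot

# Refuse flagged requests at the server (Apache 2.4 syntax)
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
-----------------------------------------------
Unlike robots.txt, which a bot is free to ignore, this is enforced by the server itself, so misbehaving crawlers get turned away whether they read your robots.txt or not.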
To get a robots.txt that will validate, you may want to visit our robots.txt tool here: http://www.webshoppesolutions.com/bottxt_generator.htm
I simply write:

User-agent: *
Sitemap: http://www.myurl/sitemap.xml

Is that not correct? Should I disallow all the bots like above?
Sure .. your way could work, provided you specify Allow or Disallow:

User-agent: *
Disallow: /

or

User-agent: *
Allow: /

The "*" refers to "all" robots and parsing agents. If you want only Google to visit you, you can make an exception for Google this way:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
I usually use this robots.txt file and it works well for me:

Sitemap: http://www.domain.com/sitemap.xml

User-Agent: *
Allow: /
Sherone, refer to this: http://www.fleiner.com/bots/ I think you should disallow some of them rather than allowing all of the robots.
Guessing here, but isn't it shorter to Allow all the browser user-agents instead of blocking the zillion spiders?