User-Agent: *
Disallow: /license.txt
Disallow: /readme.html
Disallow: /wp-admin.php
Disallow: /wp-atom.php
Disallow: /wp-blog-header.php
Disallow: /wp-comments-popup.php
Disallow: /wp-commentsrss2.php
Disallow: /wp-comments-post.php
Disallow: /wp-config-sample.php
Disallow: /wp-config.php
Disallow: /wp-cron.php
Disallow: /wp-feed.php
Disallow: /wp-links-opml.php
Disallow: /wp-login.php
Disallow: /wp-mail.php
Disallow: /wp-pass.php
Disallow: /wp-rdf.php
Disallow: /wp-register.php
Disallow: /wp-rss.php
Disallow: /wp-rss2.php
Disallow: /wp-settings.php
Disallow: /wp-trackback.php
Disallow: /xmlrpc.php
Sitemap: http://www.yoursitename.com/sitemap.xml
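If you want to check what a file like this actually blocks before uploading it, Python's standard urllib.robotparser can parse the rules locally without fetching anything. This is a minimal sketch using a subset of the rules above and the placeholder domain from the post; the helper name build_parser is my own. The stdlib parser treats each rule as a plain path prefix, so the last lines show that a trailing slash (e.g. Disallow: /license.txt/) only covers paths beneath it, not the file itself.

```python
from urllib.robotparser import RobotFileParser

BASE = "http://www.yoursitename.com"  # placeholder domain from the post


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally (hypothetical helper, no network access)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


# A subset of the rules above.
rules = build_parser("""\
User-Agent: *
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Disallow: /readme.html
Sitemap: http://www.yoursitename.com/sitemap.xml
""")

for path in ["/wp-login.php", "/xmlrpc.php", "/2010/05/hello-world/"]:
    # False means the path is blocked for this user agent.
    print(path, rules.can_fetch("Googlebot", BASE + path))

# Rules are prefix matches, so a trailing slash changes what is covered.
slash = build_parser("User-agent: *\nDisallow: /license.txt/\n")
print(slash.can_fetch("Googlebot", BASE + "/license.txt"))  # True: the file itself is not blocked
```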
I read somewhere that the ideal robots.txt for WordPress was:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /category
Disallow: /tag
Disallow: /author
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
Disallow: /*.html/$
Disallow: /*feed*

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

Sitemap: http://www.yoursite.com/sitemap.xml
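Several rules in this file rely on * and $. Crawlers that honour them (Google documents this, and RFC 9309 now specifies it) treat * as "any run of characters" and a trailing $ as "end of the path"; the longest matching pattern decides, and Allow wins a tie. Below is a rough Python sketch of that evaluation, assuming those semantics; it skips percent-encoding normalisation and user-agent group selection, and the function names are made up for illustration. A parser without wildcard support (Python's own urllib.robotparser only does prefix matching, for instance) would read /*?* as a literal path and block nothing with it.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (hypothetical helper).

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern is a prefix match (used with re.match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = re.escape(pattern).replace(r"\*", ".*")  # turn escaped '*' back into a wildcard
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if rule_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Part of the "User-agent: *" group from the post above.
WP_RULES = [
    ("Disallow", "/wp-admin"),
    ("Disallow", "/tag"),
    ("Disallow", "/*?*"),
    ("Disallow", "/*.html/$"),
    ("Disallow", "/*feed*"),
]

for path in ["/2010/05/hello-world/", "/?s=robots", "/2010/05/hello-world/feed/"]:
    print(path, is_allowed(WP_RULES, path))
```

The Googlebot-Image group works the same way: its empty Disallow: matches nothing, and Allow: /* matches every path.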
The original robots.txt standard does not include wildcards (stars) anywhere except the "User-agent: *" line. Many robots will parse and honour them, but many others will not.

Why do you want to block so many robots anyway? Most of the robots that visit my site belong to search engines. Every time they request a page from my site, they include it in their index, so my pages can turn up in search results. I block certain sections that don't make sense for robots to visit or that cause robots to get stuck (I once had Googlebot download 2 GB in a month because of a dynamically generated link it could follow forever), but apart from that I allow robots unfettered access to my site.

I can control their behaviour at a finer level with the robots meta tag, specifying noindex or nofollow depending on what I want them to do. There are also tags that tell robots which sections of an individual page should not be indexed, but these are not standardised yet, so every search engine supports different ones.
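To illustrate the meta tag approach: the tag goes in a page's head as name="robots" with values such as noindex and nofollow. Here is a tiny Python sketch that builds it; the helper name robots_meta is hypothetical, and in WordPress you would normally let your theme or an SEO plugin output the tag instead.

```python
def robots_meta(index: bool = True, follow: bool = True) -> str:
    """Return a robots meta tag (hypothetical helper for illustration)."""
    directives = [
        "index" if index else "noindex",     # may the page appear in search results?
        "follow" if follow else "nofollow",  # may the robot follow links on the page?
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


# e.g. a tag archive you want crawled for its links but kept out of the index
print(robots_meta(index=False))
# <meta name="robots" content="noindex, follow">
```

Unlike a Disallow line, this lets the robot fetch the page and follow its links while still keeping the page itself out of the index.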
I don't want my installed WordPress files to get indexed. That only dilutes the bots' crawling effort, so I always try to keep them concentrated on content pages.
knorbulyon - it is good to have a robots.txt. Even if you do not disallow any files, having a robots.txt will reduce 404 errors, since bots will request it in any case.
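One way to see this on your own site is to count requests for /robots.txt that returned 404 in your access log. This is a minimal Python sketch, assuming the log uses the Common or Combined Log Format (the default for many Apache and nginx setups); the sample lines, bot names, and helper name robots_404s are made up for illustration.

```python
import re

# Captures the requested path and the status code from a Common/Combined Log Format line.
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')


def robots_404s(lines) -> int:
    """Count requests for /robots.txt that got a 404 (hypothetical helper)."""
    count = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(1) == "/robots.txt" and m.group(2) == "404":
            count += 1
    return count


sample = [
    '192.0.2.10 - - [10/May/2010:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 404 209 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [10/May/2010:13:56:02 +0000] "GET /2010/05/hello-world/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/May/2010:14:01:10 +0000] "GET /robots.txt HTTP/1.1" 404 209 "-" "OtherBot/2.0"',
]
print(robots_404s(sample))  # 2
```

Once a robots.txt exists (even one that disallows nothing), those requests return 200 instead.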
The robots.txt format you provided for a WordPress website is quite informative. I have been looking for a robots.txt file to add to my WordPress blog on Kerala real estate, www.keralarealpro.com. Thanks again.