I am curious to see some example uses and implementations of robots.txt, specifically implementations and the reasoning behind them for increasing SEO. The robots.txt I am using is based on the example "SEO with robots.txt".

WordPress 2.1 robots.txt:

```
User-agent: *

# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /stats*
Disallow: /tag
Disallow: /category/uncategorized*

# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$

# disallow all files in /wp- directories
Disallow: /wp-*/

# disallow all files with a ? in the URL
Disallow: /*?
```

Basically this helps get rid of duplicate content, low-quality content, CSS, JavaScript, PHP, etc., but still allows search engines to read the articles, find images, find PDFs, and so on. Anyone else have improvements or other robots.txt examples?
Thanks for that example; robots.txt is one of the areas of SEO I struggle with. I will look into using some of these.
I also use a custom robots.txt for phpBB.

phpBB robots.txt:

```
User-agent: *

# disallow all files with a ? in the URL
Disallow: /*?*

# disallow all files ending in specific extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$

# disallow these dirs
Disallow: /js/
Disallow: /css/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /admin/
Disallow: /cache/
Disallow: /includes/
Disallow: /templates/

# disallow these files and dirs
Disallow: /V
Disallow: /stats*
Disallow: /post
Disallow: /member
Disallow: /mx_

# disallow these urls
Disallow: /rss.php
Disallow: /viewtopic.php
Disallow: /viewforum.php
Disallow: /index.php?
Disallow: /posting.php
Disallow: /groupcp.php
Disallow: /search.php
Disallow: /login.php
Disallow: /profile.php
Disallow: /memberlist.php
Disallow: /faq.php
Disallow: /common.php
Disallow: /index.php
Disallow: /modcp.php
Disallow: /privmsg.php
Disallow: /viewonline.php

# disallow urls starting with a quote
Disallow: /"
```

Note that this phpBB forum is different from the default because it already has special optimizations applied.
You should only use wildcards with bots that support them (e.g. Googlebot), scoped with an appropriate User-agent line. It's probably not worth trying to make it too complicated; the search engines will filter out the duplicate and low-quality content anyway. Cryo.
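One way to follow that advice is to put the wildcard rules in a group for a user-agent known to support them, with a plain prefix-only group as the fallback. This is a sketch, not a drop-in config; crawlers obey only the most specific User-agent group that matches them, so Googlebot would read just the first block here:

```
# Wildcard extensions only for Googlebot, which supports them
User-agent: Googlebot
Disallow: /*?
Disallow: /*.php$

# Plain prefix rules for crawlers that may not understand wildcards
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
```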
Normal phpBB robots.txt:

```
#
# robots.txt for www.phpbbhacks.com
#
User-agent: *
Disallow: /forums/viewtopic.php
Disallow: /forums/viewforum.php
Disallow: /forums/index.php?
Disallow: /forums/posting.php
Disallow: /forums/groupcp.php
Disallow: /forums/search.php
Disallow: /forums/login.php
Disallow: /forums/privmsg.php
Disallow: /forums/post
Disallow: /forums/profile.php
Disallow: /forums/memberlist.php
Disallow: /forums/faq.php
Disallow: /forums/archive
```
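Rules like these can be sanity-checked before deploying with Python's standard-library `urllib.robotparser`. A quick sketch, assuming `example.com` as a placeholder host; note the stdlib parser implements only the original robots.txt conventions and does not evaluate Googlebot-style `*` and `$` wildcards, so only plain path-prefix rules are tested here:

```python
from urllib import robotparser

# A subset of the phpBB rules above, limited to plain path prefixes,
# since the stdlib parser ignores * and $ wildcard extensions.
rules = """\
User-agent: *
Disallow: /forums/search.php
Disallow: /forums/login.php
Disallow: /forums/privmsg.php
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# example.com is a placeholder host
print(rp.can_fetch("*", "http://example.com/forums/search.php"))  # blocked
print(rp.can_fetch("*", "http://example.com/forums/index.php"))   # allowed
```

For rules that use wildcards, a tool that implements the modern matching behavior (such as the tester in Google Search Console) is a better check than the stdlib parser.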