Hi, I found this neat robots.txt file on the WordPress website. What do you think? Is it worth using? Does it look all right to you? Can you please explain the part at the bottom: why would I block AdSense? Digg?

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://www.sandrophoto.com/sitemap.xml

It isn't disallowing Digg, it's disallowing duggmirror. Duggmirror just copies your site's content onto a separate server so that it stays available when the Digg visitors start coming in. As written, the Google Image and AdSense sections don't actually block anything: an empty Disallow: blocks nothing and Allow: /* allows everything, so Googlebot-Image and Mediapartners-Google can crawl the whole site. People who do block AdSense on WordPress-hosted sites do it because those sites currently don't allow advertising of any form (rumour has it they are trying to broker a deal with Google AdSense to be their one and only advertiser). People who block Google Images do it because image traffic is largely useless and uses up a lot of bandwidth. Hope that helps.
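If you want to check how a parser reads those bot-specific sections, here is a minimal sketch using Python's standard urllib.robotparser. The host www.example.com and the photo path are placeholders, not anything from the original file, and the Mediapartners-Google* section is left out to keep it short.

```python
from urllib.robotparser import RobotFileParser

# Bot-specific sections copied from the robots.txt in the first post.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow:
Allow: /*

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str = "/wp-content/uploads/photo.jpg") -> bool:
    """Return True if `agent` may fetch `path` (placeholder host, assumed)."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


results = {
    agent: allowed(agent)
    for agent in ("Googlebot-Image", "ia_archiver", "duggmirror")
}

for agent, ok in results.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

With these sections the parser reports Googlebot-Image as allowed, and ia_archiver and duggmirror as blocked.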
So basically, if I have only this, it's fine?

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

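To get a feel for what the wildcard rules in that block catch, here is a rough sketch. It assumes Google-style wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end), which not every crawler follows, and it ignores Allow overrides entirely. The helper names `rule_to_regex` and `blocked_by` are made up for this example.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex (Google-style wildcards assumed)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn the escaped '*' back into "any characters".
    body = re.escape(rule).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


# A few of the wildcard Disallow rules from the post above.
DISALLOW = ["/category/*/*", "*/feed", "/*?"]


def blocked_by(path: str) -> list[str]:
    """List the Disallow rules that match `path` (Allow rules are not modelled)."""
    return [rule for rule in DISALLOW if rule_to_regex(rule).match(path)]


for path in ("/category/photo/2008/", "/category/photo", "/2008/05/my-post/feed", "/?p=42"):
    print(path, "->", blocked_by(path) or "not blocked")
```

Under those assumptions, /category/*/* only catches paths with something below the category name, */feed catches feeds at any depth, and /*? catches any URL with a query string.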
I'd allow the category pages, but disallow the archives:

Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/

Category pages can attract nice long-tail search queries; that's not likely to happen with monthly archives that list posts from all categories.
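If you'd rather not maintain the year list by hand, the lines can be generated. A small sketch, where `archive_disallow_lines` is a hypothetical helper name and the year range is the one from the post:

```python
def archive_disallow_lines(first_year: int, last_year: int) -> list[str]:
    """Build one 'Disallow: /YYYY/' line per year, inclusive on both ends."""
    return [f"Disallow: /{year}/" for year in range(first_year, last_year + 1)]


# Paste the output into the 'User-agent: *' section of robots.txt.
print("\n".join(archive_disallow_lines(2005, 2010)))
```

Keep in mind that these are prefix rules: if your permalinks also start with the year (for example /2008/05/post-name/), the same lines would cover the posts themselves, not just the archive listings.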
Why use robots.txt for that? In my experience, noindex is much better for avoiding the blog duplicate-content problem, and it's recommended by Googlers, by the way. There's a really good post on it, with a comment from Matt Cutts, on Sphinn: sphinn.com/story/8667 (look at comment #6).
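To illustrate the noindex approach, here is a sketch of the decision a theme or plugin would make when rendering the page head. It is written in Python only for illustration; the `DATE_ARCHIVE` pattern and the `robots_meta` helper are assumptions for this example, not WordPress APIs, and the pattern assumes date archives live at /YYYY/ or /YYYY/MM/.

```python
import re

# Assumed URL shape for date archives: /2008/ or /2008/05/ (hypothetical rule).
DATE_ARCHIVE = re.compile(r"^/\d{4}/(\d{2}/)?$")


def robots_meta(path: str) -> str:
    """Return the robots meta tag to emit in the <head> of the page at `path`."""
    content = "noindex,follow" if DATE_ARCHIVE.match(path) else "index,follow"
    return f'<meta name="robots" content="{content}">'


for path in ("/2008/05/", "/category/photography/", "/2008/05/my-post/"):
    print(path, "->", robots_meta(path))
```

For the tag to have any effect, the crawler has to be able to fetch the page, so the archive URLs must not also be disallowed in robots.txt.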