robots.txt and duplicate content in WordPress

sandrodz Peon

#1

Hi, so I found this neat robots.txt file on the WordPress website. What do you think? Is it worth using? Does it look all right to you? Can you please explain the part at the bottom: why would I block AdSense? Digg?

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

Sitemap: http://www.sandrophoto.com/sitemap.xml

sandrodz, Nov 18, 2007

Chewyshoe Peon

#2

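It isn't actually disallowing Google Images or AdSense. An empty Disallow: line blocks nothing and Allow: /* permits everything, so those two sections give Googlebot-Image and Mediapartners-Google full access and exempt them from the User-agent: * rules above. It's not disallowing Digg either, it's disallowing duggmirror. Duggmirror copies your site's content onto a separate server so that a copy stays up when the Digg visitors start coming in.

The AdSense section only matters if you run AdSense yourself. At the moment WordPress-hosted sites don't allow advertising of any form; the rumour is that they are trying to broker a deal with Google AdSense to be their one and only advertiser.

Some people do block Google Images on the grounds that image traffic is largely useless and uses up a lot of bandwidth, but this file doesn't do that.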
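To check which crawlers the bottom sections really block, here is a rough Python sketch. It assumes the widely documented behaviour that a crawler follows only the most specific User-agent group that names it, and that an empty Disallow: blocks nothing. The helper names (parse_groups, rules_for, describe) are made up for this illustration, and the User-agent: * section is shortened to two rules.

```python
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /feed

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /
"""


def parse_groups(text):
    """Return {user-agent token: [(directive, value), ...]} for a robots.txt text."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current_agents = []
                seen_rule = False
            agent = value.lower().rstrip("*") or "*"
            current_agents.append(agent)
            groups.setdefault(agent, [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def rules_for(groups, crawler):
    """Pick the most specific group naming the crawler, else the '*' group."""
    name = crawler.lower()
    matches = [agent for agent in groups if agent != "*" and agent in name]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])


def describe(groups, crawler):
    """Coarse summary only: it ignores Allow overrides and wildcards."""
    disallows = [value for field, value in rules_for(groups, crawler) if field == "disallow"]
    if "/" in disallows:
        return "blocked entirely"
    if all(value == "" for value in disallows):
        return "full access"
    return "partially restricted"


GROUPS = parse_groups(ROBOTS_TXT)

for crawler in ("Googlebot", "Googlebot-Image", "Mediapartners-Google", "ia_archiver", "duggmirror"):
    print(crawler, "->", describe(GROUPS, crawler))
```

Under those assumptions the only crawlers shut out completely are ia_archiver and duggmirror; the two Google sections open things up rather than close them.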

Hope that helps.

Chewyshoe, Nov 18, 2007

sandrodz Peon

#3

So basically, if I have only this, it's fine?

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads
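One way to see what this trimmed file does is to run a few typical WordPress paths through a small matcher. The sketch below assumes Google-style matching: * stands for any run of characters, rules match from the start of the path, the longest matching pattern wins, and Allow wins a tie. Not every crawler in 2007 understands wildcards or Allow. The function names and the sample paths are invented for the example.

```python
import re

ROBOTS_RULES = """\
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads"""


def parse_rules(text):
    """Turn 'Directive: pattern' lines into (directive, pattern) tuples."""
    rules = []
    for line in text.splitlines():
        directive, _, pattern = line.partition(":")
        rules.append((directive.strip().lower(), pattern.strip()))
    return rules


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern decides; Allow beats Disallow on equal length."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = parse_rules(ROBOTS_RULES)

for path in (
    "/2007/11/hello-world/",
    "/2007/11/hello-world/feed/",
    "/category/photos/",
    "/?p=42",
    "/wp-admin/options.php",
    "/wp-content/uploads/photo.jpg",
):
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Note that under these matching rules /category/*/* also catches a top-level category URL that ends in a slash, such as /category/photos/, because the second * can match an empty string.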

sandrodz, Nov 18, 2007

Sebastian Peon

#4

I'd allow the category pages, but disallow the archives:
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
Category pages can attract nice long-tail search queries; that's not likely to happen with monthly archives that list posts from all categories.
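These year rules are plain prefixes, so they can be checked with Python's standard urllib.robotparser (which does prefix matching only, no wildcards). The sample paths below are assumptions for illustration, not URLs from a real site.

```python
from urllib.robotparser import RobotFileParser

ARCHIVE_RULES = """\
User-agent: *
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
"""

parser = RobotFileParser()
parser.parse(ARCHIVE_RULES.splitlines())


def crawlable(path):
    """True if a generic crawler may fetch the path under the rules above."""
    return parser.can_fetch("Googlebot", path)


for path in (
    "/2007/11/",              # monthly archive
    "/category/photos/",      # category page
    "/2007/11/hello-world/",  # post under date-based permalinks
    "/hello-world/",          # post under name-only permalinks
):
    print(path, "->", "crawlable" if crawlable(path) else "blocked")
```

One caveat the check makes visible: if the blog uses date-based permalinks, the posts themselves start with the year too, so these rules would block them along with the archives.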

Sebastian, Nov 19, 2007

SeoSmarty Banned

#5

Why use robots.txt for that? In my experience, noindex is a much better way to avoid the blog duplicate content problem, and it's recommended by Googlers, by the way. There's a really good post on it, with a comment from Matt Cutts, on Sphinn:

sphinn.com/story/8667

(look at comment #6)
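As a rough illustration of the noindex approach, the sketch below picks a robots meta tag per page type. The page type names and the robots_meta function are hypothetical, not WordPress identifiers; in a real theme the same decision would sit in the header template.

```python
# Hypothetical page types that tend to duplicate post content.
NOINDEX_PAGE_TYPES = {"date_archive", "tag_archive", "search_results", "paged_listing"}


def robots_meta(page_type):
    """Return the meta tag to print in <head>, or an empty string for indexable pages."""
    if page_type in NOINDEX_PAGE_TYPES:
        # noindex keeps the page out of the index; follow still lets
        # crawlers follow its links through to the posts.
        return '<meta name="robots" content="noindex,follow">'
    return ""


for page_type in ("single_post", "category", "date_archive", "search_results"):
    print(page_type, "->", robots_meta(page_type) or "(no tag, indexable)")
```

A crawler has to fetch a page to see its meta tag, so a page carrying noindex must not also be disallowed in robots.txt.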

SeoSmarty, Nov 19, 2007
