Opinion Regarding Robots.txt Change

warneylm Peon

Messages: 10

Likes Received: 0

Best Answers: 0

Trophy Points: 0

#1

Hi All,

We are taking a bit of a hit on Google due to the amount of duplicate content on our site. Therefore, as part of a phased approach, we want to start hiding non-essential pages from Googlebot.

Basically, our first action is to disallow Googlebot from crawling/indexing any product page that is a 4th generation copy (or more). We have done some initial research and think that using the * wildcard as shown below should be OK.

This is the proposed code we are thinking of using:

User-agent: *
Disallow: /copy_of_copy_of_copy_of_*.html

Therefore:

http://www.kjbeckett.com/acatalog/bl...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/fred-perry_p2.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/mens-bags_p4.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD be crawled (from http://www.kjbeckett.com/acatalog/mens-bags.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD NOT be crawled (from http://www.kjbeckett.com/acatalog/messenger-bags.html).
http://www.kjbeckett.com/acatalog/co...red-perry.html WOULD NOT be crawled (from http://www.kjbeckett.com/acatalog/fred-perry.html).

Do you think our usage of the * wildcard is correct? Using the examples above, would we still be crawled where we want to be, and not crawled where we don't want to be?

Any help would be greatly appreciated.

Cheers,
Liam

warneylm, Apr 10, 2012 IP
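One way to sanity-check the proposed rule before deploying it is to emulate the wildcard matching locally. The sketch below is a minimal approximation of Google-style pattern matching, where `*` matches any run of characters, a trailing `$` anchors the end of the URL, and every rule is matched from the start of the URL path. The helper names (`rule_to_regex`, `is_blocked`) and the `example` file names are hypothetical, since the real URLs in the post are truncated. Other crawlers may treat wildcards differently, so treat the result as indicative only.

```python
import re

def rule_to_regex(rule):
    """Turn a robots.txt path rule into a regex (approximation of Google-style matching)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule, path):
    """True if the Disallow rule matches the URL path, matching from the start of the path."""
    return rule_to_regex(rule).match(path) is not None

proposed = "/copy_of_copy_of_copy_of_*.html"           # rule as posted
prefixed = "/acatalog/copy_of_copy_of_copy_of_*.html"  # assumed variant including the directory

# Hypothetical paths standing in for the truncated URLs in the post.
paths = [
    "/acatalog/example.html",
    "/acatalog/copy_of_example.html",
    "/acatalog/copy_of_copy_of_example.html",
    "/acatalog/copy_of_copy_of_copy_of_example.html",
    "/acatalog/copy_of_copy_of_copy_of_copy_of_example.html",
]

for path in paths:
    print(path, is_blocked(proposed, path), is_blocked(prefixed, path))
```

Under these assumptions, the rule as posted matches none of the paths, because the product pages live under /acatalog/ and the rule starts at the site root. The variant with the /acatalog/ prefix blocks only the last two paths, which is the behaviour described in the post. Because rules are prefix matches, the trailing `*.html` mainly restricts the rule to URLs that contain ".html" somewhere after the prefix.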

trosquin Active Member

Messages: 681

Likes Received: 9

Best Answers: 0

Trophy Points: 60

#2

Rather than mess with robots.txt, which can have a bad effect on the site, why not add canonical tags to those pages? Or even the noindex tag; that would be much easier.

trosquin, Apr 10, 2012 IP

knysna Peon

Messages: 81

Likes Received: 4

Best Answers: 0

Trophy Points: 0

#3

Hi warneylm. As mentioned above, rather use canonical or . Blocking URLs via robots.txt is no guarantee that they won't reappear in the search results. If other sites have linked to those pages, the bot can still discover the URLs through those links and index them again. Also remember that if you block a page via robots.txt and also via the meta tags, the spider may never get to crawl the page to see the noindex meta tag, so the URL may still appear in the search results and come up as a copy. So be careful how you block the spider. Regards, knysna.

knysna, Apr 11, 2012 IP

knysna Peon

Messages: 81

Likes Received: 4

Best Answers: 0

Trophy Points: 0

#4

Sorry, the beginning of the above post never came out. It was meant to read: as mentioned above, rather use canonical or the noindex,nofollow meta tag.

knysna, Apr 11, 2012 IP
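For reference, the two alternatives suggested in the replies are both tags placed in the `<head>` of each duplicate page. The sketch below just builds the markup as strings; the helper names (`canonical_tag`, `robots_meta_tag`) and the example.com URL are hypothetical, and how the tags get into the page templates depends on the shop software. As noted in the thread, these tags can only take effect if the page is not also blocked in robots.txt, since the crawler has to fetch the page to see them.

```python
from html import escape

def canonical_tag(preferred_url):
    """Build a rel=canonical link pointing at the preferred (original) page."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'

def robots_meta_tag(*directives):
    """Build a robots meta tag from directives such as 'noindex' and 'nofollow'."""
    content = ",".join(directives)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'

# Hypothetical original product URL; substitute the real preferred page.
print(canonical_tag("http://www.example.com/acatalog/original-product.html"))
print(robots_meta_tag("noindex", "nofollow"))
```

The canonical tag asks search engines to consolidate the copies onto the original page, while the robots meta tag asks them to drop the copy from the index altogether, so in practice a page would normally carry one or the other rather than both.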
