AzAkers
Apr 20th 2005, 11:27 am
Okay, here's the deal:
I need to keep all HTML pages from being spidered. There are 700+ duplicate pages where only an agent's name and number differ. They are in different folders, so I can't just block the folders.
In particular, I am concerned about how Google will handle this, since it is Google we are trying to keep from penalizing us for duplicate content.
Can I just add this to the robots.txt ...
user-agent: *
Disallow: *.htm
Disallow: *.html
or should it be
user-agent: *
Disallow: *.htm$
Disallow: *.html$
or
user-agent: *
Disallow: /*.htm
Disallow: /*.html
Will that work?
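To make the difference between the variants concrete, here is a rough Python sketch of how I understand wildcard matching to work for crawlers that support it (Google documents `*` as "any run of characters" and a trailing `$` as "end of the URL"). The function names `rule_to_regex` and `is_blocked` and the sample paths are made up for illustration, and this is only my reading of the matching rules, not Google's actual implementation. Only the forms with a leading slash are tested, since I am not sure how a rule without one is treated, and crawlers that do not support wildcards may ignore or misread all of these rules.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard-style Disallow pattern into a compiled regex.

    Assumed semantics (not taken from any crawler's source code):
    '*' matches any run of characters, a trailing '$' anchors the end of
    the URL, and without '$' the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the hypothetical Disallow pattern would match the path."""
    return rule_to_regex(pattern).match(path) is not None

# Hypothetical paths, just to compare the rule variants.
for rule in ("/*.htm", "/*.htm$", "/*.html$"):
    for path in ("/agents/smith/index.htm",
                 "/agents/smith/index.html",
                 "/agents/smith/index.html?id=3",
                 "/images/logo.gif"):
        print(f"{rule:10} {path:32} blocked={is_blocked(rule, path)}")
```

Under these assumptions, `/*.htm` on its own already covers `.html` pages as well, because without `$` it is a prefix match, while `/*.htm$` and `/*.html$` only match URLs that end exactly in that extension and so would miss anything with a query string.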