Firstly, hello, this is my first post here! Secondly, I'd like to ask your opinion about something I'm working on. A site I'm trying to fix up for SEO has many pages on the server with a .html extension that are no longer in use, but that have exactly the same content as .php files in the same place. (I hope that makes sense!) So basically each document exists in two forms, .php and .html; only the .php documents are in use now, and they're the ones linked in the navigation/sitemaps. Do you think having both sets on the server will count as duplicate content, even though only one set is actually in use? Should I delete the .html files? (I know I technically should, but there are hundreds and hundreds of them and I've got a lot to do in a short amount of time. If it's not really going to help, I can save that job for a few weeks' time.) Thanks!
Can anybody help me? Further to the .php/.html duplicate pages, I've found today that there are files on the server named, for example, lecture_one.php and lecture one.php (one with a space, one with an underscore). As far as I can tell, only one of each pair is linked, though sometimes it's the space version and sometimes the underscore one. So will all these 'orphan' pages count as duplicate content on the server?
It's not the file extensions that create duplicate content, but the links, titles and descriptions of pages and their actual content. Change the links: lecture_one.php to new_lecture_one.php, or whatever.
Sorry, I'm a little slow at times: all the pages are exact copies. Some are page1.html > page1.php (only one set in use, the others just exist on the server); some are page_1.html > page 1.html (one set in use, but the others exist on the server). (Don't ask me why, I had nothing to do with building this site, I'm just working on the SEO!) So, with a page replicated exactly on the server, even though it's not linked from anywhere, can Google crawl it and see it as duplicate content?
If you are only using one of the two, .php or .html, like you've said, I think there is nothing wrong with that. As long as no identical content is linked, there's nothing you have to worry about. Duplicate content isn't judged only on the URL but on what's in the content itself.
Actually, you're going to want to either delete the HTML files and forward any links to them to the PHP versions with a series of 301 redirects, or resave the PHP files as .html files and have Apache parse the HTML files as if they were PHP files.
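Something like this in .htaccess should cover the 301 option (a rough sketch, assuming Apache with mod_rewrite enabled; test it before deploying):

```apache
RewriteEngine On
# 301-redirect every request for a .html file to its .php counterpart
RewriteRule ^(.*)\.html$ /$1.php [R=301,L]

# Alternative (the "parse .html as PHP" option) -- the exact handler name
# depends on how PHP is installed on your server:
# AddHandler application/x-httpd-php .html
```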
OK Carly, since no one seems to be answering the question directly, I will. I am a web programmer and I can tell you that if there are NO links to those HTML files, Google will not be able to crawl them. And if anybody says otherwise, they don't know what they are talking about. Search engine crawlers follow links; that's how they crawl. They DO NOT search directories in an attempt to create a link to any and all files that exist there. So to answer your duplicate content question: you have nothing to worry about. If there are no links to those files, Google will never crawl or list them.
Thank you, that is as I thought. I presumed this would be the case, just needed to check. I was worried in case the new version (e.g. essay1.php) was exactly the same as an older, cached or previously indexed version (e.g. essay 1.html). RE: the previous message, I already added 301 redirects to the .htaccess.
Duplicate content is strictly prohibited by search engines; your site could be penalized, and at worst it could be banned!
The most important thing with duplicate content is to have a big enough part of each page unique from the search engine's perspective. I would say more than 60% of a page should be different from any other page.
Percentage has nothing to do with it. By your theory I could take 10,000 Wikipedia entries, change the order and wording around (equalling a 60% change) and I'd be fine. Sorry, it doesn't work that way.
I agree with this quote. Part of the reason many websites' sub-pages don't rank high for longer-tail keywords is because the content from their template has a higher word count than the content on any one page. In other words, Google is flagging similar pages in your website with a duplicate content filter because not a high enough percentage of the page has unique content. One thing you can do is copy and paste all the words from the header, sidebar, and footer of your website into Microsoft Word, which will then automatically give you the word count for your template. I would aim to exceed this word count on every page of your site that you want ranked with unique content, OR reduce the word count on your template by converting parts from text to images or removing unnecessary links, so that the percentage of unique content is higher. Hopefully this makes sense the way I explained it.
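To put rough numbers on the percentage idea (a made-up illustration; on a real site you'd extract the rendered text of the template and of each page, and whether search engines actually apply such a threshold is this poster's theory, not documented fact):

```python
# Rough sketch of the "unique content percentage" idea.
# Both text strings here are invented for illustration.
template_text = "Home About Lectures Contact Copyright Example Site"
body_text = "This lecture covers the basics of search engine optimisation in depth"
page_text = template_text + " " + body_text  # roughly what a crawler sees on one page

template_words = len(template_text.split())
page_words = len(page_text.split())
unique_pct = 100 * (page_words - template_words) / page_words

print(f"{unique_pct:.0f}% of the page's words are unique to it")  # -> 61%
```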
^ Thank you, Chris. It's amazing that out of 12 (approx.) replies only 2 or 3 actually answered the question. Yeah. Thanks. Erm... that's the actual reason I posted the thread! I wanted to know if having two sets of the same pages on the server, with only one set in use, could count against us SEO-wise. Maybe I don't make myself very clear!
Yes, but you shouldn't worry about one or two duplicates; it only matters if you are mass-submitting or reproducing the same content.