One of my sites has a few Word documents uploaded to the server. When I use the site: operator in a Google search, it shows the files and also shows the content of the files. Now, if I make a site in HTML and put the contents of the files on it, will the new site be penalized for duplicate content? Thanks for your help.
Yes, there will be a sort of penalty (really more of a filter): Google will likely index only the HTML file. This is not really a problem, though. As long as one or the other gets indexed and ranks, who cares?
No, what I meant was: the ".doc" file is already indexed and listed. Now, if I just copy and paste its contents to create an ".html" document, will the ".html" document be penalized or regarded as duplicate content or something? Thanks for answering.
Are you completely sure about this? Because when I search Google for "content of the doc file blah blah blah", there are NO results.
Well, then your .doc page is NOT listed in Google, even if it IS listed on your site. Google will prefer .html pages to .doc pages any time. So make the .html file and link it to the .doc file for people who want to print it out. Duplicate content isn't that much of an issue. As long as one or the other gets listed, you should be happy. The people who worry about duplicate content for good reason have multiple HTML files with the same content. That dilutes Google PageRank and therefore means the site as a whole risks getting fewer visitors. When it's just one page, and the copies are in different file formats, I don't think you have anything to worry about at all. Worry about getting enough links to your site, and about quality internal navigation, so that all your HTML pages get listed in Google in the first place.
I had the same issue. All my articles were in PDF format as well. My PDF files are all in the supplemental index, although my HTML pages were indexed earlier than the PDFs. I hope this is sort of the same situation. Be careful before doing anything. Regards.
Are you thinking of making a new site for the HTML version? Maybe what you can do is use the same site for both the HTML and DOC formats. You can always add a link to the DOC or PDF file from the HTML page, with link text such as 'Download the DOC version here' or 'Download the PDF version here'. Making a completely new site won't be a good idea and can lead to duplicate content penalties.
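For anyone who wants to try the "HTML page plus download link" approach suggested above, here is a minimal sketch in Python. It assumes you have already pasted the document text out of the .doc file by hand. The function name `build_article_page` and the path `/files/my-article.doc` are made up for illustration, so adjust them to your own site.

```python
from html import escape


def build_article_page(title, body_text, download_href, download_label):
    """Return a simple HTML page holding the article text plus a download link.

    Hypothetical helper for illustration only; it is not part of any library.
    Paragraphs in body_text are assumed to be separated by blank lines.
    """
    # Split the pasted text into paragraphs and escape characters such as & and <.
    paragraphs = [p.strip() for p in body_text.split("\n\n") if p.strip()]
    body_html = "\n".join(f"  <p>{escape(p)}</p>" for p in paragraphs)

    # The download link keeps the original file reachable from the HTML page.
    link_html = (
        f'  <p><a href="{escape(download_href, quote=True)}">'
        f"{escape(download_label)}</a></p>"
    )

    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"  <title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"{body_html}\n"
        f"{link_html}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    page = build_article_page(
        "My Article",
        "First paragraph of the article.\n\nSecond paragraph.",
        "/files/my-article.doc",  # assumed location of the existing .doc file
        "Download the DOC version here",
    )
    print(page)
```

This keeps the HTML page and the .doc file on the same site, with the HTML page as the one you link to from your navigation, which is what the replies above recommend.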