I have thousands of articles on my site, and some of them are duplicates that the sender disguised by using different file names and making slight changes. Is there a free program I can use to find them?
You can only run a limited number of searches on a domain with Copyscape before it stops letting you use it for free.
Thanks, but Copyscape only finds online duplicates of individual pages. I would like to find duplicates within my own site. I have been using a program called "Duplicate File Finder", which is very good, but as far as I can see it only finds files that are exactly the same. I need to find files that are very close in content.
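For files that are "very close in content", one common technique (not necessarily what any of the tools mentioned here use) is to split each text into overlapping word sequences ("shingles") and compare the resulting sets with Jaccard similarity. Below is a minimal PHP sketch, assuming the articles have already been read into an array keyed by file name. The function names `shingles()`, `jaccard()` and `findNearDuplicates()` and the 0.8 threshold are hypothetical choices, not part of any existing program. It compares every pair of articles, so with thousands of files expect it to run for a while. PHP's built-in `similar_text()` also returns a similarity score, but its cost grows as O(N^3) in the text length, which makes it slow on full articles.

```php
<?php
// Sketch: near-duplicate detection with word shingles + Jaccard similarity.
// All function names here are hypothetical helpers, not an existing API.

// Lower-case the text, split it into words, and return the set of k-word
// shingles stored as array keys (so repeated shingles collapse into one).
function shingles(string $text, int $k = 3): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $set = [];
    $count = count($words);
    for ($i = 0; $i + $k <= $count; $i++) {
        $set[implode(' ', array_slice($words, $i, $k))] = true;
    }
    return $set;
}

// Jaccard similarity |A intersect B| / |A union B|, between 0 and 1.
function jaccard(array $a, array $b): float
{
    if (!$a && !$b) {
        return 1.0; // two empty texts are treated as identical
    }
    $inter = count(array_intersect_key($a, $b));
    $union = count($a) + count($b) - $inter;
    return $inter / $union;
}

// Compare every pair of articles (O(n^2) pairs) and return
// [nameA, nameB, score] for each pair at or above the threshold.
function findNearDuplicates(array $articles, float $threshold = 0.8, int $k = 3): array
{
    // array_map with a single array keeps the file-name keys.
    $sets = array_map(fn($t) => shingles($t, $k), $articles);
    $names = array_keys($sets);
    $pairs = [];
    $n = count($names);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $score = jaccard($sets[$names[$i]], $sets[$names[$j]]);
            if ($score >= $threshold) {
                $pairs[] = [$names[$i], $names[$j], $score];
            }
        }
    }
    return $pairs;
}

// Usage with made-up sample texts.
$articles = [
    'a.html' => 'The quick brown fox jumps over the lazy dog near the river bank today',
    'b.html' => 'The quick brown fox jumps over the lazy dog near the river bank',
    'c.html' => 'A completely different article about databases and PHP scripts',
];
foreach (findNearDuplicates($articles) as [$x, $y, $score]) {
    printf("%s ~ %s (%.2f)\n", $x, $y, $score);
}
// prints: a.html ~ b.html (0.92)
```

Lowering the threshold catches more heavily edited copies at the cost of more false matches; a larger shingle size `$k` makes the comparison stricter about word order.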
As long as the duplicate content is kept to a minimum, it will not be a problem. I would leave it on the site and just be more careful checking new content in the future. From an SEO point of view, I would always avoid removing pages from a site.
I created a PHP script for you that finds and removes duplicate articles stored in a MySQL table:

```php
<?php
$dbhost   = "localhost";
$dbuser   = "dbusername";
$dbpasswd = "dbpassword";
$dbname   = "dbname";
$table    = "tablename";

// Column name of your auto-incremented entry ID.
$id = "article_id";

// Columns compared when looking for duplicates; leave unused ones empty.
$field1 = "article title column name";
$field2 = "article body column name";
$field3 = "";
$field4 = "";
$field5 = "";

$delay = 2; // seconds to wait between deletes, to help prevent excessive load
/////////////// end config

// The old mysql_* functions were removed in PHP 7, so mysqli is used here.
$connection = mysqli_connect($dbhost, $dbuser, $dbpasswd, $dbname)
    or die("Unable to connect to database $dbname");

// Build the query: one row per group of identical entries, returning the
// highest (newest) ID in each group, so repeated runs keep the oldest copy.
// Selecting MAX($id) instead of * also works with ONLY_FULL_GROUP_BY.
$columns = array_filter(array($field1, $field2, $field3, $field4, $field5));
$query = "SELECT MAX($id) FROM {$table} GROUP BY " . implode(", ", $columns)
       . " HAVING COUNT(*) > 1";

// Locate the dupes.
$result = mysqli_query($connection, $query) or die(mysqli_error($connection));
$dupes = mysqli_num_rows($result);
echo "Found $dupes duplicate entries<br>\n";

// Remove the dupes (one per group of identical entries).
while ($myrow = mysqli_fetch_row($result)) {
    mysqli_query($connection, "DELETE FROM {$table} WHERE $id = " . (int)$myrow[0]);
    echo "entry " . $myrow[0] . " removed<br>\n";
    sleep($delay);
}

// Re-optimize the table if more than 1 dupe was removed.
if ($dupes > 1) {
    echo "<br>\noptimizing {$table}<br>\n";
    mysqli_query($connection, "OPTIMIZE TABLE `{$table}`");
}

// Only one copy per group is removed per run, so run the script again
// until it reports 0 duplicates.
if ($dupes == 0) {
    echo "No duplicates found. Check again later.<br>\n";
} else {
    echo "<b>All done! Dupes removed.</b><br><br>\n";
}
echo "<a href='http://www.article-bd.com'>Article-BD</a>";
?>
```
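The script above only groups rows whose compared columns are identical (under the columns' collation), so copies with even small edits, such as extra spaces or changed punctuation, will not be caught. A cheap middle ground is to normalize each article (strip HTML tags, lower-case, turn punctuation and whitespace runs into single spaces) and group on a hash of the result. This is a minimal PHP sketch: `fingerprint()` and `groupByFingerprint()` are hypothetical helpers, the article array is assumed to be keyed by entry ID, and the normalization assumes mostly ASCII English text, since any other character is treated as a separator.

```php
<?php
// Hypothetical helper: reduce text to a canonical form so that copies that
// differ only in case, HTML tags, punctuation or whitespace hash the same.
function fingerprint(string $text): string
{
    $text = strtolower(strip_tags($text));
    // Any run of non-alphanumeric ASCII characters becomes a single space.
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);
    return md5(trim($text));
}

// Group entry IDs by fingerprint and keep only groups with 2+ members.
function groupByFingerprint(array $articles): array
{
    $groups = [];
    foreach ($articles as $id => $body) {
        $groups[fingerprint($body)][] = $id;
    }
    return array_values(array_filter($groups, fn($ids) => count($ids) > 1));
}

// Usage with made-up sample entries: 1 and 2 end up in the same group.
$articles = [
    1 => '<p>Hello, World!</p>',
    2 => 'hello   world',
    3 => 'Something else entirely',
];
print_r(groupByFingerprint($articles));
```

This still misses reworded copies; for those, a similarity measure over the text is needed rather than an exact hash.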