Getting rid of Duplicate Content

Discussion in 'Search Engine Optimization' started by ridun, Jan 18, 2008.

  1. #1
    I have thousands of articles on my site, and some of them are duplicates that the sender disguised by using different file names and making slight changes. Is there a free program I can use to find them?
     
    ridun, Jan 18, 2008
  2. SteveNO

    #2
    Not that I know of. Does anyone else know? I may be interested in this too.
     
    SteveNO, Jan 18, 2008
  3. ericajoieake

    #3
    You can use copyscape.com, and it is free!
     
    ericajoieake, Jan 18, 2008
  4. Vic_mackey

    #4
    You can only run so many searches on a domain with Copyscape before they stop letting you use it for free.
     
    Vic_mackey, Jan 18, 2008
  5. ridun

    #5
    Thanks, but Copyscape only finds duplicates of individual pages online. I would like to find duplicates within my own site. I have been using a program called "Duplicate File Finder", which is very good, but as far as I can see it can only find files that are exactly the same. I need to find files that are very close in content.
     
    ridun, Jan 19, 2008
  6. DJ5A

    #6
    DJ5A, Jan 19, 2008
  7. SKE11

    #7
    As long as the duplicate content is kept to a minimum, it will not be a problem. I would leave it up and just be more careful when checking new content in the future.

    From an SEO viewpoint, I would always avoid removing pages from a site.
     
    SKE11, Jan 20, 2008
  8. pavel_kbc

    #8
    I created a PHP script for you...

    
    <?php
    $dbhost          = "localhost";
    $dbuser          = "dbusername";
    $dbpasswd        = "dbpassword";
    $dbname          = "dbname";
    $table    = "tablename";
    
    
    
    
    //// column name you use as your auto-incremented entry ID
    //// (the query below selects only this column, so $myrow[0] always holds it)
    $id = "article_id";
    
    $field1 = "article title column name";
    $field2 = "article body column name";
    $field3 = "";
    $field4 = "";
    $field5 = "";
    $delay = "2"; // Delay (in seconds) to help prevent excessive loads
    /////////////// end config 
    // connect (mysqli replaces the old mysql_* functions, which were removed in PHP 7)
    $connection = mysqli_connect ($dbhost, $dbuser, $dbpasswd, $dbname) or die ("Unable to connect to database $dbname");
    // build the list of columns that must match for two rows to count as duplicates
    $group = $field1;
    if (!empty($field2)){$group .= ", $field2";}
    if (!empty($field3)){$group .= ", $field3";}
    if (!empty($field4)){$group .= ", $field4";}
    if (!empty($field5)){$group .= ", $field5";}
    // locate the dupes: every row except the lowest ID in each group of identical rows,
    // so one copy of each article is always kept
    $query = "SELECT $id FROM {$table} WHERE $id NOT IN (SELECT MIN($id) FROM {$table} GROUP BY $group)";
    $result = mysqli_query($connection, $query) or die ("Query failed: " . mysqli_error($connection));
    $dupes = mysqli_num_rows($result);
    echo "Found $dupes duplicate entries<br>";
    // remove the dupes
    while ($myrow = mysqli_fetch_row($result)) {
    $delete = mysqli_query($connection, "DELETE FROM {$table} WHERE $id = " . (int)$myrow[0]);
    echo "entry ". $myrow[0] ." removed <br>\n";
    sleep ($delay);
    }
    // re-optimize table if more than 1 dupe was removed
    if ($dupes > 1){ 
    echo "<br>\noptimizing {$table}";
    $optimize = mysqli_query($connection, "OPTIMIZE TABLE `{$table}`");
    }
    // $myrow is always empty once the loop above has finished, so test the count instead
    if ($dupes == 0){
    echo "<b>Check Again Later.<br></b>&nbsp;&nbsp;&nbsp;&nbsp;<a href='http://www.article-bd.com'>Article-BD</a>";
    }else{
    ?>
    <html>
    <head>
    <title>Delete Duplicates</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>
    <body>
    <b>All Done! Dupes Removed...<br><br></b>&nbsp;&nbsp;&nbsp;&nbsp;<a href='http://www.article-bd.com'>Article-BD</a>
    </body>
    </html>
    <?php
    }
    ?>
    
    visit for support
     
    pavel_kbc, Jan 20, 2008
  9. Gatorade

    #9
    Wow, have you used the script pavel created for you?
     
    Gatorade, Jan 21, 2008
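
The script in post #8 groups rows on identical title and body, so like "Duplicate File Finder" it only catches exact copies. For the "very close in content" case raised in post #5, one common technique is to compare overlapping word sequences ("shingles") and score each pair of articles with Jaccard similarity. The following is only a minimal sketch under stated assumptions: the function names `shingles`, `jaccard` and `nearDuplicates`, the 3-word shingle size, the 0.5 threshold and the sample articles are all hypothetical and not part of anyone's post, and the word splitting assumes plain English text.

```php
<?php
// Hypothetical helper: lowercase the text, strip HTML tags, split into words,
// and return the set of overlapping $size-word sequences as array keys.
function shingles(string $text, int $size = 3): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);
    $set = [];
    for ($i = 0; $i + $size <= count($words); $i++) {
        $set[implode(' ', array_slice($words, $i, $size))] = true;
    }
    return $set;
}

// Jaccard similarity of two shingle sets: shared shingles divided by all
// distinct shingles, giving a score between 0.0 and 1.0.
function jaccard(array $a, array $b): float
{
    $union = count($a + $b);
    if ($union === 0) {
        return 0.0; // both texts were too short to produce any shingle
    }
    return count(array_intersect_key($a, $b)) / $union;
}

// Hypothetical driver: compare every pair of articles (keyed by ID) and
// return [idA, idB, score] for each pair scoring at or above $threshold.
function nearDuplicates(array $articles, float $threshold = 0.5): array
{
    $sets = array_map('shingles', $articles); // keys (article IDs) are preserved
    $ids = array_keys($sets);
    $pairs = [];
    for ($i = 0; $i < count($ids); $i++) {
        for ($j = $i + 1; $j < count($ids); $j++) {
            $score = jaccard($sets[$ids[$i]], $sets[$ids[$j]]);
            if ($score >= $threshold) {
                $pairs[] = [$ids[$i], $ids[$j], round($score, 2)];
            }
        }
    }
    return $pairs;
}

// Made-up sample data standing in for rows read from the articles table.
$articles = [
    1 => 'How to train your puppy to sit and stay in five easy steps',
    2 => 'How to train your puppy to sit and stay in five simple steps',
    3 => 'Ten quick recipes for a healthy weekday breakfast',
];
foreach (nearDuplicates($articles) as [$a, $b, $score]) {
    echo "Articles $a and $b look similar (score $score)\n";
}
```

With the sample data only articles 1 and 2 are reported, since they differ by a single word. The sketch only lists candidate pairs and deletes nothing, so each pair can be reviewed by hand first. Comparing every pair grows quadratically with the number of articles, which should still be workable for a few thousand rows but would need a smarter index beyond that.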