Counting duplicates on bulk INSERT IGNORE / INSERT ... ON DUPLICATE KEY UPDATE

Discussion in 'MySQL' started by meloncreative, Dec 2, 2010.

  1. #1
    So I have a script that imports CSV's for contact lists into a table ( http://forums.digitalpoint.com/showthread.php?t=2012367 ) and to make it as efficient as possible it detects duplicates purely in SQL rather than relying on PHP and bloating the code.

    The script can be set to either overwrite or ignore any duplicates, and I need to be able to count how many duplicates it found. Depending on which option is set for how to handle duplicates, the query looks like the follow. Note: These are bulk inserts where multiple rows are inserted in a single query. This example shows 5 CSV rows in the batch query, but this number is much higher on the live script

    INSERT IGNORE INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk');
    Code (markup):
    INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name);
    Code (markup):
    My problem is that I need somehow to be able to detect accurately how many duplicates there were mixed in with rows that weren't. Some people have suggested mysql_errno and mysql_affected_rows but these dont seem to work for the way I have implemented things

    Any ideas?
     
    meloncreative, Dec 2, 2010 IP
  2. koko5

    koko5 Active Member

    Messages:
    394
    Likes Received:
    14
    Best Answers:
    1
    Trophy Points:
    70
    #2
    You can create a temporary table with no unique keys in it, than insert csv (all records) into it. Now you can count duplicates by joining temporary and first table, than you can insert records gathering data from temporary table instead of csv.

    Regards :)
     
    koko5, Dec 2, 2010 IP
  3. sukosari

    sukosari Active Member

    Messages:
    63
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    93
    #3
    just copied to spreedsheet like ms.excel and sorted
     
    sukosari, Dec 3, 2010 IP
  4. Dwaighty

    Dwaighty Peon

    Messages:
    358
    Likes Received:
    10
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Add a field `counter` to the table and use the on duplicate key update clause as follows:

    INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name), `counter` = `counter` + 1;
     
    Dwaighty, Dec 6, 2010 IP