So I have a script that imports CSV's for contact lists into a table ( http://forums.digitalpoint.com/showthread.php?t=2012367 ) and to make it as efficient as possible it detects duplicates purely in SQL rather than relying on PHP and bloating the code. The script can be set to either overwrite or ignore any duplicates, and I need to be able to count how many duplicates it found. Depending on which option is set for how to handle duplicates, the query looks like the follow. Note: These are bulk inserts where multiple rows are inserted in a single query. This example shows 5 CSV rows in the batch query, but this number is much higher on the live script INSERT IGNORE INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk'); Code (markup): INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name); Code (markup): My problem is that I need somehow to be able to detect accurately how many duplicates there were mixed in with rows that weren't. Some people have suggested mysql_errno and mysql_affected_rows but these dont seem to work for the way I have implemented things Any ideas?
You can create a temporary table with no unique keys in it, than insert csv (all records) into it. Now you can count duplicates by joining temporary and first table, than you can insert records gathering data from temporary table instead of csv. Regards
Add a field `counter` to the table and use the on duplicate key update clause as follows: INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name), `counter` = `counter` + 1;