Counting duplicates on bulk INSERT IGNORE / INSERT ... ON DUPLICATE KEY UPDATE

meloncreative Peon

Messages:: 37

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#1

So I have a script that imports CSV's for contact lists into a table ( http://forums.digitalpoint.com/showthread.php?t=2012367 ) and to make it as efficient as possible it detects duplicates purely in SQL rather than relying on PHP and bloating the code.

The script can be set to either overwrite or ignore any duplicates, and I need to be able to count how many duplicates it found. Depending on which option is set for how to handle duplicates, the query looks like the follow. Note: These are bulk inserts where multiple rows are inserted in a single query. This example shows 5 CSV rows in the batch query, but this number is much higher on the live script
INSERT IGNORE INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk');
Code (markup):
INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name);
Code (markup):
My problem is that I need somehow to be able to detect accurately how many duplicates there were mixed in with rows that weren't. Some people have suggested mysql_errno and mysql_affected_rows but these dont seem to work for the way I have implemented things

Any ideas?

meloncreative, Dec 2, 2010 IP

koko5 Active Member

Messages:: 394

Likes Received:: 14

Best Answers:: 1

Trophy Points:: 70

#2

You can create a temporary table with no unique keys in it, than insert csv (all records) into it. Now you can count duplicates by joining temporary and first table, than you can insert records gathering data from temporary table instead of csv.

Regards

koko5, Dec 2, 2010 IP

sukosari Active Member

Messages:: 63

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 93

#3

just copied to spreedsheet like ms.excel and sorted

sukosari, Dec 3, 2010 IP

Dwaighty Peon

Messages:: 358

Likes Received:: 10

Best Answers:: 0

Trophy Points:: 0

#4

Add a field `counter` to the table and use the on duplicate key update clause as follows:

INSERT INTO contacts (site_id, c_registered, c_domain, c_subscribed, c_name, c_email) VALUES ('10', NOW(), 'domain.co.uk', '1', 'Chris', 'chris@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Cyn', 'cyn@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Chloe', 'chloe@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Jennie', 'jennie@domain.co.uk'), ('10', NOW(), 'domain.co.uk', '1', 'Steve', 'steve@domain.co.uk') ON DUPLICATE KEY UPDATE c_name = VALUES(c_name), `counter` = `counter` + 1;

Dwaighty, Dec 6, 2010 IP

Log in or Sign up

Counting duplicates on bulk INSERT IGNORE / INSERT ... ON DUPLICATE KEY UPDATE

meloncreative Peon

koko5 Active Member

sukosari Active Member

Dwaighty Peon

Useful Searches