Bit of background: I have a product database, and now an external product database with extra information/etc... I need to match THEIR product database to MY product database. The external database has roughly 1,200,000 while my internal one contains 500,000. The way I do it now, is do it in batches of 20,000 from 0 to the highest product # which is 1 billion. I'm thinking this may not be the best way because there are many batches which will not contain 20,000 records. But in this example, $low and $high would be 20,000 apart. SELECT external.productid, external.mfgpartno, mfr_xref.internal_mfr_id FROM external_product AS external INNER JOIN mfr_xref ON mfr_xref.external_mfr_id = external.manufacturerid LEFT JOIN prod_xref ON prod_xref.external_product_id = external.productid WHERE external.productid BETWEEN $low AND $high AND prod_xref.external_product_id IS NULL Code (markup): Any ideas for optimizing maybe not only the query, but the process too?
What columns do you have indexes on? Also, this "AND prod_xref.external_product_id IS NULL" doesn't make any sense to me based on the way it way used above in the left join. From the above query, you should have an index on external.productid You should also probably have foreign key restrains (which also use an index) on mfr_xref.external_mfr_id to external.manufacturerid, and prod_xref.external_product_id to external.productid.
I am normally used to using FK for "if the row is deleted in table X, delete all associated data in table Y". What is the point of a FK with no actions on update/delete? Also, the "AND prod_xref.external_product_id IS NULL" is selecting all the rows NOT matched. There is no matching row in the xref table, so it isn't matched... and a null value. There is an index on both the external and internal product ids in the xref table. Will have to double check the external table indexes as I didn't create it...