Alright, I am going for speed here. When the program I am writing runs, it can insert/update a couple of hundred records at a time (it is batch processed). What is the best way to go about doing this? Each record that is going to be inserted could already have a duplicate in the table. My current approach is:

- query the DB for the record and check whether it already exists
- if it doesn't exist, insert it
- if it does exist, update it with the new values
- free the result set before the next record so we don't overload memory

So this is 2 queries for each row (remember, we're in the hundreds range, so for every 100 rows I'm actually running 200 queries). I was thinking of putting a UNIQUE KEY constraint on the column so that I could insert first, then update only if there is a duplicate-key error. Still two queries in the worst case. Is this one of those situations where the server will just have to take a hit for a few moments?

Another idea I came up with is to preload my script with the current values from the database, and then compare each incoming record against that array of values rather than against the actual database. However, that array could grow to hundreds of thousands of entries.

So, if anyone has any suggestions on this, it would be greatly appreciated!
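(For clarity, here is a minimal sketch of the UNIQUE KEY idea described above; the table and column names are made up for illustration and are not from the original post.)

    -- Hypothetical table: the UNIQUE constraint on external_id is what lets
    -- the database detect duplicates instead of a separate SELECT per row.
    CREATE TABLE items (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
        external_id VARCHAR(64)  NOT NULL,
        value       VARCHAR(255) NOT NULL,
        PRIMARY KEY (id),
        UNIQUE KEY uq_external_id (external_id)
    );

    -- Attempt the insert; if it fails with a duplicate-key error
    -- (MySQL error 1062), fall back to the update in application code.
    INSERT INTO items (external_id, value) VALUES ('abc123', 'new value');

    -- on duplicate-key error:
    UPDATE items SET value = 'new value' WHERE external_id = 'abc123';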
Maybe even ORACLE.
Right up front: I'm far from being a DB expert. Having said that, I'd say it's just one of those situations where you have to suck it up and go with the two queries. Does your initial data (the 'source' data that will be batched) have any unique information? If you can find a unique key in the source data that can be replicated in the database, that will speed up the initial lookup in the first place. Having said all that, at http://techrepublic.com.com/5208-6230-0.html?forumID=88&threadID=188827&messageID=1939661 there is mention of combining an UPDATE with a RIGHT JOIN, though from my limited knowledge an INNER JOIN may be the better choice. Food for thought, but the way I see it, it's not really the size of the database that matters, it's the size of the batch you are processing. A couple of hundred queries should not be a real issue, even if you're running them every few seconds...
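For what it's worth, that JOIN idea would look roughly like this in MySQL, loading the whole batch into a staging table first. This is just a sketch; the staging/items table names and columns are placeholders, not from the thread:

    -- Load the whole batch into a temporary staging table in one
    -- multi-row INSERT, then touch the main table with two statements.
    CREATE TEMPORARY TABLE staging (
        external_id VARCHAR(64)  NOT NULL,
        value       VARCHAR(255) NOT NULL
    );

    INSERT INTO staging (external_id, value)
    VALUES ('abc123', 'v1'), ('def456', 'v2'), ('ghi789', 'v3');

    -- Update the rows that already exist...
    UPDATE items
    INNER JOIN staging ON staging.external_id = items.external_id
    SET items.value = staging.value;

    -- ...then insert the ones that don't.
    INSERT INTO items (external_id, value)
    SELECT s.external_id, s.value
    FROM staging s
    LEFT JOIN items i ON i.external_id = s.external_id
    WHERE i.id IS NULL;

That keeps the statement count constant no matter how many rows are in the batch.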
Do you really worry about "a few hundred" queries? Even my mediocre notebook with only 512 MB of RAM does about 10k inserts/minute when I restore the database backup from my webserver. You could use INSERT IGNORE (insert only if the row doesn't already exist), http://dev.mysql.com/doc/refman/5.0/en/insert.html , or INSERT ... ON DUPLICATE KEY UPDATE.
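A rough sketch of both variants, assuming a UNIQUE key on something like external_id (the table and column names are illustrative only):

    -- INSERT IGNORE: rows that hit the UNIQUE key are silently skipped
    -- (fine if existing rows should be left untouched).
    INSERT IGNORE INTO items (external_id, value)
    VALUES ('abc123', 'v1'), ('def456', 'v2');

    -- INSERT ... ON DUPLICATE KEY UPDATE: one round trip inserts the new
    -- rows and updates the duplicates in the same statement.
    INSERT INTO items (external_id, value)
    VALUES ('abc123', 'v1'), ('def456', 'v2')
    ON DUPLICATE KEY UPDATE value = VALUES(value);

Using multi-row VALUES lists like this also cuts the query count well below one per record.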