Hi All, We have recently purchased an XML feed of news articles from a news organisation. I'm trying to write a script to copy this feed into a mySQL database. The problem I have is that their XML feed updates in a "queue" style system where as a new article is added, the last article is removed. I'm trying to write an insert statement like this: INSERT INTO articles ( OriginalID , SourceID , Created , Headline , Article , ArticleSummary , PictureURL , PictureURLSmall ) VALUES ( foo , bar ... ); Code (markup): ...that will avoid adding duplicate stories to my database. The field that I can used to identify duplicates is 'OriginalID'. Ideally I would like to do this in a single statement - in fact, i'm sure i've done it before but I can't find the code! My primary key is an auto increment on the database, rather than using their article id, which is why I can't rely on mySQL to handle this - this is a precaution in case I add additional news sources at a later stage. Maybe something along the lines of 'IF NOT EXISTS'? Any suggestions? Cheers! - Jamie
create a unique key on the field, or do a select first to see if it exists inside a transaction that is followed by the insert if needed, if you are using something that support it (not myisam). Another option is to make the primary key a composite key that includes both your unique id and the data sources.
This is my ideal solution, however I am having trouble composing a conditional statement for mySQL. Any clues as to the syntax? Cheers, Jamie
actually I think the key ideas are more the idea solution really. but to do it htis way you would do mysql_query( 'begin work; select count(*) from blah where blah'); now back in your code you check the value of count if ( count < 1) mysql_query('insert blah blah'); mysql_query('commit;'); btw, postgreql is a better db for doing things like this.
Hi jbw, Thanks for your help through the day. In the end I went, as you suggested, for the unique key option. Just not sure if I like having to deliberately create "exceptions" - but it seems to be the most efficient method. Just to clarify, I am actually working with mySQL and VBScript - weird combo I know!
You can go with an additional lookup, as jbw suggests, or you can just look up some data from the last feed item you processed (e.g. look up description and URL with the highest autoincremented record ID) and then skip all new feed entries until you get the item you found in your database. Then you can process all feed entries following this one. J.D.
Hi all, Bit of a follow up to this query, which is why I have included it in the same thread. I have now discovered that our supplier makes modifications to existing articles in their XML feed. This means that I need to update rows that have new content. My current insert statement is: INSERT INTO articles ( OriginalID , SourceID , Created , Headline , Article , ArticleSummary , PictureURL , PictureURLSmall ) VALUES( " & ItemID & " , 1 , '" & Created & "' , '" & EscapeCharacters( Headline ) & "' , '" & EscapeCharacters( Article ) & "' , '" & EscapeCharacters( ArticleSummary ) & "' , '" & EscapeCharacters( PictureURL ) & "' , '" & EscapeCharacters( PictureURLSmall ) & "' )" Code (markup): The table also includes an autonumbered ArticleID. I have set OriginalID to be unique, so when an existing article is encountered currently the INSERT does not take place. I now want to UPDATE existing records with the content in the XML feed (which may or may not have changed). I have been investigating the ON DUPLICATE KEY UPDATE feature, but the documentation is slightly vague. Does anyone have any suggestions as to how I can achieve this? Thanks in advance, Jamie
If the feed you are getting contains anything unique such as an ID or title (which shouldn't change) then use this concept to save time Setup the table with a unique key (ID or TITLE) // this will only insert unique result - using the insert failure as a filter RESULT = QUERY: insert this row into the datbase including ID/TITLE // if you want to update the table do this - using the failure above as an // indicator that the record exists if the RESULT fails UPDATE this row based on ID Once you get this running properly, just suppress any error messages and away you go. Hope that makes sense - I wanted to put a non-code example.