Hi, I'm currently running a site that depends largely on XML files, and it requires a lot of mapping. In other words, I have to match up the XML fields to the correct fields in my database. The problem I am having is that some of the XML files are fairly big. When loading an XML file, is there any way to load only the first item, or to ignore the data and just get the structure of the file? I'm currently using SimpleXML to load the file, but I'm open to other classes if they are better. Cheers
Depending on which matters more to you, memory or CPU time, you can use either SimpleXML or DOM. Have a look at this article: http://blog.liip.ch/archive/2004/05/10/processing_large_xml_documents_with_php.html and this benchmark: http://svn.bitflux.org/repos/public/php5examples/largexml/fulldocu.pdf
SimpleXML is not practical for big files, since it is a tree-based parser and loads the whole file into memory before you can process it. XMLReader takes another approach: it is a stream-based parser that reads the file one node at a time, so you should get the job done easily with XMLReader. See http://www.ibm.com/developerworks/xml/library/x-xmlphp2.html ... many examples provided
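To make the stream-based idea concrete, here is a minimal sketch that reads only as far as the end of the first item and returns the element paths it saw on the way. The function name `firstItemStructure` is made up for illustration, and it assumes the items are the direct children of the root element, which may not match your feed.

```php
<?php
// Hypothetical helper (not part of XMLReader): collects the element paths
// found up to the end of the first child of the root element, then stops.
// Assumption: each "item" is a direct child of the root element.
function firstItemStructure(string $uri): array
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Cannot open $uri");
    }

    $paths = []; // element path => true, kept in the order first seen
    $stack = []; // names of the currently open elements, indexed by depth

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // Record this element under its parents, e.g. "feed/item/title".
            $stack[$reader->depth] = $reader->name;
            $stack = array_slice($stack, 0, $reader->depth + 1);
            $paths[implode('/', $stack)] = true;

            // An empty first item (<item/>) has no end tag to wait for.
            if ($reader->depth === 1 && $reader->isEmptyElement) {
                break;
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === 1) {
            // The first item has just closed: stop, leaving the rest unread.
            break;
        }
    }

    $reader->close();
    return array_keys($paths);
}

// Usage (the file name is a placeholder):
// print_r(firstItemStructure('feed.xml'));
```

Because the loop breaks as soon as the first item closes, the amount of work depends on the size of that item rather than on the size of the file. Attributes are ignored here; they could be collected inside the same loop with `moveToNextAttribute()`.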
Well, there is one XML file I am loading which is 400-odd MB, and I have no control over it. All I want to do is load its structure, but that takes ages. I have some automated scripts that load the file properly and use all the data, but this is meant to be loaded in a browser, so it needs to be fairly quick. Trying to load a 400 MB file is crashing the browser. Any suggestions?
Normally, the structure is defined in the DTD! The service or site that provides the XML file should provide the DTD too... IMHO
For public display of the data, I suggest the following (quite common):
* Parse the file(s) on the server and divide them into separate DB rows (1 DB row per indivisible item)
* When a visitor requests data, it can then be retrieved easily from the database (using DB indexes, optimization, etc...)
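The first step above can be sketched roughly as follows. This is only an illustration under assumptions: `streamItems` is a made-up name, the items are assumed to be sibling elements that all share one tag name, and each item is assumed to be flat (its children hold plain text).

```php
<?php
// Hypothetical helper: yields one associative array per <$itemName> element,
// so only a single item is held in memory at any time.
// Assumptions: items are siblings, never nested inside an element of the
// same name, and their children contain plain text.
function streamItems(string $uri, string $itemName): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Cannot open $uri");
    }

    try {
        // Walk forward to the first matching start tag.
        $found = false;
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $itemName) {
                $found = true;
                break;
            }
        }

        while ($found) {
            // Hand just this one element to SimpleXML for convenient access.
            $item = new SimpleXMLElement($reader->readOuterXml());
            $row = [];
            foreach ($item->children() as $child) {
                $row[$child->getName()] = trim((string) $child);
            }
            yield $row;

            // Skip the rest of this item and move to the next one, if any.
            $found = $reader->next($itemName);
        }
    } finally {
        $reader->close();
    }
}

// Usage sketch; the table and column names are made up:
// $stmt = $pdo->prepare('INSERT INTO items (title, price) VALUES (:title, :price)');
// foreach (streamItems('feed.xml', 'item') as $row) {
//     $stmt->execute([':title' => $row['title'] ?? null, ':price' => $row['price'] ?? null]);
// }
```

Run from a cron job or a command-line script rather than inside a page request, something like this lets the browser-facing pages query the database and never touch the XML file itself.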
That was an option I considered in the past, but with SimpleXML loading the entire file it didn't seem very practical. I've just been playing with XMLReader and it seems a lot faster, because I can "break" out of the read loop at any given point, so I could potentially do this now. There is no DTD though, as far as I am aware.