How do you use PHP with large XML files?

Discussion in 'PHP' started by ForumJoiner, Aug 24, 2008.

  1. #1
    All examples I've seen so far use PHP's XML parser. The basic idea is to load your XML file into memory, build one huge string from it, then hand it to the XML parser.

    $data = implode("", file($filename)); means that you'll have to read the entire file into memory. You have to change the php.ini settings in order to get the extra memory.

    For files larger than 16 MiB, processing takes a long time. You also have to change the php.ini file to increase the allowed execution time.

    Is there any better solution? For instance, an XML parser that will not load the entire file into memory?
     
    ForumJoiner, Aug 24, 2008 IP
  2. rcadble

    rcadble Peon

    Messages:
    109
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I can't think of a better way to do it, but you don't have to edit the .ini file. You can add
    set_time_limit(0);
    to your script and it will never time out.
     
    rcadble, Aug 24, 2008 IP
  3. ForumJoiner

    ForumJoiner Active Member

    Messages:
    762
    Likes Received:
    32
    Best Answers:
    0
    Trophy Points:
    83
    #3
    You are right. It will not time out, but it may use too many resources on my shared hosting account.

    Another way to solve the problem is to write another XML parser (I already wrote one) that does not read the entire file into memory. I was curious to know if there is another way, or maybe a better XML parser.

    I just wrote mine, but I always think "What if someone else did better?" :)
     
    ForumJoiner, Aug 24, 2008 IP
  4. cornetofreak

    cornetofreak Peon

    Messages:
    170
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Have you tried using cURL and preg_match()?
     
    cornetofreak, Aug 24, 2008 IP
  5. ForumJoiner

    ForumJoiner Active Member

    #5
    I would like some details about this method, please.
     
    ForumJoiner, Aug 25, 2008 IP
  6. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #6
    cURL will load the whole file into memory too. The only way to avoid that is to use fopen(), fgets(), and fclose(). The only problem I see with this is: what if a single tag spans multiple lines? E.g.:
    
    <tag>some data here
    
    Some more data</tag>
    
    ... the only way of parsing these would be to load the whole file into memory. Otherwise the read might stop in the middle of the tag and it wouldn't parse correctly.

    Also:
    
    $data = implode("", file($filename));
    
    ... this method was used a long time ago (when file_get_contents() didn't exist yet). Nowadays I suggest not doing this, especially for 16 MB files. file() reads the whole file into memory and then splits it by new lines. How many lines could a 16 MB file possibly have? A lot... and then implode() joins this giant array, which is a lot of stress for the server too. Just use file_get_contents(), which does the same as both functions combined, only a lot faster.
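    The fopen()/fgets()/fclose() approach described above can be sketched like this. It's a minimal example, assuming every element fits on a single line; the <item> tag name and the extract_items() helper are made up for illustration:

```php
<?php
// Stream a large XML file line by line so only one line is ever in
// memory at a time. Assumes each <item>...</item> sits on one line.
function extract_items($filename)
{
    $items = array();
    $fp = fopen($filename, 'r');
    if ($fp === false) {
        return $items;
    }
    while (($line = fgets($fp)) !== false) {
        // Grab the text content of any <item> element on this line.
        if (preg_match('~<item>(.*?)</item>~', $line, $match)) {
            $items[] = $match[1];
        }
    }
    fclose($fp);
    return $items;
}
```

Memory use stays constant no matter how big the file is, since fgets() only ever holds the current line.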
     
    nico_swd, Aug 25, 2008 IP
  7. ForumJoiner

    ForumJoiner Active Member

    #7
    In my case, I'm lucky. I know, for a fact, that the XML file I'll parse has every closing tag on the same line as its opening tag.

    A general solution could be
    - step1 : replace all the <cr> with something unique (like : lkj2o793345l3)
    - step2 : read 8k at a time, or whatever will make sure you'll cover an entire <tag>...</tag>
    - step3 : replace back lkj2o793345l3 with <cr>, where necessary.

    I believe it's fgets() that stops when encountering a newline, while fread() just reads a fixed number of bytes; the method above tries to make sure a read boundary never falls inside a <tag>...</tag>.

    What do you think?
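    A variant of the chunked idea above that skips the sentinel replacement: since fread() returns a fixed number of bytes regardless of newlines (it's fgets() that stops at a newline), you can keep any incomplete trailing data in a carry-over buffer and prepend it to the next chunk. A sketch, where the <tag> element name, the helper name, and the 8 KiB chunk size are all illustrative:

```php
<?php
// Chunked reading with a carry-over buffer: complete <tag>...</tag>
// elements are extracted from the buffer, and whatever is left after
// the last complete closing tag is kept for the next iteration.
// Assumes no single element is longer than a few chunks.
function extract_tags_chunked($filename, $chunk_size = 8192)
{
    $results = array();
    $buffer  = '';
    $fp = fopen($filename, 'r');
    while (!feof($fp)) {
        $buffer .= fread($fp, $chunk_size);
        // Pull out every complete element currently in the buffer
        // (the 's' flag lets elements span line breaks).
        if (preg_match_all('~<tag>(.*?)</tag>~s', $buffer, $matches)) {
            foreach ($matches[1] as $content) {
                $results[] = $content;
            }
        }
        // Discard everything up to the last complete closing tag so
        // matched elements are never re-matched on the next pass.
        $pos = strrpos($buffer, '</tag>');
        if ($pos !== false) {
            $buffer = substr($buffer, $pos + strlen('</tag>'));
        }
    }
    fclose($fp);
    return $results;
}
```

This handles tags that span multiple lines and chunk boundaries, at the cost of holding at most a buffer's worth of data rather than the whole file.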
     
    ForumJoiner, Aug 25, 2008 IP
  8. nico_swd

    nico_swd Prominent Member

    #8
    You can use fgets(), it'll read one line from the given resource. Then you could use a regular expression to grab whatever you need.

    Something like:
    
    if (preg_match('~<(\w+)[^>]*>(.*?)</\1>~s', $line, $match))
    {
        echo $match[2];
    }
    
     
    nico_swd, Aug 25, 2008 IP
  9. netproint

    netproint Peon

    Messages:
    334
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Sorry to barge into your thread, ForumJoiner, but I have been trying to understand parsing XML with PHP. Can you give me some pointers on which tools you use? I would appreciate your help a LOT.
    The topics I'm trying to learn are parsing XML with PHP, XSLT, PEAR, and SOAP (this is for Amazon Web Services; I'm trying to work out which tools are best for the job).
    Please reply here or PM me.
    Any help would be greatly appreciated.
    Regards,
    Andy.
     
    netproint, Aug 25, 2008 IP
  10. ForumJoiner

    ForumJoiner Active Member

    #10
    Actually, it's parsing XML using PHP. More details here:
    http://www.php.net/manual/en/ref.xml.php
    The page above will give you the functions and some examples.
    This thread is about optimizing. The PHP XML examples described above load the entire file into memory; I wanted a faster, less memory-hungry option. I'm still looking for it :)
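    For the record, the extension linked above can already do this: ext/xml is a SAX-style (expat) parser, and xml_parse() accepts the document in pieces, so you can feed it from fread() without ever holding the whole file in memory. A minimal sketch, where the <item> element name and the parse_items_streaming() wrapper are made up for illustration:

```php
<?php
// Stream-parse an XML file with PHP's expat-based SAX parser (ext/xml).
// xml_parse() is called once per chunk, so only ~8 KiB is in memory
// at any time. Collects the text content of every <item> element.
function parse_items_streaming($filename)
{
    $items  = array();
    $inItem = false;
    $buf    = '';

    $parser = xml_parser_create();
    // Keep tag names as-is (expat uppercases them by default).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inItem, &$buf) {
            if ($name === 'item') { $inItem = true; $buf = ''; }
        },
        function ($p, $name) use (&$inItem, &$buf, &$items) {
            if ($name === 'item') { $items[] = $buf; $inItem = false; }
        }
    );
    // Character data may arrive in several pieces; append them all.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$inItem, &$buf) {
        if ($inItem) { $buf .= $data; }
    });

    $fp = fopen($filename, 'r');
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        // The third argument tells expat when the final chunk arrives.
        xml_parse($parser, $chunk, feof($fp));
    }
    fclose($fp);
    xml_parser_free($parser);
    return $items;
}
```

Unlike the regex approaches, this handles attributes, entities, and elements spanning chunk boundaries, because expat keeps its own parse state between xml_parse() calls.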
     
    ForumJoiner, Aug 25, 2008 IP