How do you use PHP with large XML files?

Discussion in 'PHP' started by ForumJoiner, Aug 24, 2008.

  1. #1
    All examples I've seen so far use PHP's XML parser. The basic idea is to load your XML file into memory, build one huge string from it, then hand it to the XML parser.

    $data = implode("", file($filename)); means that you'll have to read the entire file into memory. You have to change the php.ini settings in order to get the extra memory.

    For files larger than 16 MiB, processing takes a long time. You also have to change the php.ini file to increase the allowed execution time.

    Is there any better solution? For instance, an XML parser that will not load the entire file into memory?
     
    ForumJoiner, Aug 24, 2008 IP
  2. rcadble

    rcadble Peon

    Messages:
    109
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I can't think of a better way to do it, but you don't have to edit the .ini file. You can add
    set_time_limit(0);
    to your script and it will never time out.
     
    rcadble, Aug 24, 2008 IP
  3. ForumJoiner

    ForumJoiner Active Member

    Messages:
    762
    Likes Received:
    32
    Best Answers:
    0
    Trophy Points:
    83
    #3
    You are right. It will not time out, but it may use too many resources on my shared hosting account.

    Another way to solve the problem is to write another XML parser (I already wrote one) that does not read the entire file into memory. I was curious to know if there is another way, or maybe a better XML parser.

    I just wrote mine, but I always think "What if someone else did better?" :)
     
    ForumJoiner, Aug 24, 2008 IP
  4. cornetofreak

    cornetofreak Peon

    Messages:
    170
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Have you tried using cURL and preg_match()?
     
    cornetofreak, Aug 24, 2008 IP
  5. ForumJoiner

    ForumJoiner Active Member

    #5
    I would like some details about this method, please.
     
    ForumJoiner, Aug 25, 2008 IP
  6. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #6
    cURL will load the whole file into memory too. The only way to avoid that is to use fopen(), fgets(), and fclose(). The only problem I see with this is: what if a single tag spans multiple lines? E.g.:
    
    <tag>some data here
    
    Some more data</tag>
    
    ... the only way of parsing these would be to load the whole file into memory. Otherwise the read might stop in the middle of the tag and it wouldn't parse correctly.

    Also:
    
    $data = implode("", file($filename));
    
    ... this method was used a long time ago (when file_get_contents() didn't exist yet). Nowadays I suggest not doing this, especially for 16 MB files. file() reads the whole file into memory and then splits it by new lines. How many lines could a 16 MB file possibly have? A lot... and then implode() joins this giant array, which is a lot of stress for the server too. Just use file_get_contents(), which does the same as both functions combined, only a lot faster.
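    The fopen()/fgets()/fclose() approach described above can be sketched like this. It's a minimal example, assuming every element fits on a single line; the <item> tag name and the extract_items() helper are made up for illustration:

```php
<?php
// Stream a large XML file line by line so only one line is ever in
// memory at a time. Assumes each <item>...</item> sits on one line.
function extract_items($filename)
{
    $items = array();
    $fp = fopen($filename, 'r');
    if ($fp === false) {
        return $items;
    }
    while (($line = fgets($fp)) !== false) {
        // Grab the text content of any <item> element on this line.
        if (preg_match('~<item>(.*?)</item>~', $line, $match)) {
            $items[] = $match[1];
        }
    }
    fclose($fp);
    return $items;
}
```

Memory use stays constant no matter how big the file is, since fgets() only ever holds the current line.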
     
    nico_swd, Aug 25, 2008 IP
  7. ForumJoiner

    ForumJoiner Active Member

    #7
    In my case, I'm lucky. I know, for a fact, that the XML file I'll parse has every closing tag on the same line as its opening tag.

    A general solution could be
    - step1 : replace all the <cr> with something unique (like : lkj2o793345l3)
    - step2 : read 8k at a time, or whatever will make sure you'll cover an entire <tag>...</tag>
    - step3 : replace back lkj2o793345l3 with <cr>, where necessary.

    I believe it's fgets() that stops when encountering a newline, while fread() just reads a fixed number of bytes; the method above tries to make sure a read boundary never falls inside a <tag>...</tag>.

    What do you think?
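    A variant of the chunked idea above that skips the sentinel replacement: since fread() returns a fixed number of bytes regardless of newlines (it's fgets() that stops at a newline), you can keep any incomplete trailing data in a carry-over buffer and prepend it to the next chunk. A sketch, where the <tag> element name, the helper name, and the 8 KiB chunk size are all illustrative:

```php
<?php
// Chunked reading with a carry-over buffer: complete <tag>...</tag>
// elements are extracted from the buffer, and whatever is left after
// the last complete closing tag is kept for the next iteration.
// Assumes no single element is longer than a few chunks.
function extract_tags_chunked($filename, $chunk_size = 8192)
{
    $results = array();
    $buffer  = '';
    $fp = fopen($filename, 'r');
    while (!feof($fp)) {
        $buffer .= fread($fp, $chunk_size);
        // Pull out every complete element currently in the buffer
        // (the 's' flag lets elements span line breaks).
        if (preg_match_all('~<tag>(.*?)</tag>~s', $buffer, $matches)) {
            foreach ($matches[1] as $content) {
                $results[] = $content;
            }
        }
        // Discard everything up to the last complete closing tag so
        // matched elements are never re-matched on the next pass.
        $pos = strrpos($buffer, '</tag>');
        if ($pos !== false) {
            $buffer = substr($buffer, $pos + strlen('</tag>'));
        }
    }
    fclose($fp);
    return $results;
}
```

This handles tags that span multiple lines and chunk boundaries, at the cost of holding at most a buffer's worth of data rather than the whole file.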
     
    ForumJoiner, Aug 25, 2008 IP
  8. nico_swd

    nico_swd Prominent Member

    #8
    You can use fgets(), it'll read one line from the given resource. Then you could use a regular expression to grab whatever you need.

    Something like:
    
    if (preg_match('~<(\w+)[^>]*>(.*?)</\1>~s', $line, $match))
    {
        echo $match[2];
    }
    
     
    nico_swd, Aug 25, 2008 IP
  9. netproint

    netproint Peon

    Messages:
    334
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Sorry to barge into your thread, ForumJoiner, but I have been trying to understand parsing XML with PHP. Can you give me some pointers on which tools you use? I would appreciate your help a LOT.
    The topics I'm trying to learn are parsing XML with PHP, XSLT, PEAR, and SOAP (this is for Amazon Web Services; I'm trying to work out which tools are best for the job).
    Please reply here or PM me.
    Any help would be greatly appreciated.
    Regards,
    Andy.
     
    netproint, Aug 25, 2008 IP
  10. ForumJoiner

    ForumJoiner Active Member

    #10
    Actually, it's parsing XML using PHP. More details here:
    http://www.php.net/manual/en/ref.xml.php
    The page above will give you the functions and some examples.
    This thread is about optimizing. The PHP XML examples described above load the entire file into memory; I wanted a faster, less memory-hungry option. I'm still looking for it :)
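    For the record, the extension linked above can already do this: ext/xml is a SAX-style (expat) parser, and xml_parse() accepts the document in pieces, so you can feed it from fread() without ever holding the whole file in memory. A minimal sketch, where the <item> element name and the parse_items_streaming() wrapper are made up for illustration:

```php
<?php
// Stream-parse an XML file with PHP's expat-based SAX parser (ext/xml).
// xml_parse() is called once per chunk, so only ~8 KiB is in memory
// at any time. Collects the text content of every <item> element.
function parse_items_streaming($filename)
{
    $items  = array();
    $inItem = false;
    $buf    = '';

    $parser = xml_parser_create();
    // Keep tag names as-is (expat uppercases them by default).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inItem, &$buf) {
            if ($name === 'item') { $inItem = true; $buf = ''; }
        },
        function ($p, $name) use (&$inItem, &$buf, &$items) {
            if ($name === 'item') { $items[] = $buf; $inItem = false; }
        }
    );
    // Character data may arrive in several pieces; append them all.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$inItem, &$buf) {
        if ($inItem) { $buf .= $data; }
    });

    $fp = fopen($filename, 'r');
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        // The third argument tells expat when the final chunk arrives.
        xml_parse($parser, $chunk, feof($fp));
    }
    fclose($fp);
    xml_parser_free($parser);
    return $items;
}
```

Unlike the regex approaches, this handles attributes, entities, and elements spanning chunk boundaries, because expat keeps its own parse state between xml_parse() calls.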
     
    ForumJoiner, Aug 25, 2008 IP