Need help XML 40GB Import

Discussion in 'PHP' started by travisdh, Jul 3, 2008.

  1. #1
    Hello.

    I was after anyones advice, i have heaps of XML files i would like to import into a MySQL database, however no matter what i try i always get errors due to the sheer size of the files....

    The largest file is about 40GB, and they go down frmo there, the server i want to use only has 2gb of ram so errors always occur, but does anyone have any experience or suggestions on importing these size files, i was going to use PHP and a parse and sql import flow through a php script, has anyone done this???

    Please help, it would be most appreciated....
     
    travisdh, Jul 3, 2008 IP
  2. catapop

    catapop Peon

    Messages:
    79
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #2
    catapop, Jul 3, 2008 IP
  3. BuyMyScripts

    BuyMyScripts Peon

    Messages:
    32
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    If you can make some assumptions about the XML files, you may be able to code a low memory version, using PHP 4 XML (xml_parse, xml_set_element_handler etc), and doing freads.

    I did this for some large XML files my scripts needed to install.

    Basic strategy is:

    In your start and end elements, keep track of the tag your up to: I used ^ between elements, so if (for example), you were processing html, you might end up with HTML^PAGE^P^H1

    In the end elements, and character data, either store data in a variable for later use, or do your sql query.

    After setting up your parser, use a loop like this:

        while ($data = fread($fp, 4096))
        {
            if (!xml_parse($xml_parser, $data, feof($fp)))
            {
                die(sprintf("XML error: %s at line %d",
                            xml_error_string(xml_get_error_code($xml_parser)),
                            xml_get_current_line_number($xml_parser)));
            }
        }
    Code (markup):
    It's not as easy, as quick, or as flexible as simplexml, but it gets the job done...
     
    BuyMyScripts, Jul 4, 2008 IP