MySQL Database Dump Takes Over 6 Full Days!

Discussion in 'PHP' started by expron, Mar 11, 2008.

  1. #1
    Hello,

    I am experimenting with the TLD zone files. These files are 5.9 GB each when extracted.

    It is taking nearly 7 full days for this MySQL import to finish. I need to get that down to 6-8 hours. Here are all the details.

    Here is sample data from the 5.9 GB file, line by line:

    WESTERN-PHOTO NS NS2.OCO.NET.
    PILOT-TECH NS NS1.FATCOW
    PILOT-TECH NS NS2.FATCOW
    NETCASTER NS NS75.WORLDNIC
    NETCASTER NS NS76.WORLDNIC
    CAR-NECTION NS NS.CIHOST
    CAR-NECTION NS NS2.CIHOST
    INTEGRATEDCONCEPTS NS NS1.SECURE.NET.
    INTEGRATEDCONCEPTS NS NS2.SECURE.NET.
    PASQUALE NS NS-00.SHOUT.NET.
    PASQUALE NS NS-01.SHOUT.NET.
    SLCTNET NS ENTERPRISE.SELECT.NET.
    SLCTNET NS EXCELCIOR.SELECT.NET.
    SATSOFT NS NS.HIS
    SATSOFT NS NS2.HIS
    SATSOFT NS NS3.HIS
    SATSOFT NS NS4.HIS
    SPINNERET NS NS1.MAXEN.NET.
    SPINNERET NS NS.MAXEN.NET.
    MRDAN NS NS1.SECURE.NET.
    MRDAN NS NS2.SECURE.NET.
    MGREEN NS NS.NEO.NET.
    MGREEN NS NS2.NEO.NET.
    ALLIANT NS DBRU.BR.NS.ELS-GMS.ATT.NET.
    ALLIANT NS DMTU.MT.NS.ELS-GMS.ATT.NET.
    CHECKTRANS NS NS1.DARGAL
    CHECKTRANS NS NS1.DARGALSOLUTIONS
    FORMANCO NS NS1.LAMEDELEGATION.NET.
    FORMANCO NS NS2.LAMEDELEGATION.NET.
    ELECTRONIC-PUB NS NS1.PENNWELL
    ELECTRONIC-PUB NS NS2.PENNWELL
    ELECTRONIC-PUB NS NS3.PENNWELL
    MICROLITHO-WORLD NS WCSGATE.WILCOM
    MICROLITHO-WORLD NS NS2.WILCOM
    SUNHING NS NS51.WORLDNIC
    SUNHING NS NS52.WORLDNIC
    DYNAFLOW-INC NS NS3.BROADWING.NET.
    DYNAFLOW-INC NS NS4.BROADWING.NET.
    IMGN-IT NS NS5.DNSSERVER8
    IMGN-IT NS NS4.DNSSERVER8


    Here is the PHP code I made to import into the DB:

    <?php
    // These lines are user configurable, change them to your liking
    mysql_connect("localhost", "root", "") or die(mysql_error());
    mysql_select_db("zones") or die(mysql_error());
    // Change the TLD to whatever we're going to read from this file
    $tld = ".COM";
    // Filename to read domains from.
    $handle = fopen("zones.txt", "r");

    // Leave everything below here alone :)

    if ($handle) {
        while (!feof($handle)) {
            $buffer = fgets($handle);
            if (preg_match("/^(.*) NS (.*)\.$/", $buffer, $matches)) {
                // Nameserver is fully qualified (trailing dot), use it as-is
                $domain = $matches[1] . $tld;
                $ns = $matches[2];
            } elseif (preg_match("/^(.*) NS (.*)$/", $buffer, $matches)) {
                // Nameserver is relative to the zone, so append the TLD
                $domain = $matches[1] . $tld;
                $ns = $matches[2] . $tld;
            } else {
                // Skip blank or malformed lines instead of re-inserting stale values
                continue;
            }
            mysql_query("INSERT INTO domain VALUES('$domain', '$ns')") or print(mysql_error());
        }
        fclose($handle);
    }
    ?>

    But the above code is taking over 6 days to import into the MySQL DB. The 5.9 GB text file and the MySQL server (XAMPP) are on the same box, and both sit on the same IDE hard drive.

    The box has a 3.0 GHz CPU, 2 GB of RAM and a 500 GB hard drive.

    I don't understand why this is so slow.

    I was told to use the explode function and try to insert 5,000 lines from the text file at a time rather than one line at a time. I have to get this process down from 6+ days to 6-8 hours maximum.

    Can anyone suggest some code to try & speed this up as much as possible?
     
    expron, Mar 11, 2008 IP
  2. mulari

    mulari Peon

    Messages:
    45
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #2
    There are three aspects that might be the problem: the time it takes to actually read the big file, preg_match, and the SQL. I wrote you some simple code that replaces two of the three.

    I didn't test it, but I think it should work and improve the speed; I don't know by how much.

    It should replace the code between if ($handle) { and fclose($handle);

    $domains = array();
    while (!feof($handle))
    {
        $buffer = rtrim(fgets($handle));   // strip the trailing newline
        if ($buffer == '')
            continue;
        $tmp = explode(' ', $buffer);
        $domain = $tmp[0] . $tld;          // $tld already starts with '.'
        $ns = $tmp[2];
        if ($ns[strlen($ns)-1] == '.')
            $ns = substr($ns, 0, -1);      // fully qualified, drop the dot
        else
            $ns .= $tld;                   // relative name, append the TLD

        $domains[] = "('$domain', '$ns')";
        if (count($domains) == 500)
        {
            mysql_query('INSERT INTO domain VALUES ' . implode(',', $domains));
            $domains = array();
        }
    }
    if (count($domains) > 0)
        mysql_query('INSERT INTO domain VALUES ' . implode(',', $domains));

    I replaced preg_match with code that might be faster, and I combined the insert queries to 500 rows at once; you can even try bigger numbers.
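For reference, the batched statement the loop builds looks like this (the rows shown are just the first few from the sample data above):

```sql
-- One INSERT carrying many rows instead of one query per row
INSERT INTO domain VALUES
('WESTERN-PHOTO.COM', 'NS2.OCO.NET'),
('PILOT-TECH.COM', 'NS1.FATCOW.COM'),
('PILOT-TECH.COM', 'NS2.FATCOW.COM');
```

Sending 500 rows per round trip amortizes the per-query parsing and network overhead, which is where most of the one-row-at-a-time time goes.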

    Hope it will improve the speed; I'll be very interested to know the result.

    Good luck
     
    mulari, Mar 11, 2008 IP
    expron likes this.
  3. expron

    expron Peon

    Messages:
    171
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Wow,

    Thank you, Mulari! I really appreciate your help. Rep added
     
    expron, Mar 11, 2008 IP
  4. expron

    #4
    By the way, once I get all this info into the DB, can you recommend a quick way of filtering, searching and sorting the info? There are over 100,000,000 records total.
     
    expron, Mar 11, 2008 IP
  5. CodyRo

    CodyRo Peon

    Messages:
    365
    Likes Received:
    15
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Depends a lot on the way the database is set up (schema). Are most of the records indexed?
     
    CodyRo, Mar 11, 2008 IP
  6. expron

    #6
    The records are indexed, and there are only 3 fields per record: domain name, nameserver1, nameserver2.

    I didn't worry about a schema because of the simplistic structure of the database.
     
    expron, Mar 11, 2008 IP
  7. CodyRo

    #7
    How do you want to filter/sort the records?
     
    CodyRo, Mar 11, 2008 IP
  8. expron

    #8
    Mulari,

    The script helped a lot, but once the database grows past 30 GB it crawls. It takes about a minute to insert 20K records. I heard this can be done faster if I avoid PHP and insert directly into MySQL.

    Do you know what limits MySQL has? I heard it was 30 million records. I am trying to manage 300 million records.

    Thanks
     
    expron, Mar 14, 2008 IP
  9. InFloW

    InFloW Peon

    Messages:
    1,488
    Likes Received:
    39
    Best Answers:
    0
    Trophy Points:
    0
    #9
    What table type are you using? If you're using the default MyISAM, then you could very well be having issues with locking of the table.

    Essentially, each time you insert a record, MySQL locks down the whole table so no other writes can be performed until that one finishes. When you've got a large table, locking can become a serious issue. I'd look at InnoDB as a possible solution, since it uses row-level locking, or at other table types from the MySQL manual that are better for large datasets.
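If you want to try that, the engine switch itself is one statement; be warned that on a 30+ GB table MySQL rebuilds the whole table, which can take hours:

```sql
-- Rebuild the table with InnoDB so inserts take row-level
-- locks instead of locking the whole table
ALTER TABLE domain ENGINE=InnoDB;
```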
     
    InFloW, Mar 14, 2008 IP
  10. mulari

    #10
    I'm happy to hear the script helped :)

    I can help you make a PHP script that will create a text file you can later import into MySQL, but I'm afraid the text file will be HUGE - I'm talking GBs huge. Also, you must have access to the MySQL command line (shell access).

    But I don't think it will help you achieve better speeds, since your problem now is above PHP.

    You mentioned you have an index on all 3 fields, which means that for every insert there are also updates to the index file, searches within it, etc. That's pretty heavy work - and don't forget your DB files are also huge, so disk I/O is starting to matter here too.

    I never dealt with such a huge table, so I don't know about limits or how to improve handling it. Perhaps you should look into InnoDB as InFloW suggested.
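If you do go the text-file route, MySQL's built-in LOAD DATA INFILE is the usual way to bulk-load it. A sketch - the file path and tab delimiter here are assumptions; adjust them to whatever the PHP script actually writes out:

```sql
-- MyISAM only: defer index maintenance until the load finishes
ALTER TABLE domain DISABLE KEYS;

-- Bulk-load the file straight into the table, skipping PHP entirely
LOAD DATA INFILE '/path/to/zones.tsv'
INTO TABLE domain
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(domain, nameserver);

-- Rebuild the indexes in one pass at the end
ALTER TABLE domain ENABLE KEYS;
```

Disabling the keys during the load sidesteps the per-insert index updates mentioned above, and rebuilding them once at the end is much cheaper than maintaining them row by row.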
     
    mulari, Mar 14, 2008 IP
  11. expron

    #11
    The database is in MyISAM.

    Here's my .sql file that I created the DB with:

    SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

    --
    -- Database: `zones`
    --

    -- --------------------------------------------------------

    --
    -- Table structure for table `domain`
    --

    CREATE TABLE IF NOT EXISTS `domain` (
      `domain` varchar(100) NOT NULL,
      `nameserver` varchar(100) NOT NULL,
      KEY `domain` (`domain`,`nameserver`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

    --
    -- Dumping data for table `domain`
    --

    If someone can recommend a good way of searching the DB, it would help. I just want to search for certain name servers. I tried searching the 30 GB database and my computer has been frozen since last night.

    Thanks everyone!
     
    expron, Mar 15, 2008 IP
  12. mulari

    #12
    Your nameserver field is not indexed on its own. It is indexed only as the second column of the composite domain key, so that index can only help when your query also filters on domain.

    SELECT * FROM domain WHERE nameserver = 'XXX' will result in a full table scan.

    Use the following SQL command to add an index to nameserver (it might take a while to finish):
    ALTER TABLE `domain` ADD INDEX ( `nameserver` )
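Once that index exists, a nameserver lookup becomes an index read instead of a 30 GB table scan. For example (the nameserver value is illustrative, taken from the sample data):

```sql
-- Uses the new nameserver index rather than scanning every row
SELECT domain FROM domain WHERE nameserver = 'NS1.SECURE.NET';
```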
     
    mulari, Mar 15, 2008 IP
  13. juust

    juust Peon

    Messages:
    214
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #13
    I read that 63 is the max length for a valid domain name
    (excluding the extension).
    Would trimming the field to varchar(70) help any?
     
    juust, Mar 15, 2008 IP
  14. expron

    expron Peon

    Messages:
    171
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #14
    I tried the 67-character max and it didn't have much effect. Everything still runs slow, at the same speed as before.
     
    expron, Mar 17, 2008 IP
  15. juust

    #15
    Just a thought:
    if all you want to do is look up nameserver by domain,
    would it work faster if you made 26 tables (domains_a, domains_b, etcetera)
    to keep the tables down to 5-10 million records apiece,
    and for search use

    $myquery = "SELECT nameserver FROM domains_" . substr($searchdomain, 0, 1)
             . " WHERE domain = '" . $searchdomain . "'";
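Setting up the 26 tables described here could reuse the schema from the dump posted earlier, one table per starting letter (note the zone data also contains names starting with digits and hyphens, so those would need their own tables or a catch-all):

```sql
-- One table per starting letter; repeat for domains_b .. domains_z
CREATE TABLE IF NOT EXISTS `domains_a` (
  `domain` varchar(100) NOT NULL,
  `nameserver` varchar(100) NOT NULL,
  KEY `domain` (`domain`,`nameserver`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
```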
     
    juust, Mar 18, 2008 IP
    expron likes this.