Hello, I am experimenting with the TLD zone files. These files are 5.9 GB each when extracted, and the MySQL import is taking nearly 7 full days to finish. I need to get this down to 6-8 hours. Here are all the details.

Here is sample data from the 5.9 GB file, line by line:

WESTERN-PHOTO NS NS2.OCO.NET.
PILOT-TECH NS NS1.FATCOW
PILOT-TECH NS NS2.FATCOW
NETCASTER NS NS75.WORLDNIC
NETCASTER NS NS76.WORLDNIC
CAR-NECTION NS NS.CIHOST
CAR-NECTION NS NS2.CIHOST
INTEGRATEDCONCEPTS NS NS1.SECURE.NET.
INTEGRATEDCONCEPTS NS NS2.SECURE.NET.
PASQUALE NS NS-00.SHOUT.NET.
PASQUALE NS NS-01.SHOUT.NET.
SLCTNET NS ENTERPRISE.SELECT.NET.
SLCTNET NS EXCELCIOR.SELECT.NET.
SATSOFT NS NS.HIS
SATSOFT NS NS2.HIS
SATSOFT NS NS3.HIS
SATSOFT NS NS4.HIS
SPINNERET NS NS1.MAXEN.NET.
SPINNERET NS NS.MAXEN.NET.
MRDAN NS NS1.SECURE.NET.
MRDAN NS NS2.SECURE.NET.
MGREEN NS NS.NEO.NET.
MGREEN NS NS2.NEO.NET.
ALLIANT NS DBRU.BR.NS.ELS-GMS.ATT.NET.
ALLIANT NS DMTU.MT.NS.ELS-GMS.ATT.NET.
CHECKTRANS NS NS1.DARGAL
CHECKTRANS NS NS1.DARGALSOLUTIONS
FORMANCO NS NS1.LAMEDELEGATION.NET.
FORMANCO NS NS2.LAMEDELEGATION.NET.
ELECTRONIC-PUB NS NS1.PENNWELL
ELECTRONIC-PUB NS NS2.PENNWELL
ELECTRONIC-PUB NS NS3.PENNWELL
MICROLITHO-WORLD NS WCSGATE.WILCOM
MICROLITHO-WORLD NS NS2.WILCOM
SUNHING NS NS51.WORLDNIC
SUNHING NS NS52.WORLDNIC
DYNAFLOW-INC NS NS3.BROADWING.NET.
DYNAFLOW-INC NS NS4.BROADWING.NET.
IMGN-IT NS NS5.DNSSERVER8
IMGN-IT NS NS4.DNSSERVER8

Here is the PHP code I made to import into the DB:

<?php
// These lines are user-configurable; change them to your liking.
mysql_connect("localhost", "root", "") or die(mysql_error());
mysql_select_db("zones") or die(mysql_error());

// Change the TLD to whatever we're going to read from this file.
$tld = ".COM";

// Filename to read domains from.
$handle = fopen("zones.txt", "r");

// Leave everything below here alone.
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle);
        if (preg_match("/^(.*) NS (.*)\.$/", $buffer, $matches)) {
            // Fully qualified nameserver (trailing dot): keep as-is.
            $domain = $matches[1] . $tld;
            $ns = $matches[2];
        } elseif (preg_match("/^(.*) NS (.*)$/", $buffer, $matches)) {
            // In-zone nameserver: append the TLD.
            $domain = $matches[1] . $tld;
            $ns = $matches[2] . $tld;
        } else {
            // Skip lines that don't parse instead of re-inserting stale values.
            continue;
        }
        mysql_query("INSERT INTO domain VALUES('$domain', '$ns')") or print(mysql_error());
    }
    fclose($handle);
}
?>

But the above code takes over 6 days to import into the MySQL DB. The 5.9 GB text file and the MySQL server (XAMPP) are on the same box, and both sit on the same IDE hard drive. The box has a 3.0 GHz CPU, 2 GB of RAM, and a 500 GB hard drive. I don't understand why this is so slow. I was told to use the explode function and to insert 5,000 lines from the text file at a time rather than one line at a time. I have to get this process down from 5+ days to 6-8 hours maximum. Can anyone suggest some code to speed this up as much as possible?
There are three aspects that might be a problem: the long time to actually read the big file, preg_match, and the SQL. I wrote you some simple code that replaces two of the three. I didn't test it, but I think it should work and improve the speed; I don't know by how much. It should replace the code between if ($handle) { and fclose($handle);:

$domains = array();
while (!feof($handle)) {
    $buffer = trim(fgets($handle));
    if ($buffer == '') continue; // skip blank lines
    $tmp = explode(' ', $buffer);
    $domain = $tmp[0] . $tld; // $tld already starts with a dot (".COM")
    $ns = $tmp[2];
    if ($ns[strlen($ns) - 1] == '.')
        $ns = substr($ns, 0, -1); // fully qualified: drop the trailing dot
    else
        $ns .= $tld; // in-zone nameserver: append the TLD
    $domains[] = "('$domain', '$ns')";
    // Flush a batch of 500 rows in a single INSERT.
    if (count($domains) == 500) {
        mysql_query('INSERT INTO domain VALUES ' . implode(',', $domains));
        $domains = array();
    }
}
// Flush whatever is left over after the loop.
if (count($domains) > 0)
    mysql_query('INSERT INTO domain VALUES ' . implode(',', $domains));

I replaced preg_match with code that should be faster, and I combined the insert queries into 500 rows at a time; you can even try bigger batches. I hope it improves the speed. I'd be very interested to know the result. Good luck!
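One more thing worth trying on top of the batched inserts (an untested suggestion on my part, and it assumes your table is MyISAM, the default): you can tell MyISAM to stop maintaining its non-unique indexes during the bulk load and rebuild them once at the end, which is usually much faster than updating them row by row:

// Suggested wrapper around the whole import loop (untested).
mysql_query('ALTER TABLE domain DISABLE KEYS'); // suspend index updates

// ... the entire while (!feof($handle)) loop from above goes here ...

mysql_query('ALTER TABLE domain ENABLE KEYS'); // rebuild the indexes once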
By the way, once I get all this info into the DB, can you recommend a quick way of filtering, searching, and sorting it? There are over 100,000,000 records in total.
The records are indexed, and there are only 3 fields per record: domain name, nameserver1, nameserver2. I didn't worry about a schema because the structure of the database is so simple.
Mulari, the script helped a lot, but once the database grows past 30 GB it crawls: it takes about a minute to insert 20K records. I heard this can be done faster if I avoid PHP and insert directly into MySQL. Do you know what limits MySQL has? I heard it was 30 million records, and I am trying to manage 300 million. Thanks!
What table type are you using? If you're using the default, MyISAM, then you could very well be having issues with table locking. Essentially, each time you insert a record, MySQL locks the whole table so no other writes can be performed until that one finishes. With a large table, locking can become a serious issue. I'd look at InnoDB as a possible solution, since it uses row-level locking, or at the other storage engines in the MySQL manual that are better suited to large datasets.
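For example (untested, and be warned that converting a table this size rewrites all of it, so it will run for a long time), the switch itself is one statement, and once the table is InnoDB you can also group each batch of inserts into a single transaction so rows are flushed to disk once per batch instead of once per row:

// Convert the existing table in place (slow on 30+ GB of data).
mysql_query('ALTER TABLE domain ENGINE=InnoDB');

// With InnoDB, commit each 500-row batch as one transaction.
mysql_query('SET autocommit=0');
mysql_query('INSERT INTO domain VALUES ' . implode(',', $domains));
mysql_query('COMMIT');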
I'm happy to hear the script helped. I can help you make a PHP script that creates a text file you can later import into MySQL directly, but I'm afraid the text file will be HUGE; I'm talking GBs huge. Also, you must have access to the MySQL command line (shell access). But I don't think it will help you achieve much better speeds, since your problem now is beyond PHP. You mentioned you have an index on all 3 fields, which means that for every insert there are also updates to the index file, searches within it, and so on, so it's pretty heavy work. Don't forget your DB files are also huge, so disk I/O is starting to matter here. I've never dealt with such a huge table, so I don't know about limits or how to improve handling it. Perhaps you should look into InnoDB, as InFloW suggested.
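For the record, the text-file idea would look roughly like this (untested; the output path is just an example, and LOAD DATA INFILE requires the FILE privilege and a path the MySQL server can read):

<?php
// Step 1: convert zones.txt into a tab-delimited file MySQL can bulk-load.
$tld = ".COM";
$in  = fopen("zones.txt", "r");
$out = fopen("/tmp/domains.tsv", "w"); // example path, adjust as needed

while (!feof($in)) {
    $buffer = trim(fgets($in));
    if ($buffer == '') continue;
    $tmp = explode(' ', $buffer);
    $ns  = $tmp[2];
    if (substr($ns, -1) == '.') $ns = substr($ns, 0, -1);
    else $ns .= $tld;
    fwrite($out, $tmp[0] . $tld . "\t" . $ns . "\n");
}
fclose($in);
fclose($out);
?>

Then, from the MySQL command line, bulk-load it in one statement, which skips PHP and the per-query overhead entirely:

LOAD DATA INFILE '/tmp/domains.tsv' INTO TABLE domain
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';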
The database is MyISAM. Here's the .sql file I created the DB with:

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

--
-- Database: `zones`
--

-- --------------------------------------------------------

--
-- Table structure for table `domain`
--

CREATE TABLE IF NOT EXISTS `domain` (
  `domain` varchar(100) NOT NULL,
  `nameserver` varchar(100) NOT NULL,
  KEY `domain` (`domain`,`nameserver`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--
-- Dumping data for table `domain`
--

If someone can recommend a good way of searching the DB, it would help: I just want to search for certain nameservers. I tried searching the 30 GB database and my computer has been frozen since last night. Thanks, everyone!
Your nameserver field is not indexed on its own. It is indexed only as the second column of the `domain` key, so the index can only be used when your query filters on domain (the leftmost column of that key). SELECT * FROM domain WHERE nameserver = 'XXX' will result in a full table scan. Use the following SQL command to add an index on nameserver (it might take a while to finish):

ALTER TABLE `domain` ADD INDEX ( `nameserver` );
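Once the index is built, you can check that MySQL is actually using it with EXPLAIN (the nameserver value below is just an example taken from your sample data):

EXPLAIN SELECT domain FROM domain WHERE nameserver = 'NS1.SECURE.NET';

If the `key` column in the output shows `nameserver` rather than NULL, the query is hitting the new index instead of scanning the whole table.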
I read that 63 characters is the max for a valid domain name (excluding the extension). Would trimming the field to varchar(70) help any?
I tried capping it at 67 characters and it didn't have much effect; everything still runs at the same speed as before.
Just a thought: if all you want to do is look up the nameserver by domain, would it work faster if you made 26 tables (domains_a, domains_b, et cetera) to keep the tables down to 5-10 million records apiece? For the search, use something like:

$myquery = "SELECT nameserver FROM domains_" . substr($searchdomain, 0, 1)
         . " WHERE domain = '" . $searchdomain . "'";
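One caveat with the 26-table idea (my own assumption, untested): .COM names can also begin with a digit, so a letters-only split would miss some records. A sketch with a catch-all table (the shard_table() helper and the domains_other table are hypothetical names):

// Pick the shard table for a given domain name.
// Assumes tables domains_a .. domains_z plus a catch-all
// domains_other for names that start with a digit.
function shard_table($domain) {
    $c = strtolower(substr($domain, 0, 1));
    return ($c >= 'a' && $c <= 'z') ? 'domains_' . $c : 'domains_other';
}

$searchdomain = mysql_real_escape_string($searchdomain);
$myquery = "SELECT nameserver FROM " . shard_table($searchdomain)
         . " WHERE domain = '" . $searchdomain . "'";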