Here's the situation: I have a text file with 14 million records, one per line. The problem: some of the records are duplicates. Sample records:

rmocu6zza0
gbl239serr
iq090qctnp
dfimp7oopw
y2gcsstbiw
x9owfugbn8
jg9hqx6d0c

The file size is around 200 MB. I need to remove the duplicates from the text file and import the records into a database table. I tried reading the whole file into an array and using array_unique(), but an array that big crashed the server. Is there an easier way to do this? Thanks
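For reference, a minimal sketch of what I tried (the file name records.txt is just a placeholder). Loading all 14 million lines into a single array before calling array_unique() is what exhausted the memory:

<?php
// Attempted approach (simplified): load every line into memory, then dedupe.
// With ~14 million lines the array alone blows past the memory limit.
$lines  = file('records.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$unique = array_unique($lines);
// ... then import $unique into the database ...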
Split the file into pieces smaller than 2 MB. Import each piece into a table with a unique index on the relevant column(s), telling the import to ignore duplicate records. Export the results. Voilà, deduped. A rough sketch of the same idea is below.
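Here is a minimal PHP/MySQL sketch of the dedupe-on-import idea. It assumes a table named records with a UNIQUE index on a value column; connection details, table and column names, and the 1000-row batch size are all placeholders. Instead of physically splitting the file, this variation streams it line by line and inserts in small batches with INSERT IGNORE, so the unique index silently drops duplicates and memory use stays flat:

<?php
// Assumed table (adjust name/column/length to taste):
//   CREATE TABLE records (value VARCHAR(32) NOT NULL, UNIQUE KEY uq_value (value));
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

function insertBatch(PDO $pdo, array $values): void
{
    // Build "(?),(?),..." for one multi-row INSERT IGNORE; duplicates hit the
    // unique index and are skipped instead of raising an error.
    $placeholders = implode(',', array_fill(0, count($values), '(?)'));
    $stmt = $pdo->prepare("INSERT IGNORE INTO records (value) VALUES $placeholders");
    $stmt->execute($values);
}

$fh    = fopen('records.txt', 'r');
$batch = [];

while (($line = fgets($fh)) !== false) {
    $value = trim($line);
    if ($value === '') {
        continue;
    }
    $batch[] = $value;

    // Flush in small chunks so only a handful of rows are ever held in memory.
    if (count($batch) === 1000) {
        insertBatch($pdo, $batch);
        $batch = [];
    }
}
if ($batch) {
    insertBatch($pdo, $batch);
}
fclose($fh);

Once the import finishes, the table already contains only unique values, so the "export" step is just a SELECT of that column.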