Big unique array makes server crash

Discussion in 'PHP' started by goscript, Jul 26, 2007.

  1. #1
    Here's the situation:

    A text file with 14 million records, one per line.
    Problem: some of the records are duplicates.

    Sample records:

    rmocu6zza0
    gbl239serr
    iq090qctnp
    dfimp7oopw
    y2gcsstbiw
    x9owfugbn8
    jg9hqx6d0c
    etc.

    The file size is around 200 MB.

    I need to remove the duplicates and import the unique records into a db table.
    I tried reading the whole file into an array and running array_unique() on it, but an array that big made the server crash.
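
    For reference, this is roughly what I tried (just a sketch, and 'records.txt' stands in for the real file name):

    <?php
    // Read all 14 million lines into one array. In memory this takes
    // far more than the 200 MB the file occupies on disk.
    $lines = file('records.txt', FILE_IGNORE_NEW_LINES);

    // array_unique() then builds further structures on top of that,
    // which is where the server fell over.
    $unique = array_unique($lines);

    file_put_contents('deduped.txt', implode("\n", $unique));
    ?>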

    Any easier way of doing this?

    Thanks
     
    goscript, Jul 26, 2007 IP
  2. #2
    Split the file into chunks of less than 2 MB each.
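
    Something along these lines will do the splitting (untested sketch; file names are made up):

    <?php
    // Split records.txt into numbered chunks of roughly 2 MB each,
    // breaking only on line boundaries so no record is cut in half.
    $in      = fopen('records.txt', 'r');
    $chunk   = 0;
    $written = 0;
    $out     = fopen(sprintf('chunk_%03d.txt', $chunk), 'w');

    while (($line = fgets($in)) !== false) {
        if ($written + strlen($line) > 2000000) { // stay under ~2 MB
            fclose($out);
            $chunk++;
            $written = 0;
            $out = fopen(sprintf('chunk_%03d.txt', $chunk), 'w');
        }
        fwrite($out, $line);
        $written += strlen($line);
    }

    fclose($out);
    fclose($in);
    ?>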

    Import each file into a table with a unique index on the relevant column(s), telling the import to ignore duplicate records.
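
    With MySQL the import could look like this (untested; the table, column and connection details are made up, and LOAD DATA LOCAL INFILE has to be allowed in your MySQL setup):

    <?php
    // The UNIQUE index plus the IGNORE keyword make MySQL silently
    // skip any row that already exists; that is the whole dedupe.
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');

    $db->query("CREATE TABLE IF NOT EXISTS codes (
                    code VARCHAR(32) NOT NULL,
                    UNIQUE KEY idx_code (code)
                )");

    foreach (glob('chunk_*.txt') as $file) {
        $db->query("LOAD DATA LOCAL INFILE '" . $db->real_escape_string($file) . "'
                    IGNORE INTO TABLE codes
                    LINES TERMINATED BY '\\n'
                    (code)");
    }
    ?>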

    Export the results.
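
    And to get the deduped list back out (INTO OUTFILE writes on the database server, so it needs the FILE privilege and a path the MySQL user can write to):

    <?php
    // Dump the now-unique codes back to a flat file.
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');

    $db->query("SELECT code FROM codes
                INTO OUTFILE '/tmp/deduped.txt'
                LINES TERMINATED BY '\\n'");
    ?>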

    Voila, deduped.
     
    ecentricNick, Jul 26, 2007 IP
  3. #3
    Thanks, I'll do it this way.
     
    goscript, Jul 26, 2007 IP