Removing duplicate lines

Discussion in 'Databases' started by DomainMagnate, Jul 6, 2006.

  1. #1
    Hey, I've got a huge db in a txt file and need to remove the duplicate lines.
    Any simple suggestions, please?

    Something like a free application or a simple script. :)

    Thanks, Mike
     
    DomainMagnate, Jul 6, 2006
  2. SEOEgghead

    #2
    Just insert them all into a database table with no constraints on it (add the constraints afterward), then SELECT all the distinct rows into a new table. Done. In general, SQL is good at these things (better than doing it in a flat file, I mean). Theoretically you can also do it in the file, but that's more susceptible to mistakes. If you want to do that and it's a big file, I'd sort it first. Then the algorithm is obvious: duplicates end up next to each other, so you only have to compare each line with the one before it.
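
    Here is a rough sketch of both routes in Python, in case it helps. Python's built-in sqlite3 module is an assumption on my part, standing in for whatever database you actually use, and the names raw_lines, clean_lines, dedupe_with_sql and dedupe_sorted are made up for illustration.

```python
import sqlite3


def dedupe_with_sql(lines):
    """Load every line into a table, then SELECT DISTINCT into a new table."""
    # Assumption: an in-memory SQLite database stands in for a real server.
    conn = sqlite3.connect(":memory:")
    # No constraints on the raw table, so duplicates load without errors.
    conn.execute("CREATE TABLE raw_lines (line TEXT)")
    conn.executemany(
        "INSERT INTO raw_lines (line) VALUES (?)",
        [(line,) for line in lines],
    )
    # The new table receives each distinct line exactly once.
    conn.execute("CREATE TABLE clean_lines AS SELECT DISTINCT line FROM raw_lines")
    rows = conn.execute("SELECT line FROM clean_lines ORDER BY line").fetchall()
    conn.close()
    return [row[0] for row in rows]


def dedupe_sorted(lines):
    """Flat-file variant: sort, then drop any line equal to the one before it."""
    result = []
    for line in sorted(lines):
        if not result or line != result[-1]:
            result.append(line)
    return result


if __name__ == "__main__":
    sample = ["b.com", "a.com", "b.com", "c.com", "a.com"]
    print(dedupe_with_sql(sample))
    print(dedupe_sorted(sample))
```

    Both functions return the lines in sorted order, so the original order of the file is lost either way.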

    Regards,
    J.
     
    SEOEgghead, Jul 6, 2006
  3. jnestor

    #3
    On Linux/Unix boxes:
    sort {filename} | uniq > {newfilename}

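    That only works if they're truly duplicate lines, identical from start to end, and the output comes back sorted rather than in the original order. If it's something with a duplicate field, you can possibly do it with Linux tools, but I'd go with a variation of the database approach above:

    Create a table with the appropriate unique index
    mysqlimport --ignore (rows that would duplicate an existing unique key are skipped)

    That's the way to do it if the uniqueness constraint will exist going forward.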
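
    Roughly what that looks like, sketched in Python with the built-in sqlite3 module. This is not mysqlimport: SQLite's INSERT OR IGNORE is swapped in because it likewise skips rows that collide with a unique index. The domains table, its domain and owner columns, and the load_unique function are invented for the example.

```python
import sqlite3


def load_unique(rows):
    """Insert (domain, owner) rows, silently skipping repeated domains."""
    # Assumption: in-memory SQLite instead of MySQL; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domains (domain TEXT, owner TEXT)")
    # The unique index defines what counts as a duplicate:
    # the same domain, whatever the owner field says.
    conn.execute("CREATE UNIQUE INDEX idx_domain ON domains (domain)")
    # INSERT OR IGNORE skips any row that would violate the unique index,
    # so the first row seen for each domain is the one that stays.
    conn.executemany("INSERT OR IGNORE INTO domains VALUES (?, ?)", rows)
    kept = conn.execute(
        "SELECT domain, owner FROM domains ORDER BY rowid"
    ).fetchall()
    conn.close()
    return kept


if __name__ == "__main__":
    rows = [("a.com", "mike"), ("b.com", "dan"), ("a.com", "joe")]
    print(load_unique(rows))
```

    The second a.com row is dropped even though its owner differs, which is the point of putting the index on the one field only.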
     
    jnestor, Jul 7, 2006
  4. DomainMagnate

    #4
    Mmm, thanks, but I'm on Windows. Maybe it can be done in Excel, or maybe you know a free application that can do it? :)
     
    DomainMagnate, Jul 7, 2006
  5. danielbruzual

    #5
    Yes, you can do it in Excel. Let's say you have the following data:

    name | age | height | letter
    bac | 123 | 150 | c
    kks | 351 | 112 | m
    cab | 125 | 150 | c
    bac | 123 | 150 | c

    Select everything with your mouse, including the header (in this case A1 to D5), then go to Data > Filter > Advanced Filter, tick "Unique records only" and click OK.
    Because the first and last records are identical, the repeat is hidden and only the first occurrence is kept, so the result will be:

    name | age | height | letter
    bac | 123 | 150 | c
    kks | 351 | 112 | m
    cab | 125 | 150 | c

    There you go. :)
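
    If you would rather use a script than Excel, the same "unique records only" filter can be sketched in a few lines of Python, which also runs on Windows. This is only a sketch: the function names unique_lines and dedupe_file are made up, and UTF-8 is an assumption about the file's encoding.

```python
def unique_lines(lines):
    """Yield each distinct line once, in the order it first appears."""
    seen = set()
    for line in lines:
        key = line.rstrip("\r\n")  # ignore differences in line endings
        if key not in seen:
            seen.add(key)
            yield key


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, skipping lines that were already written."""
    # Assumption: the file is UTF-8 text; change the encoding if it is not.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in unique_lines(src):
            dst.write(line + "\n")


if __name__ == "__main__":
    sample = [
        "name | age | height | letter",
        "bac | 123 | 150 | c",
        "kks | 351 | 112 | m",
        "cab | 125 | 150 | c",
        "bac | 123 | 150 | c",
    ]
    for line in unique_lines(sample):
        print(line)
```

    The set keeps every distinct line in memory, so for a really huge file sorting it first or loading it into a database may be the safer choice.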
     
    danielbruzual, Jul 7, 2006
  6. DomainMagnate

    #6
    Thanks, Daniel.
     
    DomainMagnate, Jul 7, 2006
  7. DomainMagnate

    #7
    Hey! I found a cool program called EditPad Pro.

    It's shareware, but all the features I needed work, so if anyone else needs it, I hope it helps. :)
     
    DomainMagnate, Jul 8, 2006