
Removing duplicate lines

Discussion in 'Databases' started by DomainMagnate, Jul 6, 2006.

  1. #1
    hey I got a huge db in a txt file and need to remove duplicate lines.
    Any simple suggestions pls?

    like some free application, or a simple script.. :)

    thanks, Mike
     
    DomainMagnate, Jul 6, 2006 IP
  2. SEOEgghead

    SEOEgghead Peon

    #2
    Just insert them all into a database, add the constraints afterward. Then SELECT all distinct rows into a new table. Done. In general, SQL is good at these things (better than doing it in a flat file, I mean). Theoretically you can also do it in the file, but that's more susceptible to mistakes. If you want to do that, I'd sort it first if it's a big file, then the algorithm is obvious.

    Regards,
    J.
     
    SEOEgghead, Jul 6, 2006 IP
    DomainMagnate likes this.
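A minimal sketch of that database approach, using SQLite through Python (the filenames, table names, and sample rows here are just placeholders, assuming one record per line):

```python
import sqlite3

# sample input; in practice this would be the big txt file
with open("huge_db.txt", "w", encoding="utf-8") as f:
    f.write("alice,23\nbob,31\nalice,23\ncarol,17\n")

# load every line of the flat file as one row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (line TEXT)")
with open("huge_db.txt", encoding="utf-8") as f:
    conn.executemany("INSERT INTO raw (line) VALUES (?)",
                     ((ln.rstrip("\n"),) for ln in f))

# SELECT all distinct rows into a new table, then dump it back out
conn.execute("CREATE TABLE deduped AS SELECT DISTINCT line FROM raw")
unique_lines = [ln for (ln,) in conn.execute("SELECT line FROM deduped")]
with open("deduped.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(unique_lines) + "\n")
```

For a file too big for memory, point `sqlite3.connect` at an on-disk database file instead of `":memory:"`.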
  3. jnestor

    jnestor Peon

    #3
    On linux/unix boxes:
    cat {filename} | sort | uniq > {newfilename}

    That only works if they're truly duplicate lines. If it's something with a duplicate field you can possibly do it with linux tools, but I'd go with a variation of the above.

    Create table with appropriate unique index
    mysqlimport --ignore

    That's the way to do it if the uniqueness constraint will exist going forward.
     
    jnestor, Jul 7, 2006 IP
    DomainMagnate likes this.
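For the duplicate-field case jnestor mentions, a small Python sketch (the pipe delimiter, key column, and sample lines are assumptions — adjust to the actual format) that keeps the first line seen for each key:

```python
# dedupe on the first pipe-delimited field, keeping the first occurrence;
# the list literal stands in for reading lines from the file
seen = set()
unique = []
for line in ["bac|123|150", "kks|351|112", "bac|999|000"]:
    key = line.split("|")[0]  # first field is the uniqueness key
    if key not in seen:
        seen.add(key)
        unique.append(line)
```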
  4. DomainMagnate

    DomainMagnate Illustrious Member

    #4
    mmm thanks, but I'm on windows.. maybe it can be done in excel, or if you know a free application that can do it.. :)
     
    DomainMagnate, Jul 7, 2006 IP
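If Python is installed, a short script works on Windows too (the filenames and sample lines are placeholders); unlike sort | uniq it keeps the original line order:

```python
# sample data; in practice input.txt is the big db dump
with open("input.txt", "w", encoding="utf-8") as f:
    f.write("mike\nanna\nmike\nbob\n")

# remove duplicate lines, preserving first-seen order
seen = set()
with open("input.txt", encoding="utf-8") as src, \
     open("unique.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if line not in seen:
            seen.add(line)
            dst.write(line)
```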
  5. danielbruzual

    danielbruzual Active Member

    #5
    yes, you can do it in excel. let's say you have the following info:

    name|age|height|letter
    bac | 123 | 150 | c
    kks | 351 | 112 | m
    cab | 125 | 150 | c
    bac | 123 | 150 | c

    select everything with your mouse (including the header, which in this case would be from A1 to D5) and go to Data > Filter > Advanced Filter > Unique records only > OK
    the result will be (because the first and last records are the same, one of them is deleted — the filter keeps the first occurrence):

    name|age|height|letter
    bac | 123 | 150 | c
    kks | 351 | 112 | m
    cab | 125 | 150 | c

    there you go. :)
     
    danielbruzual, Jul 7, 2006 IP
    DomainMagnate likes this.
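The same unique-records filter can be sketched in Python for anyone without Excel handy (the rows below just mirror daniel's example):

```python
# keep only the first occurrence of each full record, like Excel's
# "Unique records only" filter
rows = [
    ("bac", 123, 150, "c"),
    ("kks", 351, 112, "m"),
    ("cab", 125, 150, "c"),
    ("bac", 123, 150, "c"),  # duplicate of the first row
]
unique_rows = list(dict.fromkeys(rows))  # dict keys dedupe, preserving order
```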
  6. DomainMagnate

    DomainMagnate Illustrious Member

    #6
    thanks daniel..
     
    DomainMagnate, Jul 7, 2006 IP
  7. DomainMagnate

    DomainMagnate Illustrious Member

    #7
    hey! I found a cool program called EditPad Pro.

    it's shareware, but all the needed features work, so if anyone else needs it, hope it helps :)
     
    DomainMagnate, Jul 8, 2006 IP