removing duplicate lines ..

DomainMagnate Illustrious Member

Messages:: 10,932

Likes Received:: 1,022

Best Answers:: 0

Trophy Points:: 455

#1

Hey, I've got a huge db in a txt file and need to remove the duplicate lines.
Any simple suggestions, please?

like some free application, or a simple script..

thanks, Mike

DomainMagnate, Jul 6, 2006 IP

SEOEgghead Peon

Messages:: 18

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#2

Just insert them all into a database table without constraints, SELECT all distinct rows into a new table, and add the constraints to that new table afterward. Done. In general, SQL is good at these things (better than doing it in a flat file, I mean). You can also do it directly in the file, but that's more error-prone. If you want to go that route and it's a big file, I'd sort it first; duplicates then sit on adjacent lines, so you just drop any line that matches the one before it.
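
For anyone who wants to try the database route without installing a server, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory SQLite database and the names raw_lines, unique_lines and distinct_lines are assumptions made for illustration only; any SQL database would do.

```python
import sqlite3


def distinct_lines(lines):
    """Load lines into an unconstrained table, then copy the distinct ones out."""
    # Assumption: an in-memory SQLite database stands in for a real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_lines (line TEXT)")
    conn.executemany(
        "INSERT INTO raw_lines (line) VALUES (?)",
        ((line,) for line in lines),
    )
    # SELECT DISTINCT into a new table, as described in the post.
    conn.execute(
        "CREATE TABLE unique_lines AS SELECT DISTINCT line FROM raw_lines"
    )
    rows = conn.execute(
        "SELECT line FROM unique_lines ORDER BY line"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(distinct_lines(["b", "a", "b", "c", "a"]))
```

The ORDER BY is only there to make the output predictable; SELECT DISTINCT on its own does not promise any particular order.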

Regards,
J.

SEOEgghead, Jul 6, 2006 IP

jnestor Peon

Messages:: 133

Likes Received:: 7

Best Answers:: 0

Trophy Points:: 0

#3

On Linux/Unix boxes:
sort {filename} | uniq > {newfilename}

That only works if they're truly duplicate lines. If only one field is duplicated, you can possibly do it with Linux tools, but I'd go with a variation of the database approach above.
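
Since the thread starter is on Windows, the same sort-then-drop-adjacent-duplicates idea can be sketched in Python, which runs there too. This is only an illustration: the function names sort_uniq and dedupe_file are made up, and it reads the whole file into memory, so a truly huge file may need a different approach.

```python
def sort_uniq(lines):
    """Sort the lines, then keep a line only if it differs from the previous one."""
    result = []
    for line in sorted(lines):
        if not result or line != result[-1]:
            result.append(line)
    return result


def dedupe_file(src_path, dst_path):
    """Write the sorted, de-duplicated lines of src_path to dst_path."""
    # Assumption: the file is UTF-8 text and fits in memory.
    with open(src_path, encoding="utf-8") as src:
        lines = [line.rstrip("\n") for line in src]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in sort_uniq(lines):
            dst.write(line + "\n")


if __name__ == "__main__":
    print(sort_uniq(["b", "a", "b", "a", "c"]))
```

Calling dedupe_file("in.txt", "out.txt") (both file names hypothetical) would play the role of the pipeline above. As with sort and uniq, the output is sorted, so the original line order is lost.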

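Create a table with an appropriate unique index, then load the file with
mysqlimport --ignore
so that input rows duplicating an existing row on the unique key are skipped.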

That's the way to do it if the uniqueness constraint will exist going forward.
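
A rough sketch of the same idea, swapping in SQLite's INSERT OR IGNORE for mysqlimport --ignore so it can run from a plain Python install. The table people, the index idx_people_name and the function import_unique are hypothetical names chosen for the example; with MySQL the table setup would be similar but the load step would be mysqlimport itself.

```python
import sqlite3


def import_unique(rows):
    """Insert (name, age) rows, skipping any whose name is already present."""
    # Assumption: in-memory SQLite stands in for the MySQL setup in the post.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    # The unique index defines what counts as a duplicate: here, the name field.
    conn.execute("CREATE UNIQUE INDEX idx_people_name ON people (name)")
    # OR IGNORE silently drops rows that would violate the unique index.
    conn.executemany(
        "INSERT OR IGNORE INTO people (name, age) VALUES (?, ?)", rows
    )
    conn.commit()
    kept = conn.execute(
        "SELECT name, age FROM people ORDER BY rowid"
    ).fetchall()
    conn.close()
    return kept


if __name__ == "__main__":
    print(import_unique([("bac", 123), ("kks", 351), ("bac", 999)]))
```

Note that the first row seen for each name wins, so ("bac", 999) is dropped even though its age differs. That is the point of de-duplicating on a field rather than on the whole line.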

jnestor, Jul 7, 2006 IP

DomainMagnate Illustrious Member

#4

Mmm, thanks, but I'm on Windows. Maybe it can be done in Excel, or perhaps you know a free application that can do it.

DomainMagnate, Jul 7, 2006 IP

danielbruzual Active Member

Messages:: 906

Likes Received:: 57

Best Answers:: 0

Trophy Points:: 70

#5

Yes, you can do it in Excel. Let's say you have the following info:

name | age | height | letter
bac | 123 | 150 | c
kks | 351 | 112 | m
cab | 125 | 150 | c
bac | 123 | 150 | c

Select everything with your mouse (including the header, which in this case would be from A1 to D5) and go to Data > Filter > Advanced Filter > Unique records only > OK.
The result will be as follows (the first and last records are identical, so one of them is filtered out):

name | age | height | letter
kks | 351 | 112 | m
cab | 125 | 150 | c
bac | 123 | 150 | c

there you go.
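
If Excel is not at hand, the same "unique records only" result can be approximated with a few lines of Python. This is a sketch, not what Excel does internally: unique_records and SAMPLE are names invented here, and the code keeps the first copy of each record, so the surviving bac row comes first rather than last as in the listing above.

```python
def unique_records(text):
    """Return the rows of a pipe-delimited table with exact duplicates dropped."""
    seen = set()
    kept = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        # Strip spaces around each field so "bac | 123" and "bac|123" match.
        fields = tuple(field.strip() for field in raw.split("|"))
        if fields not in seen:
            seen.add(fields)
            kept.append(fields)
    return kept


# Sample data taken from the post above.
SAMPLE = """name|age|height|letter
bac | 123 | 150 | c
kks | 351 | 112 | m
cab | 125 | 150 | c
bac | 123 | 150 | c
"""

if __name__ == "__main__":
    for record in unique_records(SAMPLE):
        print(" | ".join(record))
```

Unlike the sort-based approaches, this keeps the remaining rows in their original order, which matters when the header has to stay on top.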

danielbruzual, Jul 7, 2006 IP

DomainMagnate Illustrious Member

#6

thanks daniel..

DomainMagnate, Jul 7, 2006 IP

DomainMagnate Illustrious Member

#7

Hey! I found a cool program called EditPad Pro.

It's shareware, but all the needed features work, so if anyone else needs it, I hope it helps.

DomainMagnate, Jul 8, 2006 IP
