Hey, I've got a huge DB in a txt file and need to remove duplicate lines. Any simple suggestions, please? Like some free application, or a simple script... Thanks, Mike
Just insert them all into a database and add the constraints afterward. Then SELECT all distinct rows into a new table. Done. In general, SQL is good at these things (better than doing it in a flat file, I mean). Theoretically you can also do it in the file, but that's more susceptible to mistakes. If you want to do that with a big file, I'd sort it first; once the lines are sorted, duplicates sit next to each other, so a single pass that skips repeats of the previous line does the job. Regards, J.
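If you want to try the database route without installing a server, here's a minimal sketch of the same idea using Python's built-in sqlite3 module (the file names and the one-record-per-line assumption are mine, not from the post above):

# Load every line into a throwaway table, then write out SELECT DISTINCT.
# File names are placeholders; assumes one record per line.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead for very large inputs
conn.execute("CREATE TABLE raw (line TEXT)")

with open("input.txt", encoding="utf-8") as f:
    conn.executemany("INSERT INTO raw (line) VALUES (?)",
                     ((line.rstrip("\n"),) for line in f))

# DISTINCT collapses identical lines into one
with open("output.txt", "w", encoding="utf-8") as out:
    for (line,) in conn.execute("SELECT DISTINCT line FROM raw"):
        out.write(line + "\n")

conn.close()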
On Linux/Unix boxes: cat {filename} | sort | uniq > {newfilename}. That only works if they're truly duplicate lines. If it's something with a duplicate field you can possibly do it with Linux tools, but I'd go with a variation of the above. Create a table with an appropriate unique index, then mysqlimport --ignore; that's the way to do it if the uniqueness constraint will exist going forward.
Mmm, thanks, but I'm on Windows... maybe it can be done in Excel, or if you know a free application that can do it.
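A short script also works on Windows, no extra tools needed. A minimal Python sketch, assuming exact duplicate lines should be dropped and the first occurrence kept (file names are placeholders):

# Remove exact duplicate lines, keeping the first occurrence and the original order.
# Holds every unique line in memory, so the sort-first approach mentioned above
# is better if the file is too big to fit in RAM.
seen = set()

with open("input.txt", encoding="utf-8") as src, \
     open("deduped.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if line not in seen:
            seen.add(line)
            dst.write(line)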
Yes, you can do it in Excel. Let's say you have the following info:

name | age | height | letter
bac  | 123 | 150    | c
kks  | 351 | 112    | m
cab  | 125 | 150    | c
bac  | 123 | 150    | c

Select everything with your mouse (including the header, which in this case would be from A1 to D5) and go to Data > Filter > Advanced Filter > Unique records only > OK. The result will be (because the first and last records are the same, one of them is dropped):

name | age | height | letter
bac  | 123 | 150    | c
kks  | 351 | 112    | m
cab  | 125 | 150    | c

There you go.
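The same unique-records idea can be scripted when the file is delimited. A minimal sketch using Python's csv module, assuming a pipe-delimited file with a header row (file name and delimiter are guesses, adjust them to the real data):

# Keep only the first occurrence of each full record, similar to Excel's
# "Unique records only" filter. To dedupe on one field instead, use e.g.
# key = row[0] for the first column.
import csv

seen = set()

with open("data.txt", newline="", encoding="utf-8") as src, \
     open("data_unique.txt", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src, delimiter="|")
    writer = csv.writer(dst, delimiter="|")
    for row in reader:
        key = tuple(row)  # the whole record is the key
        if key not in seen:
            seen.add(key)
            writer.writerow(row)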
Hey! I found a cool program called EditPad Pro. It's shareware, but all the needed features work, so if anyone else needs it, hope it helps.