I've got a ton of HTML files that I'd like to use to populate a database. Each page contains several HTML tables with data. I've been using a convoluted mix of sed and nawk to parse the files and pull out what I want, but I'm thinking there must be an easier way. Does anyone know of any scripts or programs that can process these files, extract the information I want, and output it as a tab-separated text file? I can handle the insertion into the database once I have the text files. Many thanks!
I think you will have to pick the tables out of these HTML files manually :-/ ... or maybe you can write some PHP code that puts everything between <table> and </table> into a text file.
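For what it's worth, the "everything between <table> and </table>" idea can be sketched without much code. The example below uses Python and its standard-library html.parser module rather than PHP, purely as an illustration. The names TableExtractor and tables_to_tsv are made up for this sketch, and it assumes the tables are not nested and that the cells hold plain text. It tolerates missing </td> and </tr> tags, which older pages often leave out.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings.

    Assumption for this sketch: tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.tables = []   # one list of rows per table seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace (including tabs and newlines) so a
            # cell can never break the tab-separated layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()      # copes with a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()     # copes with a missing </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def tables_to_tsv(html_text):
    """Return all tables as tab-separated text, with a blank line between tables."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return "\n\n".join(
        "\n".join("\t".join(row) for row in table) for table in parser.tables
    )
```

To run it over a whole directory, you could read each file, pass its text to tables_to_tsv, and write the result to a matching .txt file. If the pages are badly malformed or use nested tables, a more forgiving parser would probably be needed.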
I'm not sure whether there is a tool to do this. Writing sed and awk scripts may be the right approach. Check sourceforge.net and freshmeat.net for freeware and shareware.