First of all, I want to know how I can save a web page on my system. Please don't just say "click Save As and save it in text format." My idea is to first download the source code of a specified URL, then parse out all the tags and keep only the text portion of that page. Initially I thought of saving the retrieved text as a table in a database, but storing all the retrieved text in one table seems foolish; the database would be huge if there are 1000 web links. So I want to know whether there is any way to download the source code, strip off the tags, and store the retrieved content either in a table or as a text file, and which would be faster. As you have probably guessed, I am constructing a search engine here. I need some ideas on this part. Thank you, guys.
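Something like this is what I have in mind so far, just a rough sketch in Python assuming the third-party requests and beautifulsoup4 packages (the URL and filename are placeholders):

```python
# Rough sketch: download a page, strip the tags, keep only the text.
import requests
from bs4 import BeautifulSoup

def fetch_text(url):
    """Download the page at `url` and return only its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style blocks, which hold no searchable text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

if __name__ == "__main__":
    text = fetch_text("http://example.com")
    # Store one plain text file per page instead of one huge table row.
    with open("page_0001.txt", "w", encoding="utf-8") as f:
        f.write(text)
```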
There are a lot of ways to reduce the data: you can calculate word frequencies, stem the words, build indexes, etc. It's a fun project. I once spent a week trying to write a search engine myself, though, and gave up once I realized the project was endless! I wish you better luck.
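For instance, a toy version of the frequency-and-index idea might look like the sketch below. It is purely illustrative (the page texts and function names are made up); a real engine would also stem the words, for example with nltk's PorterStemmer, and persist the index to disk:

```python
# Sketch: count word frequencies per page and build a tiny
# inverted index mapping each word to the pages that contain it.
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def index_pages(pages):
    """pages: dict of page_id -> raw text. Returns (frequencies, index)."""
    frequencies = {}
    inverted = defaultdict(set)
    for page_id, text in pages.items():
        words = tokenize(text)
        frequencies[page_id] = Counter(words)
        for word in set(words):
            inverted[word].add(page_id)
    return frequencies, inverted

freqs, index = index_pages({1: "The cat sat", 2: "The dog sat down"})
print(index["sat"])  # {1, 2} -- the pages containing "sat"
```

The point is that the index is much smaller than the raw pages, so you query it instead of scanning 1000 stored documents.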