Need help with storing data

Discussion in 'Programming' started by ichkoguy, Jan 20, 2009.

  1. #1
    First of all, I want to know how I can save a web page to my system. Please don't tell me to click "Save As" and save it in text format.

    My idea is to first download the source code of a specified URL, then parse out all the tags and keep only the text portion of that page. Initially I thought I would save the retrieved text in a database table. But it seemed foolish to save all the retrieved text in a table, since the database would become huge with 1,000 web links.
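    A minimal sketch of the download-and-strip step, assuming Python 3 and only its standard library (`urllib.request` and `html.parser`). `TextExtractor`, `html_to_text`, and `fetch_text` are hypothetical names for illustration. The parser drops the contents of `<script>` and `<style>` and joins the remaining text with spaces, which is a simplification: it ignores layout and assumes the page is UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Strip tags from an HTML string and return the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def fetch_text(url):
    """Download a page and return its text (assumes UTF-8; real pages may differ)."""
    with urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return html_to_text(html)
```

    For example, `html_to_text("<p>Hello <b>world</b></p>")` gives `"Hello world"`.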

    So I want to know: is there a way to download the source code, strip off the tags, and store the retrieved content either in a table or as a text file? Which will be faster?
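    Which option is faster depends on the workload, so the following is not a benchmark. It is a minimal sketch of both options, assuming Python 3's built-in `sqlite3` and `pathlib` modules; the table name `pages` and the helpers `open_store`, `save_page`, `load_page`, and `save_page_as_file` are hypothetical. Keying the table on the URL means re-crawling a page overwrites the old copy, and a later indexing step can look pages up by address.

```python
import sqlite3
from pathlib import Path


def open_store(db_path=":memory:"):
    """Open (or create) a SQLite store with one row per crawled page."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_page(conn, url, text):
    # INSERT OR REPLACE so re-crawling a URL overwrites the stored text;
    # "with conn" commits the transaction on success.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, text)
        )


def load_page(conn, url):
    """Return the stored text for url, or None if it was never saved."""
    row = conn.execute("SELECT body FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None


def save_page_as_file(directory, name, text):
    # Alternative: one plain-text file per page; the caller must pick a safe name.
    path = Path(directory) / f"{name}.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

    Passing a file name, e.g. `open_store("pages.db")`, keeps the data on disk instead of in memory.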

    As you have probably guessed, I am building a search engine. I need some ideas on this part! Thank you, guys.
     
    ichkoguy, Jan 20, 2009
  2. NinjaWork
  3. Lavinco
    #3
    Maybe Happy Harvester would help in retrieving the info correctly.
     
    Lavinco, Jan 20, 2009
  4. NinjaWork
    #4
    There are a lot of ways to reduce the data: you can calculate word frequencies, stem the words, build indexes, etc. It's a fun project. I once spent a week trying to write a search engine, though, and gave up when I realized it was endless! I wish you better luck :)
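    The reduction ideas above (word frequency, stemming, indexes) can be sketched together as a tiny inverted index, assuming Python 3's standard library only. `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer such as Porter's, and all names here (`tokenize`, `build_index`, `search`) are hypothetical.

```python
import re
from collections import Counter, defaultdict

SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately crude suffix list


def crude_stem(word):
    # Strip the first matching suffix if at least 3 characters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and stem each word."""
    return [crude_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(pages):
    """pages: dict of url -> extracted text.
    Returns an inverted index: term -> {url: count of that term in the page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][url] = count
    return index


def search(index, query):
    # Rank pages by the summed frequency of the query terms they contain.
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]
```

    Ranking by raw counts favors long pages; a real engine would normalize the scores, for example with TF-IDF.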
     
    NinjaWork, Jan 20, 2009