Search engine

Discussion in 'Programming' started by soudip, Mar 9, 2008.

  1. #1
    Hi community!
    I am doing an academic project in which I am supposed to develop a search engine for a SharePoint website.
    I tried googling for information on how to proceed with developing a search engine but could not find anything concrete.
    From what I gathered while surfing, I think Perl would be apt for developing this search engine.
    Can you advise me on how to start?
    Please reply. I am in despair, the deadline is near!
     
    soudip, Mar 9, 2008
  2. firmaterra

    firmaterra Peon

    #2
    Are you for real??? You're near your deadline and want to develop a search engine?? haha okay here ya go:

    1. Generate a list of URLs to begin your crawl.
    2. Crawl this list of URLs, downloading the pages and caching them.
    3. Parse the downloaded pages for links and use those links as the next set of URLs to crawl and fetch.
    4. Index all downloaded pages so they can be searched quickly for keywords.
    5. Build a front end so that users can enter the search term they require.
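
    The five steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not a finished engine: the names `crawl`, `search`, and `LinkParser` are made up for this sketch, and `fetch` is assumed to be any function that takes a URL and returns the page's HTML as a string (or None on failure), so the downloading part can be swapped out. It also indexes all text it sees, including any script or style content, which a real indexer would filter out.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seeds, fetch, max_pages=100):
    """Steps 1-4: start from seed URLs, fetch, cache, follow links, index."""
    queue = deque(seeds)          # step 1: the starting list of URLs
    seen = set()                  # URLs already attempted
    cache = {}                    # step 2: url -> downloaded HTML
    index = defaultdict(set)      # step 4: word -> set of URLs containing it
    while queue and len(cache) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue
        cache[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:  # step 3: links become the next URLs
            queue.append(urljoin(url, href))
        for word in re.findall(r"\w+", " ".join(parser.text).lower()):
            index[word].add(url)
    return cache, index


def search(index, term):
    """Step 5 (back end only): look up a single keyword in the index."""
    return sorted(index.get(term.lower(), set()))
```

    To try it without touching the network, `fetch` can simply be the `.get` method of a dict that maps URLs to HTML strings. For real pages it could wrap `urllib.request.urlopen`.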

    Whilst it is possible to code this in Perl, I think it will become too slow once you get past a few million pages. You need a more grass-roots language like Python, Java or C++, to name a few.
     
    firmaterra, Mar 11, 2008
  3. shallowink

    shallowink Well-Known Member

    #3
    A search engine for a SharePoint website: look for Perl search scripts. Site indexers would be a good starting point and are relatively simple. Start at / and parse the HTML for all <a> links. You can use the HTML::Parser modules; several methods are available to extract URLs.
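
    As a rough illustration of the "start at / and collect the A links" idea, here is a small sketch. It is in Python rather than Perl, and the names `AnchorCollector` and `same_site_links` are invented for this example. It assumes a site indexer only wants links on the same host as the page being parsed, so off-site links are dropped and #fragments are stripped.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(page_url, html):
    """Return absolute, de-duplicated links that stay on page_url's host."""
    collector = AnchorCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

    Feeding it the HTML of the site root gives the first batch of pages to index; calling it again on each of those pages walks the rest of the site.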
     
    shallowink, Mar 11, 2008