Yahoo Google Search Bot

Discussion in 'PHP' started by baddot, Jan 3, 2007.

  1. #1
    Hi, can anyone tell me how to write a script, or point me to an existing one, that acts as a bot and automatically copies links from Yahoo and Google into my database? I'm trying to build a search engine for just a few countries, such as Singapore and Malaysia. Does anyone have an idea how to do it or where to start? It's something like a web crawler.

    baddot, Jan 3, 2007
  2. #2
    Broad question, broad answer:

    * send the search query to Google with "site:my" appended (this makes sure only Malaysian websites are displayed)
    * grab the HTML results page from Google
    * use regular expressions, or whatever method you prefer, to extract the results from the HTML page
    * do the same for Yahoo
    * join the results
    * come up with some algorithm to determine relevance so you can order the results nicely :p

    Every part of this can easily be done in PHP, but I doubt it complies with Google's or Yahoo's TOS to scrape their engines with automated systems (at least without using their APIs).
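
The query-building, joining and ordering steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `build_search_url` and `merge_and_rank` are made up for this example, the base URLs and the `q`/`p` parameter names are assumptions passed in by the caller, and the ranking rule (links returned by more engines first, then best position) is just one naive choice. No request is sent here; the result lists are hard-coded sample data.

```php
<?php
// Build a query URL for one engine. $base and $param are supplied by the
// caller (assumptions, not verified against either engine's current API).
function build_search_url(string $base, string $param, string $terms, string $tld): string
{
    // Appending "site:my" restricts results to the .my top-level domain.
    return $base . '?' . http_build_query([$param => $terms . ' site:' . $tld]);
}

// Join several result lists and order them: links found by more engines
// come first, ties are broken by the best (lowest) position, then by URL.
function merge_and_rank(array ...$lists): array
{
    $score = [];
    foreach ($lists as $list) {
        foreach (array_values(array_unique($list)) as $pos => $url) {
            if (!isset($score[$url])) {
                $score[$url] = ['hits' => 0, 'best' => $pos];
            }
            $score[$url]['hits']++;
            $score[$url]['best'] = min($score[$url]['best'], $pos);
        }
    }
    $urls = array_keys($score);
    usort($urls, function (string $a, string $b) use ($score): int {
        return [$score[$b]['hits'], $score[$a]['best'], $a]
           <=> [$score[$a]['hits'], $score[$b]['best'], $b];
    });
    return $urls;
}

// Hypothetical sample data standing in for links extracted from each engine.
$fromGoogle = ['http://a.my/', 'http://b.my/', 'http://c.my/'];
$fromYahoo  = ['http://d.my/', 'http://c.my/'];

echo build_search_url('https://www.google.com/search', 'q', 'hotel', 'my'), "\n";
echo build_search_url('https://search.yahoo.com/search', 'p', 'hotel', 'my'), "\n";
print_r(merge_and_rank($fromGoogle, $fromYahoo));
```

In this sample, `http://c.my/` ends up first because both lists contain it. A real relevance algorithm would need more signals than this.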
     
    B.V.E., Jan 3, 2007
  3. nico_swd

  4. #4
    Erm, but how do I send the search and receive the results? MySQL? Or something else?

    baddot, Jan 3, 2007
  5. #5
    Basically, you want to build a scraper.

    Use the Snoopy class to open a session with whichever site you're targeting and fetch the page.

    Write a regex that captures the results you're looking for.

    Not too hard, eh? :)
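
As a rough illustration of the regex step, here is a minimal sketch that pulls absolute links out of a page with `preg_match_all`. The function name `extract_result_links` and the sample markup are invented for this example; real result pages use different markup that changes over time, so the pattern would need adjusting. In practice the HTML string would come from whatever fetcher you use (Snoopy, cURL, and so on) rather than from a hard-coded sample.

```php
<?php
// Return the unique absolute http(s) links found in <a href="..."> tags.
function extract_result_links(string $html): array
{
    // Group 1 is the opening quote, group 2 the URL, \1 the matching close quote.
    $pattern = '~<a\s[^>]*?href\s*=\s*(["\'])(https?://[^"\']+)\1~i';
    preg_match_all($pattern, $html, $m);
    // Decode entities such as &amp; and drop repeated links.
    return array_values(array_unique(array_map('html_entity_decode', $m[2])));
}

// Made-up sample markup, not an actual search results page.
$sample = <<<HTML
<div class="result"><a class="title" href="http://example.com.my/page?a=1&amp;b=2">Example</a></div>
<div class="result"><a href='http://example.org.my/'>Other</a></div>
<div class="nav"><a href="/next?page=2">Next</a></div>
<div class="result"><a href="http://example.org.my/">Other again</a></div>
HTML;

print_r(extract_result_links($sample));
```

The relative "Next" link is skipped because the pattern only accepts URLs starting with `http://` or `https://`. From here, the returned array is what you would insert into your database table.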
     
    oziman, Jan 4, 2007
  6. #6
    Wow, it seems complicated, but can you guys tell me where I should start, or how to start?

    baddot, Jan 8, 2007