Getting the URL list from Yahoo for linkdomain:domain.com -site:domain.com

Discussion in 'PHP' started by jigen7, Sep 20, 2007.

  1. #1
    How can I get the list of URLs that Yahoo generates? For example, if I use linkdomain:mahq.net -site:mahq.net, Yahoo yields 5610 results. How can I get those URLs stored in an array using PHP? Thanks.
     
    jigen7, Sep 20, 2007 IP
  2. krt

    krt Well-Known Member

    #2
    Parse the contents obtained through file handling functions, e.g. file_get_contents() (which may not work on some hosts due to security settings, namely allow_url_fopen being disabled), or cURL. Make sure you are not breaking any terms of use, though.

    The process is called data scraping, and if you search enough you may find an example that does the same thing you are after.
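
    Something like this is the rough idea. It is only a sketch: the search URL and the regex are guesses at Yahoo's current markup, so check the actual result page source and adjust (and mind the terms of use).

    PHP:
    <?php
    // Build the linkdomain query and fetch one page of Yahoo results.
    // NOTE: the search URL and the href pattern are assumptions, not a
    // documented interface; inspect the real result page HTML first.
    $query = urlencode('linkdomain:mahq.net -site:mahq.net');
    $url   = 'http://search.yahoo.com/search?p=' . $query;

    $html = @file_get_contents($url);
    if ($html === false && function_exists('curl_init')) {
        // Fall back to cURL if allow_url_fopen is disabled on the host.
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($ch);
        curl_close($ch);
    }

    // Pull every external href out of the page, skipping Yahoo's own links.
    $urls = array();
    if (preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $matches)) {
        foreach ($matches[1] as $u) {
            if (strpos($u, 'yahoo.com') === false) {
                $urls[] = $u;
            }
        }
    }
    print_r($urls);
    ?>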
     
    krt, Sep 20, 2007 IP
  3. jigen7

    jigen7 Peon

    #3
    Well, yes, I already tried cURL on it, but I think (correct me if I'm wrong) cURL only scrapes one page at a time, so I can't get all 5610 URLs. For example, how can I traverse between the result pages to get all of the URLs?
     
    jigen7, Sep 20, 2007 IP
  4. msaqibansari

    msaqibansari Peon

    #4
    There is no option to get all of your required URLs from Yahoo in one go. Yahoo serves results page by page, and you cannot get them all in a single request.
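
    You have to walk the result pages one at a time, roughly like the sketch below. The b offset parameter is my guess at how Yahoo numbers its result pages, so confirm it against a real results URL in your browser.

    PHP:
    <?php
    // Each results page has to be requested separately.
    // ASSUMPTION: the "b" parameter is the 1-based result offset
    // (1, 11, 21, ...); verify against an actual Yahoo results URL.
    $query = urlencode('linkdomain:mahq.net -site:mahq.net');
    $pages = array();
    for ($offset = 1; $offset <= 91; $offset += 10) {
        $pages[] = 'http://search.yahoo.com/search?p=' . $query . '&b=' . $offset;
    }
    print_r($pages);   // URLs of the first ten result pages
    ?>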
     
    msaqibansari, Sep 20, 2007 IP
  5. jigen7

    jigen7 Peon

    #5
    OK, thanks for the information. I really can't think of a way cURL can get the whole URL list from Yahoo search... The alternative is using the Yahoo API.
     
    jigen7, Sep 21, 2007 IP
  6. krt

    krt Well-Known Member

    #6
    You'd have to make one cURL call per results page, and that many requests in a short period will likely trigger some form of flood control. If you can do this through the API, that would be a much better option.
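
    If you do go the scraping route, something along these lines is the idea. It is only a sketch: the search URL and the b offset are guesses at Yahoo's current page format, the regex is a crude href extractor, and the sleep() is just there so you don't hammer them.

    PHP:
    <?php
    // Fetch several result pages in a row, pausing between requests
    // so flood control is less likely to kick in.
    // ASSUMPTIONS: the search URL format and the "b" offset parameter
    // are guesses; check them against a real Yahoo results URL.
    $query = urlencode('linkdomain:mahq.net -site:mahq.net');
    $urls  = array();

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    for ($offset = 1; $offset <= 91; $offset += 10) {   // first ten pages only
        curl_setopt($ch, CURLOPT_URL,
            'http://search.yahoo.com/search?p=' . $query . '&b=' . $offset);
        $html = curl_exec($ch);
        if ($html === false) {
            break;   // stop if a request fails
        }
        // Crude href extraction; skip Yahoo's own links.
        if (preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $m)) {
            foreach ($m[1] as $u) {
                if (strpos($u, 'yahoo.com') === false) {
                    $urls[] = $u;
                }
            }
        }
        sleep(2);   // be polite between requests
    }
    curl_close($ch);

    print_r(array_unique($urls));
    ?>

    If you use the Yahoo API instead, the documentation on developer.yahoo.com covers the web search service and its paging parameters, which is a much cleaner way to walk through the whole result set.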
     
    krt, Sep 21, 2007 IP