Downloading Web Pages Using PHP (novice user, headache warning)

Discussion in 'PHP' started by udkl_12_98, Jun 3, 2007.

  1. #1
    Well, I searched for this all over the net with no result.

    I'm working on a new product review site, and for that I need product details. What better site to get them from than Wikipedia?

    But I can't figure out a way to do it. Wikipedia provides users with an XML export format specifically for such uses, so the URL should be something like http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)

    I want to download that page on the server using some PHP code (which I don't know how to do, and here lies the problem), and after that manipulate the XML file (that part I can do).

    PS: It might sound dumb, but I used require_once() with the URL inside it, and it gave me a "failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden" error.

    Conclusion:

    Can anybody tell me one or more PHP functions that can download a web page on the server side?

    ;)
     
    udkl_12_98, Jun 3, 2007 IP
  2. speda1

    speda1 Well-Known Member

    #2
    The cURL functions can be used to fetch text from a URL. But the 403 probably means they're blocking you from scraping their content.
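    A minimal cURL sketch along these lines (the User-Agent string and bot name are illustrative, not anything the site requires by name):

    ```php
    <?php
    // Fetch a page with cURL; sending an explicit User-Agent header often
    // avoids the 403 that servers return to PHP's default (empty) agent.
    $url = "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)";

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, "ProductReviewBot/1.0");  // illustrative name
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $xml = curl_exec($ch);
    if ($xml === false) {
        echo "cURL error: " . curl_error($ch) . "\n";
    }
    curl_close($ch);
    ```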
     
    speda1, Jun 3, 2007 IP
  3. Vbot

    Vbot Peon

    #3
    Why not use:
    $data = file_get_contents("target url here");

    $data will become the HTML or XML of the page you're trying to get.
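    If plain file_get_contents() gets a 403, a stream context lets it send a User-Agent header along with the request; a sketch (the agent string is made up):

    ```php
    <?php
    // file_get_contents() with a stream context: the http options here
    // are attached to the outgoing request, so the server sees a
    // User-Agent instead of PHP's default (empty) one.
    $context = stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => 'ProductReviewBot/1.0',  // illustrative name
            'timeout'    => 30,
        ),
    ));

    $data = file_get_contents(
        "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)",
        false,
        $context
    );
    ```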
     
    Vbot, Jun 3, 2007 IP
  4. udkl_12_98

    udkl_12_98 Banned

    #4
    Thanks for the reply, Vbot and speda1.

    Vbot: I tried the method, but it doesn't work. Instead, I think it's something to do with socket programming.

    Something like

    $fp = fsockopen("wikipedia.org", 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
    } else {
        // Build a minimal HTTP/1.1 GET request by hand
        $out  = "GET / HTTP/1.1\r\n";
        $out .= "Host: wikipedia.org\r\n";
        $out .= "Connection: Close\r\n\r\n";

        fwrite($fp, $out);
        // Read and echo the raw response (headers included)
        while (!feof($fp)) {
            echo fgets($fp, 128);
        }
        fclose($fp);
    }


    This works, but only partially.

    Conclusion:

    It basically comes down to socket programming, and I'm researching deeper into it. If anybody has more details, please hit me up.
     
    udkl_12_98, Jun 3, 2007 IP
  5. ZenOswyn

    ZenOswyn Peon

    #5
    You don't need to delve into sockets if you just want to retrieve an XML file.

    php.net/curl will be all you need.
     
    ZenOswyn, Jun 3, 2007 IP
  6. udkl_12_98

    udkl_12_98 Banned

    #6
    Well, hey, guess what? The file_get_contents() function Vbot suggested does work.

    I was trying it on the URL "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)". NOTICE the Special:Export part? That was the problem. After I changed it to "http://en.wikipedia.org/wiki/Fight_Club_(film)" it worked like magic. Can anybody tell me how to make it work with the ":" in Special:Export?

    I tried urlencode() and rawurlencode(), but they don't seem to work.
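    For what it's worth, the ":" in Special:Export is legal in a URL path, so the colon itself usually isn't the culprit; the other usual suspect is the server rejecting PHP's default User-Agent. A sketch that sets the user_agent ini option before the call (the agent string here is made up):

    ```php
    <?php
    // The user_agent ini setting is picked up by PHP's http:// stream
    // wrapper, so file_get_contents() sends it with the request.
    ini_set('user_agent', 'ProductReviewBot/1.0');  // illustrative name

    $xml = file_get_contents(
        "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)"
    );
    ```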
     
    udkl_12_98, Jun 3, 2007 IP
  7. Vbot

    Vbot Peon

    #7
    Yeah, you can use cURL like ZenOswyn said, but if you want to use fsockopen then here you go.

    $link = "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)";
    $url = parse_url($link);
    $fp = fsockopen($url['host'], 80, $errno, $errstr, 30);
    if (!$fp)
    {
        echo "$errstr ($errno)<br />\n";
    }
    else
    {
        // Request the full path (not just "/") and send a browser-style
        // User-Agent so the server doesn't reject the request with a 403
        $head = "GET {$url['path']} HTTP/1.1\r\n";
        $head .= "Host: {$url['host']}\r\n";
        $head .= "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)\r\n";
        $head .= "Connection: Close\r\n\r\n";
        fwrite($fp, $head);
        // Echo the raw response, headers and all
        while (!feof($fp))
        {
            echo fgets($fp, 128);
        }
        fclose($fp);
    }
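    Note the loop above echoes the raw response, headers included. A sketch of splitting the headers from the body at the first blank line (the $response string here is a made-up sample):

    ```php
    <?php
    // An HTTP response separates headers from body with a blank line
    // (CRLF CRLF); split on the first occurrence. Sample data only.
    $response = "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<mediawiki>...</mediawiki>";

    list($headers, $body) = explode("\r\n\r\n", $response, 2);

    echo $body;  // the XML payload only
    ```

    With HTTP/1.1 the server may also send the body chunked (Transfer-Encoding: chunked), in which case it needs de-chunking before you hand it to an XML parser.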
     
    Vbot, Jun 4, 2007 IP
  8. udkl_12_98

    udkl_12_98 Banned

    #8
    Thanks a lot, Vbot. That helped me; your knowledge is good.
     
    udkl_12_98, Jun 4, 2007 IP