Downloading Web Pages Using PHP (novice user, headache warning)

Discussion in 'PHP' started by udkl_12_98, Jun 3, 2007.

  1. #1
    Well, I searched for this all over the net with no result.

    I'm working on a new product review site, and for that I need product details. What better site to get them from than Wikipedia?

    But I can't figure out a way to do it. Wikipedia provides users with an XML export format specifically for such uses, so the URL should be something like http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)

    I want to download that page on the server using some PHP code (which I don't know how to do, and here lies the problem), and after that manipulate the XML file (that part I can do).

    PS: It might sound dumb, but I used require_once() with the URL inside it, and it gave me a "failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden" error.

    Conclusion:

    Can anybody tell me one or more PHP functions that can download a web page on the server side?

    ;)
     
    udkl_12_98, Jun 3, 2007 IP
  2. speda1

    speda1 Well-Known Member

    #2
    The cURL functions can be used to fetch text from a URL. But the 403 probably means they're blocking you from scraping their content.
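    A minimal cURL sketch along these lines (the User-Agent string and bot name are illustrative, not anything the site requires by name):

    ```php
    <?php
    // Fetch a page with cURL; sending an explicit User-Agent header often
    // avoids the 403 that servers return to PHP's default (empty) agent.
    $url = "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)";

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, "ProductReviewBot/1.0");  // illustrative name
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $xml = curl_exec($ch);
    if ($xml === false) {
        echo "cURL error: " . curl_error($ch) . "\n";
    }
    curl_close($ch);
    ```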
     
    speda1, Jun 3, 2007 IP
  3. Vbot

    Vbot Peon

    #3
    Why not use:
    $data = file_get_contents("target url here");

    $data will become the HTML or XML of the page you're trying to get.
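    If plain file_get_contents() gets a 403, a stream context lets it send a User-Agent header along with the request; a sketch (the agent string is made up):

    ```php
    <?php
    // file_get_contents() with a stream context: the http options here
    // are attached to the outgoing request, so the server sees a
    // User-Agent instead of PHP's default (empty) one.
    $context = stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => 'ProductReviewBot/1.0',  // illustrative name
            'timeout'    => 30,
        ),
    ));

    $data = file_get_contents(
        "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)",
        false,
        $context
    );
    ```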
     
    Vbot, Jun 3, 2007 IP
  4. udkl_12_98

    udkl_12_98 Banned

    #4
    Thanks for the reply, Vbot and speda1.

    Vbot: I tried the method, but it doesn't work. Instead, I think it's something to do with socket programming.

    Something like

    $fp = fsockopen("wikipedia.org", 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
    } else {
        // Build a minimal HTTP/1.1 GET request by hand
        $out  = "GET / HTTP/1.1\r\n";
        $out .= "Host: wikipedia.org\r\n";
        $out .= "Connection: Close\r\n\r\n";

        fwrite($fp, $out);
        // Read and echo the raw response (headers included)
        while (!feof($fp)) {
            echo fgets($fp, 128);
        }
        fclose($fp);
    }


    This works, but only partially.

    Conclusion:

    It basically comes down to socket programming, and I'm researching deeper into it. If anybody has more details, please hit me up.
     
    udkl_12_98, Jun 3, 2007 IP
  5. ZenOswyn

    ZenOswyn Peon

    #5
    You don't need to delve into sockets if you just want to retrieve an XML file.

    php.net/curl will be all you need.
     
    ZenOswyn, Jun 3, 2007 IP
  6. udkl_12_98

    udkl_12_98 Banned

    #6
    Well, hey, guess what? The file_get_contents() function Vbot suggested does work.

    I was trying it on the URL "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)". NOTICE the Special:Export part? That was the problem. After I changed it to "http://en.wikipedia.org/wiki/Fight_Club_(film)" it worked like magic. Can anybody tell me how to make it work with the ":" in Special:Export?

    I tried urlencode() and rawurlencode(), but they don't seem to work.
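    For what it's worth, the ":" in Special:Export is legal in a URL path, so the colon itself usually isn't the culprit; the other usual suspect is the server rejecting PHP's default User-Agent. A sketch that sets the user_agent ini option before the call (the agent string here is made up):

    ```php
    <?php
    // The user_agent ini setting is picked up by PHP's http:// stream
    // wrapper, so file_get_contents() sends it with the request.
    ini_set('user_agent', 'ProductReviewBot/1.0');  // illustrative name

    $xml = file_get_contents(
        "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)"
    );
    ```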
     
    udkl_12_98, Jun 3, 2007 IP
  7. Vbot

    Vbot Peon

    #7
    Yeah, you can use cURL like ZenOswyn said, but if you want to use fsockopen then here you go.

    $link = "http://en.wikipedia.org/wiki/Special:Export/Fight_Club_(film)";
    $url = parse_url($link);
    $fp = fsockopen($url['host'], 80, $errno, $errstr, 30);
    if (!$fp)
    {
        echo "$errstr ($errno)<br />\n";
    }
    else
    {
        // Request the full path (not just "/") and send a browser-style
        // User-Agent so the server doesn't reject the request with a 403
        $head = "GET {$url['path']} HTTP/1.1\r\n";
        $head .= "Host: {$url['host']}\r\n";
        $head .= "User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)\r\n";
        $head .= "Connection: Close\r\n\r\n";
        fwrite($fp, $head);
        // Echo the raw response, headers and all
        while (!feof($fp))
        {
            echo fgets($fp, 128);
        }
        fclose($fp);
    }
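    Note the loop above echoes the raw response, headers included. A sketch of splitting the headers from the body at the first blank line (the $response string here is a made-up sample):

    ```php
    <?php
    // An HTTP response separates headers from body with a blank line
    // (CRLF CRLF); split on the first occurrence. Sample data only.
    $response = "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<mediawiki>...</mediawiki>";

    list($headers, $body) = explode("\r\n\r\n", $response, 2);

    echo $body;  // the XML payload only
    ```

    With HTTP/1.1 the server may also send the body chunked (Transfer-Encoding: chunked), in which case it needs de-chunking before you hand it to an XML parser.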
     
    Vbot, Jun 4, 2007 IP
  8. udkl_12_98

    udkl_12_98 Banned

    #8
    Thanks a lot, Vbot. That helped me; your knowledge is good.
     
    udkl_12_98, Jun 4, 2007 IP