How can I collect all the links from a web page without using file_get_contents? Is there any way to get a web page's source without using file_get_contents?
You can use cURL to pull the page into a variable and then use regular expressions to get all the relevant data. Here are some resources related to these areas: cURL, preg_match_all, Regular Expressions. This site explains in more detail the process of scraping the links using regex and preg_match_all. I only skimmed quickly through the last link, but one thing I noticed is that it uses file_get_contents, which you said you don't want to use. You can replace that part with cURL. For example:

PHP:
// Set the resource URL to "grab"
$resourceURL = "http://www.google.com";

// Set the user agent
$useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13";

// Create a new cURL resource
$ch = curl_init();

// Set the URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $resourceURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

// Grab the page into a variable
$grabPage = curl_exec($ch);

// Close the cURL resource and free up system resources
curl_close($ch);

Once you have $grabPage you can run preg_match_all over it to pull out the links; there's a rough sketch just below. Hope that helps somewhat! Now back to the work I'm meant to be doing!

Hodge
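To go from the grabbed page to a list of links, here is a minimal sketch using preg_match_all. It assumes $grabPage holds the HTML returned by the cURL call above; the pattern is only a rough example and won't cover every way a link can be written in markup.

PHP:
// Collect the href values of <a> tags from the fetched HTML.
// Note: this simple pattern assumes quoted href attributes and will
// miss unquoted or unusually formatted links.
$links = array();
if (preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $grabPage, $matches)) {
    $links = $matches[1];
}

print_r($links);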
From the PHP manual:

PHP:
<?php
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out  = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.example.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>
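One thing to watch with that approach: what comes back includes the raw HTTP headers, so to feed it into the link matching above you would collect the response into a variable and split the body off first. A rough sketch along those lines (it assumes a plain response; chunked transfer encoding is not handled here):

PHP:
<?php
// Fetch the page over a raw socket and keep the response in a variable
// instead of echoing it straight out.
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out  = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.example.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);

    $response = "";
    while (!feof($fp)) {
        $response .= fgets($fp, 128);
    }
    fclose($fp);

    // Split the headers from the body at the first blank line, then
    // run the same rough href matching over the body.
    $parts = explode("\r\n\r\n", $response, 2);
    $body  = isset($parts[1]) ? $parts[1] : "";

    if (preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $body, $matches)) {
        print_r($matches[1]);
    }
}
?>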