HELP! Modify php scraper to use with proxies

Discussion in 'PHP' started by chevchelios, Dec 17, 2011.

  1. #1
    Hi, I made this simple script:
    <form action="scraper.php" method="post">
    Name: <input type="text" name="fname" />
    <input type="submit" />
    </form>
    Code (markup):
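    <?php
    // Read the search term submitted by the form.
    $url = $_POST["fname"];
    // Replace each whitespace character with %20.
    $pattern = "/(\s)/";
    $url = preg_replace($pattern, "%20", $url);
    $url = 'http://www.google.com/search?q=' . $url . '&hl=en&biw=1920&bih=989&num=100&lr=&ft=i&cr=&safe=images&tbs=';

    // Download the results page.
    $m = file_get_contents($url);

    // Collect every <h3 class="r"> result heading and dump the matches.
    preg_match_all('/<h3 class="r">.*<\/h3>/Usi', $m, $temp);
    var_dump($temp[0]);
    ?>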
    Code (markup):
    Can someone modify this script so that it picks a proxy from a list every time a query is made?
     
    chevchelios, Dec 17, 2011 IP
  2. ilook

    #2
    If you can't get it to work and would rather use a desktop scraper (software that runs on your PC instead of a server), send me a PM.
     
    ilook, Dec 25, 2011 IP
  3. stats

    #3
    I don't think you'll get far using file_get_contents() against Google. In most cases they simply block it.

    Instead, you can use cURL with a disguised user agent (I've tried it, and it didn't get blocked, at least for a few months).

    You can find some code here: http://php.bigresource.com/curl-dis...s-by-fetching-the-information--51hRYqo2l.html

    To go even further, make an array of different user-agent strings and rotate through them at random.
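
A rough sketch of the user-agent rotation idea, in case it helps. The helper names pick_random() and fetch_with_user_agent() and the three user-agent strings are made up for illustration, and nothing here guarantees the request won't be blocked:

```php
<?php
// Hypothetical pool of user-agent strings; swap in whichever ones you want to send.
$userAgents = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.52.7 (KHTML, like Gecko) Version/5.1.2 Safari/534.52.7',
    'Opera/9.80 (Windows NT 6.1; U; en) Presto/2.10.229 Version/11.60',
];

// Return one random entry from a non-empty array.
function pick_random(array $items)
{
    return $items[array_rand($items)];
}

// Fetch a URL with cURL, sending the given user-agent string.
// Returns the response body as a string, or false on failure.
function fetch_with_user_agent($url, $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // the disguised user agent
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);           // give up after 15 seconds
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

You would then call it as `$m = fetch_with_user_agent($url, pick_random($userAgents));` in place of the file_get_contents() line.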


    P.S.
    $url = preg_replace($pattern,"%20",$url); = BAD
    $url = urlencode($url); = GOOD
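
To show why, here is a small comparison. The function names naive_encode() and build_search_url() are hypothetical, and the shortened parameter list is just for the example. Replacing only whitespace leaves characters such as & untouched, so they get read as parameter separators, while urlencode() escapes them:

```php
<?php
// Whitespace-only replacement, as in the original script, kept for comparison.
function naive_encode($query)
{
    return preg_replace('/\s/', '%20', $query);
}

// Build the search URL with the query properly escaped by urlencode().
function build_search_url($query)
{
    return 'http://www.google.com/search?q=' . urlencode($query) . '&hl=en&num=100';
}

// naive_encode('tom & jerry')     gives 'tom%20&%20jerry'  (the & splits the query string)
// urlencode('tom & jerry')        gives 'tom+%26+jerry'
// build_search_url('tom & jerry') gives 'http://www.google.com/search?q=tom+%26+jerry&hl=en&num=100'
```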


    P.P.S.
    What kind of scraper (fetcher) are you trying to make? If you have a good budget, I might be willing to write it for you.
     
    Last edited: Dec 25, 2011
    stats, Dec 25, 2011 IP
  4. Einheijar

    #4
    
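    // Fetch a page through an HTTP proxy that requires authentication.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
    curl_setopt($ch, CURLOPT_HEADER, 1);            // include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);   // tunnel through the proxy (HTTP CONNECT)
    curl_setopt($ch, CURLOPT_PROXY, 'fakeproxy.com:1080');
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
    $data = curl_exec($ch);                         // curl_exec() needs the handle
    curl_close($ch);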
    
    PHP:

    Proxied curl.
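
To use a list of proxies rather than a single one, something along these lines might work. It is only a sketch: parse_proxy_list() and fetch_via_proxies() are hypothetical helpers, the list is assumed to hold one host:port per line, and the proxies are assumed not to need a password:

```php
<?php
// Parse a proxy list given as text, one "host:port" entry per line.
// Blank lines and lines starting with "#" are skipped.
function parse_proxy_list($text)
{
    $proxies = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line !== '' && $line[0] !== '#') {
            $proxies[] = $line;
        }
    }
    return $proxies;
}

// Try the proxies in random order until one returns a response.
// Returns the response body, or false if every proxy failed (or the list is empty).
function fetch_via_proxies($url, array $proxies)
{
    shuffle($proxies); // shuffles the local copy, so each query starts with a different proxy
    foreach ($proxies as $proxy) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_PROXY, $proxy);        // route this request through the proxy
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // skip dead proxies quickly
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);
        $body = curl_exec($ch);
        curl_close($ch);
        if ($body !== false) {
            return $body;
        }
    }
    return false;
}
```

With a file such as proxies.txt (a made-up name), usage would be `$m = fetch_via_proxies($url, parse_proxy_list(file_get_contents('proxies.txt')));`, after which the preg_match_all() call from the first post can run on `$m` as before.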
     
    Einheijar, Dec 25, 2011 IP