Google SERP and number of indexed pages - SCRIPT HELP NEEDED

Discussion in 'PHP' started by goliathus, Aug 9, 2008.

  1. #1
    Hi,

    Does anyone know how to get the number of indexed pages for a domain in Google?

    For example, if I want the number of indexed pages for www.google.com, I open http://www.google.com/search?hl=en&q=site:www.google.com and parse out "3,660,000".
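
    A URL like the one above can be built for any domain rather than typed by hand. This is only a minimal sketch: buildSiteQueryUrl is a hypothetical helper name, and it assumes the same hl and q parameters as the example URL.

```php
<?php
// Hypothetical helper: builds the "site:" search URL for a given domain.
function buildSiteQueryUrl(string $domain): string
{
    // http_build_query() URL-encodes the values, so the colon in "site:"
    // is sent as %3A. The separator is passed explicitly so the result
    // does not depend on the arg_separator.output ini setting.
    $query = http_build_query(array(
        'hl' => 'en',
        'q'  => 'site:' . $domain,
    ), '', '&');

    return 'http://www.google.com/search?' . $query;
}

echo buildSiteQueryUrl('www.google.com'), "\n";
```

    The encoded form is equivalent to the hand-written URL; the server decodes %3A back to a colon.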

    Parsing is simple. The problem is that when I fetch the page with file_get_contents, Google blocks the request. The same function works fine with other sites.
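
    For reference, the parsing step might look like the sketch below. It assumes the count appears right after the word "about" in the page text; that wording is an assumption about Google's markup and may differ or change. extractResultCount is a hypothetical helper name, and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper: pulls the estimated result count out of the page HTML.
// Assumes the count follows the word "about", e.g. "of about 3,660,000".
function extractResultCount(string $html): ?int
{
    if (!preg_match('/about\s+([0-9][0-9,]*)/i', $html, $matches)) {
        return null; // wording not found; the page may be blocked or changed
    }

    // Drop the thousands separators before converting to an integer.
    return (int) str_replace(',', '', $matches[1]);
}

// Made-up sample text, not a real response.
var_dump(extractResultCount('Results 1 - 10 of about 3,660,000 from www.google.com'));
```

    Returning null when the pattern is missing makes it easy to tell a blocked or changed page apart from a real count.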

    Maybe I should use an API? If so, which one exactly? I tried the Google Custom Search API, but it doesn't return that number :(

    Thanks for your tips.
     
    goliathus, Aug 9, 2008
  2. ltdraper

    #2
    Use curl instead of plain file_get_contents. Google is detecting that the request doesn't come from a browser. For example:

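    // Headers that make the request look like it comes from a browser.
    $header = array(
        "Accept: text/xml,application/xml,application/xhtml+xml,"
            . "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
        "Cache-Control: max-age=0",
        "Connection: keep-alive",
        "Keep-Alive: 300",
        "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
        "Accept-Language: en-us,en;q=0.5",
        "Pragma: ", // an empty value tells curl not to send this header
    );

    $ch = curl_init($url);

    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // The @ hides the warning raised when open_basedir or safe_mode forbids this option.
    @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_REFERER, 'http://yoursitehere.com');

    // Pass along the visitor's user agent if there is one; otherwise use a fixed string.
    if ( isset($_SERVER['HTTP_USER_AGENT']) )
    {
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    }
    else
    {
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    }

    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate'); // accept and decode compressed responses
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // give up after 10 seconds
    // Skips certificate verification; this only matters for https URLs.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $output = curl_exec($ch);

    // curl_exec() returns false on failure, so check before parsing.
    if ($output === false)
    {
        die('curl error: ' . curl_error($ch));
    }
    curl_close($ch);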

    Here $url is the URL of your query.

    I can't remember where I got this code. It was a curl example, and it seems to work fine.
     
    ltdraper, Aug 9, 2008
  3. goliathus

    #3
    Thanks a lot, works fine!
     
    goliathus, Aug 11, 2008