Google indexed pages checker

Discussion in 'PHP' started by terryuk, Dec 3, 2007.

  1. #1
    Hey,

    I'm having trouble with a bit of PHP code I'm trying to use to return the number of indexed pages a website has...

    Here's the code:

    
    <?php
    // Fetch the Google results page for the site: query.
    $data = implode('', file("http://www.google.com/search?q=site:www.$name"));
    preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9]*)</b>|U",
        $data,
        $out, PREG_PATTERN_ORDER);
    $results = intval($out[1][0]);
    $nowww["google"] = $results;
    echo($nowww['google']);
    ?>
    
    Code (markup):
    Anyone got ideas? It just keeps coming out as 0 :eek:
     
    terryuk, Dec 3, 2007 IP
  2. hogan_h

    hogan_h Peon

    Messages:
    199
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Try this:
    
    <?php
    $data = implode('', file("http://www.google.com/search?q=site:www.$name"));
    // [0-9,]* also accepts the "," thousands separator inside the count.
    preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9,]*)</b>|U",
        $data,
        $out, PREG_PATTERN_ORDER);
    // Strip the separators before converting to an integer.
    $results = intval(str_replace(",", "", $out[1][0]));
    $nowww["google"] = $results;

    echo($nowww['google']);
    ?>
    
    Code (markup):
    Explanation:
    For some sites the result count contained a "," (thousands separator), so the pattern needs to allow for it. It may be a localization issue, and you may need to allow for "." too, but adding "," to the pattern and then stripping it afterwards worked for me.
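The approach described above can be sketched as a small helper. This is only an illustration: the function name `extract_result_count()` and the sample markup are assumptions, not part of the thread's code. It also returns `null` when nothing matches, because `intval()` on a missing match gives 0, which makes a failed match look like a site with zero indexed pages.

```php
<?php
// Hypothetical helper (not from the thread): pull the total out of a
// "Results 1 - 10 of about 31,200" line, allowing "," and "." separators.
function extract_result_count(string $html): ?int
{
    $pattern = '|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9.,]+)</b>|';
    if (preg_match($pattern, $html, $m) !== 1) {
        // No match: report that explicitly instead of silently returning 0.
        return null;
    }
    // Remove thousands separators before converting to an integer.
    return (int) str_replace([',', '.'], '', $m[1]);
}

// Assumed sample markup, modelled on the pattern used in the thread.
$sample = 'Results <b>1</b> - <b>10</b> of about <b>31,200</b> from <b>www.spiegel.com</b>.';
var_dump(extract_result_count($sample)); // int(31200)
```

A `null` result then means "the page did not look as expected", which is worth checking before trusting a count of 0.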
     
    hogan_h, Dec 3, 2007 IP
  3. terryuk

    terryuk Notable Member

    Messages:
    3,962
    Likes Received:
    319
    Best Answers:
    0
    Trophy Points:
    255
    #3
    Thanks for the reply, but it's just returning 0 for me :\
     
    terryuk, Dec 3, 2007 IP
  4. hogan_h

    #4
    What site are you trying?

    When I use, for example, $name="cnn.com" with the modified script from above, I get:
    291000
     
    hogan_h, Dec 3, 2007 IP
  5. terryuk

    #5
    Well, I just tried it with cnn.com too, but it comes up with 0.
     
    terryuk, Dec 3, 2007 IP
  6. hogan_h

    #6
    Ok, then we need to debug it :)
    Take this and let me know what results you get:
    
    <?php

    $name = "spiegel.com";
    $data = implode('', file("http://www.google.com/search?q=site:www.$name"));

    // Dump the raw page so the "Results ..." line can be inspected.
    echo "<pre>";
    print_r($data);
    echo "</pre>";
    echo "<hr />";

    preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z]* <b>([0-9,]*)</b>|U",
        $data,
        $out, PREG_PATTERN_ORDER);
    $results = intval(str_replace(",", "", $out[1][0]));

    // Dump the regex matches.
    echo "<pre>";
    print_r($out);
    echo "</pre>";

    $nowww["google"] = $results;
    echo($nowww['google']);
    ?>
    
    Code (markup):
    I get the results page, and the string of interest is this:
    Results 1 - 10 of about 31,200 from www.spiegel.com. (0.03 seconds)
    And then at the bottom I get:
    
    Array
    (
        [0] => Array
            (
                [0] => Results 1 - 10 of about 31,200
            )
    
        [1] => Array
            (
                [0] => 31,200
            )
    
    )
    
    31200
    
    Code (markup):
    What do you get with this same script ("string of interest" + Array content)?
     
    hogan_h, Dec 3, 2007 IP
  7. terryuk

    #7
    It seemed like it was about to work, but I think Google may limit the number of remote queries per day, as I just got a 403 Forbidden page.
     
    terryuk, Dec 3, 2007 IP
  8. hogan_h

    #8
    If you have another server, then try it there...

    Otherwise, if you are testing locally and have a dynamic IP, you could try reconnecting...

    I find it rather strange that you are getting a 403 error...
     
    hogan_h, Dec 3, 2007 IP
  9. terryuk

    #9
    I'm getting this message:

    We're sorry...

    ... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.
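A script can check for this situation before it tries to parse a count. The following is a minimal sketch under assumptions: the helper name `looks_blocked()` is hypothetical, and the phrase it searches for is taken from the message quoted above, which Google may word differently at other times.

```php
<?php
// Hypothetical helper (not from the thread): decide whether a response
// looks like a refusal rather than a normal results page.
function looks_blocked(int $status, string $body): bool
{
    // A 403 status was reported earlier in the thread.
    if ($status === 403) {
        return true;
    }
    // Phrase taken from the "We're sorry..." message quoted above.
    return stripos($body, 'automated requests') !== false;
}
```

When this returns true, a result of 0 from the regex says nothing about the site and should not be stored.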
     
    terryuk, Dec 4, 2007 IP
  10. hogan_h

    #10
    I'm sorry, man... The code is correct: the same code works for me locally, and yesterday I tested it on an external server and it worked there too. If you sent too many queries, then it's possible your server got "flagged". Maybe you could try using a proxy site together with cURL...
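For the cURL part of that suggestion, here is a minimal sketch of a fetch that also reports the HTTP status, so a 403 is visible instead of silently turning into a count of 0. The function name `fetch_with_status()` and the timeout default are assumptions, it requires PHP's cURL extension, and it does not configure a proxy.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL with cURL and
// return both the HTTP status code and the body, or null on failure.
function fetch_with_status(string $url, int $timeoutSeconds = 10): ?array
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit on connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit on the whole request
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false) {
        // Transport-level failure (DNS, refused connection, timeout, ...).
        return null;
    }
    return ['status' => $status, 'body' => $body];
}
```

A caller would check that the result is not `null` and that `$result['status']` is 200 before running the regex over `$result['body']`.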
     
    hogan_h, Dec 4, 2007 IP
  11. terryuk

    #11
    terryuk, Dec 4, 2007 IP
  12. hogan_h

    #12
    You are welcome ;)

    Just for your information: if you use the Google API only occasionally, you will be fine. Otherwise, you should know that it has a limit on the number of daily queries (1000). If that becomes a problem for you, take a look at the Google Ajax API.

    http://code.google.com/apis/soapsearch/api_faq.html#gen12
     
    hogan_h, Dec 4, 2007 IP
  13. ognos

    ognos Peon

    Messages:
    26
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Here's my code for this:

    
    
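    // Polish Google results page for the site: query ($site must be set beforehand).
    $fetch_url = "http://www.google.pl/search?hl=pl&q=site:".$site."&btnG=Szukaj&lr=";
    // Fetch the page as a string; include_once() on a remote URL would execute it as PHP.
    $page = file_get_contents($fetch_url);

    // Strip "," separators so the count matches as one run of digits.
    $page = str_replace(',','',$page);

    // Capture every number that directly follows a <b> tag.
    preg_match_all('/<b>(\d+)/', $page, $wynik );

    // The third match is expected to be the total ("1 - 10 of <total>").
    echo $wynik[1][2];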
    
    
    Code (markup):
     
    ognos, Dec 4, 2007 IP