Hey, I'm having trouble with a bit of PHP code I'm trying to use to return the number of indexed pages a website has... Here's the code:

<?php
$data = implode('', file("http://www.google.com/search?q=site:www.$name"));
preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9]*)</b>|U", $data, $out, PREG_PATTERN_ORDER);
$results = intval($out[1][0]);
$nowww["google"] = $results;
echo($nowww["google"]);
?>

Anyone got ideas? It just keeps coming out as 0.
Try this:

<?php
$data = implode('', file("http://www.google.com/search?q=site:www.$name"));
preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9,]*)</b>|U", $data, $out, PREG_PATTERN_ORDER);
$results = intval(str_replace(",", "", $out[1][0]));
$nowww["google"] = $results;
echo($nowww["google"]);
?>

Explanation: for some pages I got result counts that contained "," inside, so you need to handle that too. Maybe it's a localization issue and you need to handle "." as well, but adding "," to the pattern and then stripping it afterwards worked for me.
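If localization does turn out to be the issue, here is a small sketch of how you could strip both separators before converting; parse_result_count() is just a hypothetical helper name, and it assumes "," or "." only ever appears as a thousands separator in the count:

<?php
// Hypothetical helper: turn a localized count like "31,200" or "31.200" into an integer.
// Assumes "," or "." is only ever a thousands separator here, never a decimal point.
function parse_result_count($raw)
{
    return intval(str_replace(array(",", "."), "", $raw));
}

echo parse_result_count("31,200"); // 31200
echo parse_result_count("31.200"); // 31200
?>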
What site are you trying? When I use, for example, $name = "cnn.com" with the modified script version from above, I get: 291000
Ok, then we need to debug it. Take this and let me know what results you get:

<?php
$name = "spiegel.com";
$data = implode('', file("http://www.google.com/search?q=site:www.$name"));
echo "<pre>";
print_r($data);
echo "</pre>";
echo "<hr />";
preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z]* <b>([0-9,]*)</b>|U", $data, $out, PREG_PATTERN_ORDER);
$results = intval(str_replace(",", "", $out[1][0]));
echo "<pre>";
print_r($out);
echo "</pre>";
$nowww["google"] = $results;
echo($nowww['google']);
?>

I get the results page, but the string of interest is this:

Results 1 - 10 of about 31,200 from www.spiegel.com. (0.03 seconds)

And then at the bottom I get:

Array
(
    [0] => Array
        (
            [0] => Results 1 - 10 of about 31,200
        )
    [1] => Array
        (
            [0] => 31,200
        )
)
31200

What do you get with this same script ("string of interest" + Array content)?
It seems like it was about to work, but I think Google may have a limit on how many remote queries you can make per day? I just got a 403 Forbidden page.
If you have another server, then try it there... Otherwise, if you are testing locally and have a dynamic IP, you could try reconnecting... I find it rather strange that you are getting a 403 error...
I'm getting this message:

We're sorry... ... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.
I'm sorry, man... The code is correct, and the same code works for me locally; yesterday I tested it on an external server and it worked there too. If you sent too many queries, then it's possible your server got "flagged". Maybe you could try using a proxy site + cURL combo...
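Something along these lines, just as a rough sketch: the proxy address below is a placeholder you would replace with a proxy you actually have access to, and the User-Agent string is only there so the request looks less like a script:

<?php
// Rough sketch: fetch the results page through cURL, optionally via a proxy.
$name  = "example.com";        // placeholder site
$proxy = "127.0.0.1:8080";     // placeholder proxy (host:port) - replace with a real one

$ch = curl_init("http://www.google.com/search?q=site:www." . urlencode($name));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the page as a string
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0"); // browser-like User-Agent
curl_setopt($ch, CURLOPT_PROXY, $proxy);            // route the request through the proxy
$data = curl_exec($ch);
curl_close($ch);

// $data can then be fed to the same preg_match_all() as before.
?>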
Thanks for your help, hogan_h; I just found a solution using the Google API (http://www.useseo.com/google-api-demo.php)
You are welcome. Just for your information: if you are using the Google API casually, you will be fine; otherwise you should know that it has a limited number of daily queries (1,000). If that becomes a problem for you, you should take a look at the Google AJAX API. http://code.google.com/apis/soapsearch/api_faq.html#gen12
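In case it helps, here's a minimal sketch of the AJAX Search API route from PHP; the endpoint URL and the estimatedResultCount field are from memory of the docs, so double-check them against the current documentation before relying on this:

<?php
// Minimal sketch (verify against the docs): query the Google AJAX Search API
// for a site: search and read the estimated result count from the JSON response.
$site = "www.example.com";   // placeholder site
$url  = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=" . urlencode("site:$site");

$json = file_get_contents($url);
$data = json_decode($json);

if ($data && $data->responseStatus == 200) {
    echo $data->responseData->cursor->estimatedResultCount;
}
?>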
Here's my code for this:

$fetch_url = "http://www.google.pl/search?hl=pl&q=site:" . $site . "&btnG=Szukaj&lr=";

// Capture the remote page via output buffering.
// Note: include_once() of a URL only works if URL includes are enabled
// (allow_url_include / allow_url_fopen); file_get_contents($fetch_url) is the more common way.
ob_start();
include_once($fetch_url);
$page = ob_get_contents();
ob_end_clean();

// Strip thousands separators, then grab the numbers that appear inside <b> tags.
$page = str_replace(',', '', $page);
preg_match_all('/<b>(\d+)/', $page, $wynik);

// The third captured number is the total count ("Results 1 - 10 of about N").
echo $wynik[1][2];