Hey, I'm having trouble with a bit of PHP code I'm trying to use to return the number of indexed pages a website has... Here's the code:

<?php
$data = implode('', file("http://www.google.com/search?q=site:www.$name"));
preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9]*)</b>|U", $data, $out, PREG_PATTERN_ORDER);
$results = intval($out[1][0]);
$nowww["google"] = $results;
echo($nowww["google"]);
?>

Anyone got ideas? It just keeps coming out as 0.
Try this:

<?php
$data = implode('', file("http://www.google.com/search?q=site:www.$name"));
preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z ]*<b>([0-9,]*)</b>|U", $data, $out, PREG_PATTERN_ORDER);
$results = intval(str_replace(",", "", $out[1][0]));
$nowww["google"] = $results;
echo($nowww["google"]);
?>

Explanation: for some pages I got result counts that contained "," inside, so you need to handle that too. Maybe it's a localization issue and you need to handle "." as well, but adding "," to the pattern and then stripping it afterwards worked for me.
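If localization does turn out to be the issue, here is a small sketch of how you could strip both separators before converting; parse_result_count() is just a hypothetical helper name, and it assumes "," or "." only ever appears as a thousands separator in the count:

<?php
// Hypothetical helper: turn a localized count like "31,200" or "31.200" into an integer.
// Assumes "," or "." is only ever a thousands separator here, never a decimal point.
function parse_result_count($raw)
{
    return intval(str_replace(array(",", "."), "", $raw));
}

echo parse_result_count("31,200"); // 31200
echo parse_result_count("31.200"); // 31200
?>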
What site are you trying? When I use, for example, $name = "cnn.com" with the modified script version from above, I get: 291000
Ok, then we need to debug it. Take this and let me know what results you get:

<?php
$name = "spiegel.com";
$data = implode('', file("http://www.google.com/search?q=site:www.$name"));
echo "<pre>";
print_r($data);
echo "</pre>";
echo "<hr />";
preg_match_all("|Results <b>[0-9]+</b> - <b>[0-9]+</b> of [a-z]* <b>([0-9,]*)</b>|U", $data, $out, PREG_PATTERN_ORDER);
$results = intval(str_replace(",", "", $out[1][0]));
echo "<pre>";
print_r($out);
echo "</pre>";
$nowww["google"] = $results;
echo($nowww['google']);
?>

I get the results page, but the string of interest is this:

Results 1 - 10 of about 31,200 from www.spiegel.com. (0.03 seconds)

And then at the bottom I get:

Array
(
    [0] => Array
        (
            [0] => Results 1 - 10 of about 31,200
        )
    [1] => Array
        (
            [0] => 31,200
        )
)
31200

What do you get with this same script ("string of interest" + Array content)?
It seems like it was about to work, but I think Google may have a limit on how many remote queries you can make per day? I just got a 403 Forbidden page.
If you have another server, then try it there... Otherwise, if you are testing locally and have a dynamic IP, you could try reconnecting... I find it rather strange that you are getting a 403 error...
I'm getting this message:

We're sorry... ... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.
I'm sorry, man... The code is correct, and the same code works for me locally; yesterday I tested it on an external server and it worked there too. If you sent too many queries, then it's possible your server got "flagged". Maybe you could try using a proxy site + cURL combo...
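Something along these lines, just as a rough sketch: the proxy address below is a placeholder you would replace with a proxy you actually have access to, and the User-Agent string is only there so the request looks less like a script:

<?php
// Rough sketch: fetch the results page through cURL, optionally via a proxy.
$name  = "example.com";        // placeholder site
$proxy = "127.0.0.1:8080";     // placeholder proxy (host:port) - replace with a real one

$ch = curl_init("http://www.google.com/search?q=site:www." . urlencode($name));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the page as a string
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0"); // browser-like User-Agent
curl_setopt($ch, CURLOPT_PROXY, $proxy);            // route the request through the proxy
$data = curl_exec($ch);
curl_close($ch);

// $data can then be fed to the same preg_match_all() as before.
?>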
Thanks for your help, hogan_h; I just found a solution using the Google API (http://www.useseo.com/google-api-demo.php)
You are welcome. Just for your information: if you are using the Google API casually, you will be fine; otherwise you should know that it has a limited number of daily queries (1,000). If that becomes a problem for you, you should take a look at the Google AJAX API. http://code.google.com/apis/soapsearch/api_faq.html#gen12
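In case it helps, here's a minimal sketch of the AJAX Search API route from PHP; the endpoint URL and the estimatedResultCount field are from memory of the docs, so double-check them against the current documentation before relying on this:

<?php
// Minimal sketch (verify against the docs): query the Google AJAX Search API
// for a site: search and read the estimated result count from the JSON response.
$site = "www.example.com";   // placeholder site
$url  = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=" . urlencode("site:$site");

$json = file_get_contents($url);
$data = json_decode($json);

if ($data && $data->responseStatus == 200) {
    echo $data->responseData->cursor->estimatedResultCount;
}
?>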
Here's my code for this:

$fetch_url = "http://www.google.pl/search?hl=pl&q=site:" . $site . "&btnG=Szukaj&lr=";

// Capture the remote page via output buffering.
// Note: include_once() of a URL only works if URL includes are enabled
// (allow_url_include / allow_url_fopen); file_get_contents($fetch_url) is the more common way.
ob_start();
include_once($fetch_url);
$page = ob_get_contents();
ob_end_clean();

// Strip thousands separators, then grab the numbers that appear inside <b> tags.
$page = str_replace(',', '', $page);
preg_match_all('/<b>(\d+)/', $page, $wynik);

// The third captured number is the total count ("Results 1 - 10 of about N").
echo $wynik[1][2];