View Full Version : API results


jlawrence
May 19th 2005, 5:58 am
Perhaps this isn't quite the right forum to ask this, but I'll give it a go.
When I run a query through the API, I can access the number of results with $results->{estimatedTotalResultsCount}
This never seems to give me the correct numbers (or anywhere near them).
It always seems to be a large factor out, sometimes by as much as a factor of 10.
For example, a search for seo returns an estimatedTotalResultsCount of 3,550,000, whereas a normal search shows in the region of 18,200,000 results. That's a huge difference.
I also see differences when I run site: searches.
My figures never match what I see in the keyword tracker either.

Is there anything you need to do to get decent numbers out of the API?

Hmm, I've just noticed that I get very odd results for some site: searches.
Where I am in the UK, api.google.com is resolving to 216.239.37.104

Is this the same around the world?

mcdar
May 20th 2005, 5:29 am
I just "pinged" api.google.com from here in New York and it also resolved to 216.239.37.104

jlawrence
May 20th 2005, 5:51 am
OK, so I'm assuming that you're using the API to get the indexed page count in your tools.
That means there's something wrong with my code :(. For one domain I'm showing 19 pages, while you're showing the more correct 300.
Or I'm misunderstanding what the API should be returning.
How do you get the number of indexed pages?

jlawrence
May 21st 2005, 6:17 am
Hmm, I'm getting very, very confused here.
I've been using the estimatedTotalResultsCount returned by the API to give me an estimate of how many results would be returned.
For a link: search it seems about as accurate as a normal link: search, i.e. it gives me the same number.
For a site: search, the number returned is all over the shop.
For a normal search phrase, it's way out.

The odd thing is I seem to get few results (if any) for some site: searches. For example, if I use the API to search for site:www.plymouthcricketclub.com then it tells me there are 19 results, when there should be 300+. Now, if I print all the results returned, I find that result #20 is missing (every time), while #21 onwards is there. This happens to coincide with the estimated total returned.
I'd be interested to send my script to someone elsewhere in the world and see whether they get exactly the same results.

J

jlawrence
May 21st 2005, 6:53 am
Looks like the API will loop and keep returning the last 10 results if you ask it to go beyond the total count.
Damn it. I've now looked closer at the URLs being returned.
The API server I'm talking to is only returning URLs that existed over 6 months ago.
It's as if the API server has had its data turned back 6 months.
Oh well. I suppose I'll have to give it a few days and see if it comes into sync with the api server that DP uses.
FFS this is annoying.
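
The looping behaviour described above (asking for results past the total and getting the last 10 back again) can be guarded against on the client side. What follows is only a minimal sketch in Python, not the Perl or PHP scripts from this thread: `collect_unique_results` and `fake_fetch` are hypothetical names, and `fake_fetch` merely simulates the behaviour as reported here rather than calling the real API.

```python
from typing import Callable, List


def collect_unique_results(fetch_page: Callable[[int], List[str]],
                           estimated_total: int,
                           page_size: int = 10) -> List[str]:
    """Page through results, stopping once a page adds no new URLs."""
    urls: List[str] = []
    seen = set()
    start = 0
    while start < estimated_total:
        page = fetch_page(start)
        # Keep only URLs we have not collected yet.
        fresh = [u for u in page if u not in seen]
        if not fresh:
            # Empty page, or a repeat of earlier results: we ran off the end.
            break
        urls.extend(fresh)
        seen.update(fresh)
        start += page_size
    return urls


def fake_fetch(start: int) -> List[str]:
    """Stand-in for the API call (assumption): 19 results, then repeats."""
    data = [f"http://example.com/page{i}" for i in range(19)]
    if start >= len(data):
        # Simulates the reported looping: the last 10 results come back again.
        return data[-10:]
    return data[start:start + page_size_hint()]


def page_size_hint() -> int:
    """Page size used by the simulated server (10, as in the trace)."""
    return 10


if __name__ == "__main__":
    # Even with a wildly wrong estimate of 300, only the 19 real URLs are kept.
    print(len(collect_unique_results(fake_fetch, estimated_total=300)))  # prints 19
```

Stopping on "no new URLs" rather than trusting the estimate means a too-high estimatedTotalResultsCount can't pad the list with duplicates, though it does nothing about an estimate that is too low.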

mcdar
May 23rd 2005, 6:08 am
API results are quirky.

Sometimes they seem to match those of the datacenters, and sometimes the results don't match any datacenter.

I haven't got a clue why this is. I don't think it has anything to do with your code, though.

Caryl

jlawrence
May 24th 2005, 5:32 am
I'm almost 100% convinced that it's got nothing to do with my code.
I get exactly the same results when using both my Perl and PHP code.

Oddly enough, it only seems to be severely broken when I do a site: search. I've found another website in the UK with some API tools, and they're showing the same results as me.

What I need is to be able to point my code at a different API server. From the results I'm getting back for a site: search, it's as if it's only returning supplemental results. Changing the filter settings seems to make zero difference.

Here's a trace of what's being sent to the API server:

<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/1999/XMLSchema" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body><namesp1:doGoogleSearch xmlns:namesp1="urn:GoogleSearch">
<key xsi:type="xsd:string">xxxxxxxxxx</key>
<q xsi:type="xsd:string">site:www.plymouthcricketclub.com</q>
<start xsi:type="xsd:int">0</start>
<maxResults xsi:type="xsd:int">10</maxResults>
<filter xsi:type="xsd:boolean">false</filter>
<restrict xsi:type="xsd:string">false</restrict>
<safeSearch xsi:type="xsd:boolean">false</safeSearch>
<lr xsi:type="xsd:string"/>
<ie xsi:type="xsd:string">latin1</ie>
<oe xsi:type="xsd:string">latin1</oe>
</namesp1:doGoogleSearch>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
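
For anyone wanting to compare what two clients (say a Perl and a PHP script) actually send, a request like the trace above can be rebuilt and picked apart offline. This is a rough Python sketch under assumptions: `build_search_request` and `request_params` are hypothetical helper names, the template just copies the element names and values from the trace (including the literal "false" for restrict, which is not verified here), and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Element names, types and namespaces are copied from the trace in the post.
ENVELOPE_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
 xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:xsd="http://www.w3.org/1999/XMLSchema"
 SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body><namesp1:doGoogleSearch xmlns:namesp1="urn:GoogleSearch">
<key xsi:type="xsd:string">{key}</key>
<q xsi:type="xsd:string">{query}</q>
<start xsi:type="xsd:int">{start}</start>
<maxResults xsi:type="xsd:int">{max_results}</maxResults>
<filter xsi:type="xsd:boolean">false</filter>
<restrict xsi:type="xsd:string">{restrict}</restrict>
<safeSearch xsi:type="xsd:boolean">false</safeSearch>
<lr xsi:type="xsd:string"/>
<ie xsi:type="xsd:string">latin1</ie>
<oe xsi:type="xsd:string">latin1</oe>
</namesp1:doGoogleSearch>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def build_search_request(key: str, query: str, start: int = 0,
                         max_results: int = 10,
                         restrict: str = "false") -> str:
    """Fill in the template; restrict defaults to the value seen in the trace."""
    return ENVELOPE_TEMPLATE.format(
        key=escape(key),
        query=escape(query),  # escape &, < and > so the XML stays well-formed
        start=int(start),
        max_results=int(max_results),
        restrict=escape(restrict),
    )


def request_params(envelope: str) -> dict:
    """Return the doGoogleSearch parameters of a request as a name -> text dict."""
    root = ET.fromstring(envelope.encode("utf-8"))
    body = root.find("{%s}Body" % SOAP_ENV)
    call = body.find("{urn:GoogleSearch}doGoogleSearch")
    return {child.tag: (child.text or "") for child in call}


if __name__ == "__main__":
    request = build_search_request("xxxxxxxxxx",
                                   "site:www.plymouthcricketclub.com")
    for name, value in request_params(request).items():
        print(name, "=", repr(value))
```

Running `request_params` over the traces from two different scripts and comparing the dicts is a quick way to confirm that both really send the same query, start offset and filter settings.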


If anyone can offer any suggestions, I'd be grateful.

greg_mcree
Oct 20th 2005, 2:50 am
A Google datacenter doesn't know exactly how many results are available when the result list is too big, i.e. too big for a human to go through.

jlawrence
Oct 20th 2005, 3:55 am
I disagree. When you run a site: search, Google knows exactly how many results it has in its databases.
With a normal search, perhaps they don't know. But a site: search isn't a normal search.
The API simply doesn't return results that are anywhere near valid.

HN Will
Oct 22nd 2005, 5:27 am
Yeah - the computer can figure it out. But the question is why on earth can't the G API return the same number in estimatedTotalResultsCount?

I've noticed this problem as well and have just chalked it up to old data that's being fed to me....