API results

Discussion in 'Google API' started by jlawrence, May 19, 2005.

  1. #1
    Perhaps this isn't quite the right forum to ask this, but I'll give it a go.
    When I run a query through the API, I can access the number of results with $results->{estimatedTotalResultsCount}.
    This never seems to give me the correct numbers (or anywhere near them).
    It always seems to be out by a large factor, sometimes by as much as a factor of 10.
    For example, a search for seo returns an estimatedTotalResultsCount of 3,550,000, whereas a normal search shows in the region of 18,200,000 results. That's a huge difference.
    I also see differences when I run site: searches.
    My figures never match what I see in the keyword tracker either.
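
    To put a number on the gap, here is a minimal sketch that computes the ratio between the two counts quoted above. The function name discrepancy_factor is made up for illustration; only the two figures come from the post.

```python
def discrepancy_factor(api_estimate: int, web_count: int) -> float:
    """Return how many times larger the web count is than the API estimate."""
    if api_estimate <= 0:
        raise ValueError("api_estimate must be positive")
    return web_count / api_estimate


# Figures quoted in the post for the query "seo".
factor = discrepancy_factor(3_550_000, 18_200_000)
print(f"web count is {factor:.1f}x the API estimate")
```

    For the seo figures this comes out at roughly a factor of 5.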

    Is there anything you need to do to get decent numbers out of the API?

    Hmm, I've just noticed that I get very odd results for some site: searches.
    From where I am in the UK, api.google.com is resolving to 216.239.37.104.

    Is this the same around the world?
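
    For anyone who wants to compare from their own location, a rough sketch of the lookup using Python's standard socket module. The helper name resolve_ipv4 and the injectable lookup argument are assumptions added for illustration (the argument just makes the helper testable offline), and whether the hostname resolves at all depends on where and when you run it.

```python
import socket
from typing import Callable, List


def resolve_ipv4(host: str,
                 lookup: Callable[[str], tuple] = socket.gethostbyname_ex) -> List[str]:
    """Return the sorted IPv4 addresses that `host` resolves to from this machine."""
    # gethostbyname_ex returns (canonical_name, alias_list, ip_address_list).
    _name, _aliases, addresses = lookup(host)
    return sorted(addresses)


# Usage (needs network access; raises OSError if the name does not resolve):
# print(resolve_ipv4("api.google.com"))
```

    Comparing the list from two locations shows whether both are talking to the same address, although the same address does not guarantee the same backend data.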
     
    jlawrence, May 19, 2005 IP
  2. mcdar

    mcdar Peon

    Messages:
    1,831
    Likes Received:
    110
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I just "pinged" api.google.com from here in New York and it also resolved to 216.239.37.104
     
    mcdar, May 20, 2005 IP
  3. jlawrence

    jlawrence Peon

    Messages:
    1,368
    Likes Received:
    81
    Best Answers:
    0
    Trophy Points:
    0
    #3
    OK, so I'm assuming that you're using the API to get the indexed page count in your tools.
    That means there's something wrong with my code :(. For one domain I'm showing 19 pages, while you're showing the more correct 300.
    Or I'm misunderstanding what the API should be returning.
    How do you get the number of indexed pages?
     
    jlawrence, May 20, 2005 IP
  4. jlawrence

    #4
    Hmm, I'm getting very, very confused here.
    I've been using the estimatedTotalResultsCount returned by the API to give me an estimate of how many results would be returned.
    For a link: search it seems about as accurate as a normal link: search, i.e. it gives me the same number.
    For a site: search, the number returned is all over the shop.
    For a normal search phrase, it's way out.

    The odd thing is that I seem to get few results (if any) for some site: searches. For example, if I use the API to search for site:www.plymouthcricketclub.com, it tells me there are 19 results when there should be 300+. Now, if I print all the results returned, I find that result #20 is missing (every time), while #21 onwards is there. This happens to coincide with the estimated total returned.
    I'd be interested to send my script to someone elsewhere in the world to see whether they get exactly the same results.

    J
     
    jlawrence, May 21, 2005 IP
  5. jlawrence

    #5
    It looks like the API will loop and keep returning the last 10 results if you ask it to go beyond the total count.
    Damn it. I've now looked closer at the URLs being returned.
    The API server I'm talking to is only returning URLs that existed over 6 months ago.
    It's as if the API server has had its data turned back 6 months.
    Oh well. I suppose I'll have to give it a few days and see if it comes into sync with the API server that DP uses.
    FFS this is annoying.
     
    jlawrence, May 21, 2005 IP
  6. mcdar

    #6
    API results are quirky.

    Sometimes they seem to match those of the datacenters, and sometimes the results don't match any datacenter.

    I haven't got a clue why this is. I don't think it has anything to do with your code, though.

    Caryl
     
    mcdar, May 23, 2005 IP
  7. jlawrence

    #7
    I'm almost 100% convinced that it's got nothing to do with my code.
    I get exactly the same results when using both my Perl and PHP code.

    Oddly enough, it only seems to be severely broken when I do a site: search. I've found another website in the UK with some API tools, and they're showing the same results as me.

    What I need is to be able to point my code at a different API server. From the results I'm getting back for a site: search, it's as if it's only returning supplemental results. Changing the filter settings seems to make zero difference.

    Here's a trace of what's being sent to the API server:
    
    <?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/1999/XMLSchema" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <SOAP-ENV:Body><namesp1:doGoogleSearch xmlns:namesp1="urn:GoogleSearch">
    <key xsi:type="xsd:string">xxxxxxxxxx</key>
    <q xsi:type="xsd:string">site:www.plymouthcricketclub.com</q>
    <start xsi:type="xsd:int">0</start>
    <maxResults xsi:type="xsd:int">10</maxResults>
    <filter xsi:type="xsd:boolean">false</filter>
    <restrict xsi:type="xsd:string">false</restrict>
    <safeSearch xsi:type="xsd:boolean">false</safeSearch>
    <lr xsi:type="xsd:string"/>
    <ie xsi:type="xsd:string">latin1</ie>
    <oe xsi:type="xsd:string">latin1</oe>
    </namesp1:doGoogleSearch>
    </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
    
    Code (markup):
    If anyone can offer any suggestions, I'd be grateful.
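
    For experimenting with the individual parameters, the request in the trace can be rebuilt from a template. The sketch below is not the poster's Perl or PHP code; build_search_envelope and its arguments are hypothetical names, and only the element names and namespaces are copied from the trace. One thing that may be worth a second look: the trace sends restrict as an xsd:string carrying the value false. The sketch assumes an empty string means "no restriction", which is an assumption and not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Element names and namespaces mirror the trace; the rest is illustrative.
ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/1999/XMLSchema" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body><namesp1:doGoogleSearch xmlns:namesp1="urn:GoogleSearch">
<key xsi:type="xsd:string">{key}</key>
<q xsi:type="xsd:string">{q}</q>
<start xsi:type="xsd:int">{start}</start>
<maxResults xsi:type="xsd:int">{max_results}</maxResults>
<filter xsi:type="xsd:boolean">{filter}</filter>
<restrict xsi:type="xsd:string">{restrict}</restrict>
<safeSearch xsi:type="xsd:boolean">false</safeSearch>
<lr xsi:type="xsd:string"/>
<ie xsi:type="xsd:string">latin1</ie>
<oe xsi:type="xsd:string">latin1</oe>
</namesp1:doGoogleSearch>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def build_search_envelope(key: str, query: str, start: int = 0,
                          max_results: int = 10,
                          filter_results: bool = False,
                          restrict: str = "") -> str:
    """Return the SOAP request body as a string, with values XML-escaped."""
    return ENVELOPE.format(
        key=escape(key),
        q=escape(query),
        start=int(start),
        max_results=int(max_results),
        filter="true" if filter_results else "false",
        restrict=escape(restrict),  # assumption: empty means no restriction
    )


# Check that the generated envelope is well-formed and carries the query.
body = build_search_envelope("xxxxxxxxxx", "site:www.plymouthcricketclub.com")
root = ET.fromstring(body.encode("utf-8"))
print(root.find(".//q").text)
```

    Generating the body this way makes it easy to vary start, filter or restrict one at a time and compare what comes back.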
     
    jlawrence, May 24, 2005 IP
  8. greg_mcree

    greg_mcree Peon

    Messages:
    5
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #8
    A Google datacenter does not know exactly how many results are available when the result list is too big, i.e. too big for a human to go through.
     
    greg_mcree, Oct 20, 2005 IP
  9. jlawrence

    #9
    I disagree. When you run a site: search, Google knows exactly how many results it has in its databases.
    With a normal search, perhaps they don't know. But a site: search isn't a normal search.
    The API simply doesn't return results that are anywhere near valid.
     
    jlawrence, Oct 20, 2005 IP
  10. HN Will

    HN Will Guest

    Messages:
    111
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Yeah - the computer can figure it out. But the question is why on earth can't G API return the same number of estimatedTotalResults?

    I've noticed this problem as well and have just chalked it up to old data that's being fed to me....
     
    HN Will, Oct 22, 2005 IP