PHP script for detecting if a page is indexed by Google

Discussion in 'PHP' started by ruvenf, May 2, 2011.

  1. ruvenf
    Hi,
    What's the best way to write a PHP script that detects whether a page has been indexed by Google? (Not the number of indexed pages; I already have a solution for that.)
    Also, are there any suggestions for a PHP script that finds when a page was last crawled?
    Thanks
     
    ruvenf, May 2, 2011
  2. littlejohn199

    A Google API might let you query whether a URL is indexed; look into Google Webmaster Tools. However, Google may limit how many queries you can run in a given period of time.

    Or you can simply type "info:your_url" into the search box, as described in "Check if a web page is indexed by google".

    Programmatically, you can call file_get_contents() with an "info:" query and check the result text Google returns to decide whether the page is indexed.
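    A minimal sketch of that approach in PHP. The helper names (build_info_query_url, looks_indexed, fetch_info_result), the http://www.google.com/search?q=... endpoint and the "did not match any documents" phrase are assumptions rather than anything Google documents, and Google may change its result pages or block automated requests at any time.

```php
<?php
// Sketch of the "info:" check described above.
// ASSUMPTIONS (not from the thread): the search endpoint
// http://www.google.com/search?q=... and the text Google shows when
// nothing matches ("did not match any documents"). All helper names
// are hypothetical.

// Hypothetical helper: build the search URL for an "info:" query.
function build_info_query_url(string $pageUrl): string
{
    return 'http://www.google.com/search?q=' . urlencode('info:' . $pageUrl);
}

// Hypothetical helper: guess from the returned HTML whether the page is
// indexed. Returns false when the assumed "no results" text is present.
function looks_indexed(string $html): bool
{
    return stripos($html, 'did not match any documents') === false;
}

// Hypothetical helper: fetch the result page; returns null on failure.
function fetch_info_result(string $pageUrl): ?string
{
    $html = @file_get_contents(build_info_query_url($pageUrl));
    return $html === false ? null : $html;
}

// Usage (makes a real HTTP request, so it is commented out here):
// $html = fetch_info_result('http://www.example.com/some-page.html');
// if ($html !== null) {
//     echo looks_indexed($html) ? "indexed\n" : "not indexed\n";
// }
```

    Keeping the URL building and the HTML check in separate functions means the parsing can be tested against saved result pages without hitting Google at all.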
     
    littlejohn199, May 4, 2011
  3. ruvenf

    Thanks littlejohn199,
    Are there any limits from Google on fetching pages with file_get_contents() and the "info:" query, like a maximum number of queries?
    Also, do you know of a way to find out the last crawl date of a page?
     
    ruvenf, May 4, 2011
  4. careerfield

    Webmaster Tools generally works well for finding out how many pages are crawled per day. If you just want to see whether one page is indexed, you can always search Google for site:www.yoursite.com to check.
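    The site: search can be built from a page URL the same way. A sketch, assuming the http://www.google.com/search?q=... endpoint and that adding the page path to site: narrows the results to that page; build_site_query_url is a hypothetical helper name.

```php
<?php
// Sketch: turn a page URL into a Google "site:" search URL.
// ASSUMPTIONS: the http://www.google.com/search?q=... endpoint, and that
// including the path in site: narrows results to that page.

// Hypothetical helper: returns null if the URL has no host part.
function build_site_query_url(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    // Drop the scheme; site: queries are normally written without it.
    $target = $parts['host'] . ($parts['path'] ?? '');
    return 'http://www.google.com/search?q=' . urlencode('site:' . $target);
}

echo build_site_query_url('http://www.yoursite.com/page.html'), "\n";
```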
     
    careerfield, May 5, 2011
  5. littlejohn199

    @ruvenf

    I think Google may have a mechanism to detect excessive hits from a website or script. Honestly, I don't know exactly, but you can write a simple script that hits the Google website repeatedly and see what happens. Make sure you don't run this script on your real website; run it in your test environment.

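    <?php

    // Test script: request the Google home page repeatedly to see whether
    // Google starts refusing the requests. Run it only in a test environment.
    for ($i = 0; $i < 1000; $i++) {
        $content = @file_get_contents("http://www.google.com");

        if ($content === false) {
            echo "Request $i failed; Google may be blocking this script.\n";
            break;
        }

        echo "Request $i OK (" . strlen($content) . " bytes)\n";

        // The actual URL for an "info" query looks like this: http://www.google.com/#sclient=psy&...=1&bav=on.2,or.r_gc.r_pw.&fp=822bf32b5c0e0691
        // Note: the part after "#" is never sent to the server, so file_get_contents() on that URL only returns the home page.
    }

    ?>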

    If Google doesn't block your script, there is a good chance you can query the Google site directly without using their API.
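    A slightly more careful variant of that test loop, sketched under assumptions: it pauses between requests and looks at the HTTP status code instead of printing the whole page. That Google answers excessive traffic with an error status such as 429 or 503 is an assumption, and last_http_status and probe are hypothetical helper names.

```php
<?php
// Sketch: throttled version of the blocking test.
// ASSUMPTIONS: Google signals blocking with an HTTP error status (e.g. 429
// or 503); the request count and delay are illustration values only.

// Hypothetical helper: return the status code of the LAST response in the
// header list (redirect chains contain one status line per response).
function last_http_status(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: request $url up to $count times, sleeping
// $delaySeconds between requests, and stop at the first error.
function probe(string $url, int $count, int $delaySeconds): void
{
    // ignore_errors makes file_get_contents() return the body of 4xx/5xx responses too.
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    for ($i = 0; $i < $count; $i++) {
        $body = @file_get_contents($url, false, $context);
        // $http_response_header is filled in by PHP's HTTP wrapper.
        $status = last_http_status($http_response_header ?? []);
        echo "Request $i: status " . ($status ?? 'unknown') . "\n";
        if ($body === false || $status === null || $status >= 400) {
            break;
        }
        sleep($delaySeconds);
    }
}

// probe('http://www.google.com', 20, 2); // real requests: test environment only
```

    With ignore_errors set, file_get_contents() only returns false when the connection itself fails, so the status check is what detects a refusal from the server.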

    Hope this helps

    Little John
     
    littlejohn199, May 5, 2011