Screen scraping numbers from a Google search result

Discussion in 'PHP' started by tonyrocks, Nov 23, 2009.

  1. #1
    I was wondering if anybody has experience using the simplehtmldom PHP library for scraping. I only want to grab one number from a Google (or Bing, for that matter) results page: how many results there are for a query that I pass. For example:

    Go to Google and type happy people

    The results page displays:
    Results 1 - 10 of about 146,000,000 for happy people.


    All I want to do is grab the 146,000,000 and then store it in an array (to write to a text file, CSV, etc.).

    Any suggestions?
     
    tonyrocks, Nov 23, 2009
  2. szalinski
    #2
    You wouldn't need DOM for that. Just download the page, preg_match the numerical string, and then store it in your array, whatever that may be.
     
    szalinski, Nov 26, 2009
  3. JAY6390
    #3
    szalinski is right. It's easier to just use file_get_contents and then preg_match.
     
    JAY6390, Nov 27, 2009
  4. ankit_frenz
    #4
    Instead, it's better to use cURL to grab the page and then use regular expressions (preg_match) to extract the number.
    Alternatively, you can even take a screenshot of the results page with the GD library and save it for future reference.
    Thanks
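    A minimal sketch of the cURL-plus-regex approach, kept separate so the parsing can be tried on a saved page without hitting the network. The helper names fetch_page() and parse_result_count() are made up for illustration, and the pattern assumes the page still contains text of the form "of about 146,000,000":

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: pull the count out of text like "... of about 146,000,000 for ...".
// Returns the number as it appears on the page (with commas), or null if nothing matched.
function parse_result_count(string $html): ?string
{
    $text = strip_tags($html); // drop markup such as <b> around the number
    if (preg_match('/of about ([\d,]+)/i', $text, $m)) {
        return $m[1];
    }
    return null;
}
```

    Usage would look something like parse_result_count(fetch_page($url) ?? ''), with a null check on the result before storing it.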
     
    ankit_frenz, Nov 27, 2009
  5. web1001
    #5
    It's best not to grab search results from google.com directly. You can grab results from the Google API (ajax.googleapis.com) instead. It's faster and Google won't block your query. Here is the code I'm using on one of my websites. Just change $term and yourdomain.com (Google asks for CURLOPT_REFERER in your cURL request):

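    $term = "book";
    // urlencode() makes the term safe to put in the query string.
    $url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=0&q=" . urlencode($term);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_REFERER, "http://www.yourdomain.com");
    $body = curl_exec($ch);
    curl_close($ch);
    // Cut out the value that follows estimatedResultCount":" in the response.
    $temp = explode('estimatedResultCount":"', $body, 2);
    if (isset($temp[1])) {
        $temp2 = explode('"', $temp[1], 2);
        echo $temp2[0];
    }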
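    Since the response is JSON, the same value could also be read with json_decode() instead of explode(). This is only a sketch: find_estimated_count() is a made-up helper, and it searches the decoded data for an estimatedResultCount key rather than assuming exactly where that key sits in the response:

```php
<?php
// Hypothetical helper: return the first "estimatedResultCount" value found
// anywhere in a JSON document, or null if the JSON is invalid or has no such key.
function find_estimated_count(string $json): ?string
{
    $data = json_decode($json, true); // decode objects as associative arrays
    if (!is_array($data)) {
        return null;
    }
    $found = null;
    // array_walk_recursive() visits every non-array value together with its key.
    array_walk_recursive($data, function ($value, $key) use (&$found) {
        if ($key === 'estimatedResultCount' && $found === null) {
            $found = (string) $value;
        }
    });
    return $found;
}
```

    With the code above you would call find_estimated_count($body) after curl_exec() and check for null before using the result.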
     
    web1001, Nov 27, 2009
  6. tonyrocks
    #6
    That is a pretty slick piece of code. I have no idea why I made this so difficult. OK, so I've just tried the code, but I get what appears to be an error code: 71300000. I wonder if I even have cURL installed!
     
    tonyrocks, Dec 1, 2009
  7. web1001
    #7
    That is not an error. That is the number of search results. Try another $term and you'll get a different number.
     
    web1001, Dec 1, 2009
  8. zoneweb
    #8
    It's easy: Extract the text between "Results 1 - 10 of about " and " for happy people". You need to make a function for it. Use strlen(), strstr(), substr_replace(), strpos(), and substr() to accomplish the task.
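    One possible shape for such a function, using only strpos(), strlen() and substr(). The name extract_between() is invented for this sketch, and it assumes both marker strings appear in the text in that order:

```php
<?php
// Hypothetical helper: return the text between $before and $after,
// or null if either marker is missing.
function extract_between(string $text, string $before, string $after): ?string
{
    $start = strpos($text, $before);
    if ($start === false) {
        return null;
    }
    $start += strlen($before);             // move past the leading marker
    $end = strpos($text, $after, $start);  // look for the trailing marker after it
    if ($end === false) {
        return null;
    }
    return substr($text, $start, $end - $start);
}
```

    For the example page, the call would be extract_between($content, 'Results 1 - 10 of about ', ' for happy people').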
     
    zoneweb, Dec 1, 2009
  9. JAY6390
    #9
    Or just use a simple regex:
    preg_match('%Results 1 - 10 of about ([^ ]+) for%i', $content, $matches);
    echo $matches[1];
    For example:
    $query = 'http://www.google.co.uk/search?q=jay+gilford';
    $content = strip_tags(file_get_contents($query));
    preg_match('%Results 1 - \d+ of about ([^ ]+) for%i', $content, $matches);
    echo $matches[1];
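    To get from the matched string to the array or CSV asked for in the first post, something along these lines might do. Both helper names are hypothetical, and the sketch assumes the counts are collected in an array keyed by query:

```php
<?php
// Hypothetical helper: turn "146,000,000" into the integer 146000000
// by stripping every non-digit character.
function count_to_int(string $count): int
{
    return (int) preg_replace('/\D/', '', $count);
}

// Hypothetical helper: write a header row plus one "query,results" row
// per entry to an already opened file handle.
function write_counts_csv($handle, array $counts): void
{
    fputcsv($handle, ['query', 'results'], ',', '"', '\\');
    foreach ($counts as $query => $count) {
        fputcsv($handle, [$query, $count], ',', '"', '\\');
    }
}
```

    For example, $counts['jay gilford'] = count_to_int($matches[1]); followed by write_counts_csv(fopen('counts.csv', 'w'), $counts); where counts.csv is just a placeholder filename.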
     
    JAY6390, Dec 1, 2009
  10. tonyrocks
    #10
    Hehe, darn you regex! You solve all problems :) Thanks for all the info, guys! I'm on my way. Things are working! Woohoo!
     
    tonyrocks, Dec 1, 2009
  11. tonyrocks
    #11
    Awesome...I'm having a blast now. But I've run into a problem.

    My piece of code allows me to search for one word, but I need to search for a phrase, like "how to look awesome" in quotes or how to look awesome without quotes. If I plug those words into $searchterm, I don't get any results.

    This is what I have:

    $searchterm='how to look awesome';
    $query = 'http://www.google.co.uk/search?q='.$searchterm;
    $content = strip_tags(file_get_contents($query));
    preg_match('%Results 1 - \d+ of about ([^ ]+) for%i', $content, $matches);
    echo $searchterm.' contains '.$matches[1].' matches.';
    Thanks!
     
    tonyrocks, Dec 7, 2009
  12. web1001
    #12
    Use urlencode():

    $query = 'http://www.google.co.uk/search?q='.urlencode($searchterm);
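    To see why this helps: spaces and double quotes are not safe in a raw URL, and urlencode() turns them into + and %22. A small sketch, with build_search_url() as a made-up wrapper around the line above:

```php
<?php
// Hypothetical wrapper: build the search URL for any term, quoted or not.
function build_search_url(string $searchterm): string
{
    // urlencode() replaces spaces with "+" and double quotes with "%22".
    return 'http://www.google.co.uk/search?q=' . urlencode($searchterm);
}

echo build_search_url('how to look awesome'), "\n";
// prints http://www.google.co.uk/search?q=how+to+look+awesome
echo build_search_url('"how to look awesome"'), "\n";
// prints http://www.google.co.uk/search?q=%22how+to+look+awesome%22
```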
     
    web1001, Dec 7, 2009
  13. metapix
    #13
    The following tutorial may come in handy:

    http://www.codediesel.com/php/web-scraping-in-php-tutorial/
     
    metapix, Dec 7, 2009