Does anybody have experience using the simplehtmldom PHP library for scraping? I only want to grab the number of results that Google (or Bing, for that matter) reports for a query that I pass. For example: go to Google and type happy people. The results page displays: Results 1 - 10 of about 146,000,000 for happy people. All I want to do is grab the 146,000,000 and then output it to an array (for a text file, CSV, etc.). Any suggestions?
You wouldn't need DOM for that. Just download the page, preg_match the numerical string, and then store it in your array, whatever that may be.
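A minimal sketch of that suggestion, assuming the page text has already been downloaded and still contains the "of about N" wording quoted in the question. The function name extractResultCount and the $counts array are made up for illustration.

```php
<?php
// Hypothetical helper: pull the result count out of already-downloaded page text.
function extractResultCount(string $text): ?int
{
    // Match "of about 146,000,000" and capture the run of digits and commas.
    if (preg_match('/of about ([\d,]+)/i', $text, $m)) {
        // Drop the thousands separators before converting to an integer.
        return (int) str_replace(',', '', $m[1]);
    }
    return null; // the expected wording was not found
}

// Sample text copied from the question; a real run would use the downloaded page.
$page = 'Results 1 - 10 of about 146,000,000 for happy people.';

// Store the count in an array keyed by the query.
$counts = [];
$counts['happy people'] = extractResultCount($page);
```

Returning null when the wording is missing makes it easy to tell "no match" apart from a real count.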
Instead, it's better to use cURL to grab the page and then use regular expressions to extract the number with preg_match. Alternatively, you can even take a screenshot of the results page and save it for future reference with the GD library. Thanks
It's best not to grab search results from google.com directly. You can grab results from the Google API (ajax.googleapis.com) instead. It's faster and Google won't block your query. Here is the code I'm using on one of my websites. Just change $term and yourdomain.com (Google asks for CURLOPT_REFERER in your cURL request):
$term = "book";
$url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=0&q=" . urlencode($term);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://www.yourdomain.com");
$body = curl_exec($ch);
curl_close($ch);
// Cut out the value that follows estimatedResultCount":" up to the next quote.
$temp = explode('estimatedResultCount":"', $body, 2);
$temp2 = explode('"', $temp[1], 2);
echo $temp2[0];
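Since the response body is JSON, the explode() step could also be done with json_decode(). This is only a sketch: the nesting responseData -> cursor -> estimatedResultCount is an assumption about the response layout, the sample body is made up for illustration (not a captured response), and the function name is hypothetical.

```php
<?php
// Hypothetical helper: read estimatedResultCount from a JSON body.
function estimatedResultCount(string $body): ?string
{
    // Decode into nested associative arrays; invalid JSON yields null.
    $data = json_decode($body, true);

    // ASSUMED layout: responseData -> cursor -> estimatedResultCount.
    // The ?? operator returns null if any level is missing.
    return $data['responseData']['cursor']['estimatedResultCount'] ?? null;
}

// Made-up sample body, shaped to match the assumption above.
$sample = '{"responseData":{"results":[],"cursor":{"estimatedResultCount":"71300000"}},"responseStatus":200}';

echo estimatedResultCount($sample), "\n";
```

Compared with splitting on a literal string, this does not break if the spacing inside the JSON changes, and it fails quietly with null instead of an undefined-index notice.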
That is a pretty slick piece of code. I have no idea why I made this so difficult. So, OK... I've just tried the code, but I get what appears to be an error code: 71300000. I wonder if I even have cURL installed!
That is not an error. That is the number of search results. Try another $term and you'll get a different number.
It's easy: Extract the text between "Results 1 - 10 of about " and " for happy people". You need to make a function for it. Use strlen(), strstr(), substr_replace(), strpos(), and substr() to accomplish the task.
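One way such a function might look, using only strpos(), strlen(), and substr(). The name textBetween is made up, and the sample text is the line quoted in the question.

```php
<?php
// Hypothetical helper: return the text between two markers, or null if either is missing.
function textBetween(string $haystack, string $start, string $end): ?string
{
    // Find where the opening marker begins.
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }

    // Move past the opening marker itself.
    $from += strlen($start);

    // Look for the closing marker, starting after the opening one.
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }

    return substr($haystack, $from, $to - $from);
}

$page = 'Results 1 - 10 of about 146,000,000 for happy people.';
echo textBetween($page, 'Results 1 - 10 of about ', ' for happy people'), "\n";
```

Note the strict === false checks: strpos() can legitimately return 0 when the marker sits at the very start of the string.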
Or just use a simple regex:
preg_match('%Results 1 - 10 of about ([^ ]+) for%i', $content, $matches);
echo $matches[1];
For example:
$query = 'http://www.google.co.uk/search?q=jay+gilford';
$content = strip_tags(file_get_contents($query));
preg_match('%Results 1 - \d+ of about ([^ ]+) for%i', $content, $matches);
echo $matches[1];
Hehe... darn you, regex! You solve all problems. Thanks for all the info, guys! I'm on my way... things are workin'! Woohoo!
Awesome... I'm having a blast now. But I've run into a problem. My piece of code allows me to search for one word, but I need to search for a phrase, like "how to look awesome" in quotes or how to look awesome without quotes. If I plug those words into $searchterm, I don't get any results. This is what I have:
$searchterm='how to look awesome';
$query = 'http://www.google.co.uk/search?q='.$searchterm;
$content = strip_tags(file_get_contents($query));
preg_match('%Results 1 - \d+ of about ([^ ]+) for%i', $content, $matches);
echo $searchterm.' contains '.$matches[1].' matches.';
Thanks!
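One likely cause, offered as a guess rather than a confirmed diagnosis: the spaces (and any quotes) go into the URL unencoded, so the request is not a valid URL. The cURL example earlier in the thread passes its term through urlencode(), and the same could be done here. The function name buildSearchUrl is made up for illustration.

```php
<?php
// Hypothetical helper: build the search URL with the term URL-encoded.
function buildSearchUrl(string $searchterm): string
{
    // urlencode() turns spaces into "+" and double quotes into "%22".
    return 'http://www.google.co.uk/search?q=' . urlencode($searchterm);
}

// Phrase without quotes.
echo buildSearchUrl('how to look awesome'), "\n";

// Exact phrase, with the double quotes included in the term.
echo buildSearchUrl('"how to look awesome"'), "\n";
```

The resulting string can then be passed to file_get_contents() in place of the hand-built $query.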