Can someone show me how to use preg_match_all() to parse the URLs from the following Google Search: http://www.google.com/ie?q=keyword&hl=en&btnG=Search http://www.google.com/search?q=keyword Thanks!
If you can give me a better idea of what you want, I'd be glad to help you out. It sounds like you want to grab the titles or the links from the results, or something like that?
<?php
function grab_links_from_google( $keyword, $maxpages = null )
{
    // You should prolly start from 0
    $start = 0;
    // You haven't done anything yet
    $parsed = 0;
    // array_merge cries otherwise
    $return = array();

    // Start a loop to grab endless pages from google, or not
    do
    {
        // Match links away from google, one page at a time
        preg_match_all(
            "~<a title=\".*?\" href=(.*?)>.*?</a>~",
            file_get_contents( sprintf( 'http://www.google.com/ie?q=%s&hl=en&btnG=Search&start=%d', urlencode( $keyword ), $start ) ),
            $pages
        );
        // Start page for next loop
        $start += 10;
        // Keep the data we wanted
        $return = array_merge( $return, $pages[1] );
        // Increment the counter for pages parsed
        $parsed++;
    }
    // Conditions for looping
    while( count( $pages[0] ) > 2 and $maxpages != $parsed );

    // Return unique data, reindexed so the link numbers don't skip
    return array_values( array_unique( $return ) );
}

foreach( grab_links_from_google(
    isset( $_GET['keyword'] ) ? $_GET['keyword'] : 'krakjoe', // Setup keyword
    isset( $_GET['max'] ) ? (int) $_GET['max'] : null         // Do the same for the maximum amount of pages to get
) as $num => $link )
{
    printf( "Link Number %d : %s<br />\n", $num + 1, $link );
}

Dunno about excluding paid ads, I got bored, sorry.

script.php?keyword=mykeyword
script.php?keyword=mykeyword&max=4

would be examples of how to use it. Not specifying a max will keep going through every page that has links away from Google. You might wanna play with the regex and the array_merge part depending on how you plan to use it.

You know Google has an API for making your own search site. That might be far better suited to your needs, but that's how you do what you asked.
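If you do want to play with the regex, it's probably easier to do it against a fixed chunk of HTML than to hit Google on every run. Here's a rough sketch of that idea. The helper name extract_hrefs(), the looser pattern and the sample markup are all made up for illustration; they aren't part of the script above and the pattern isn't guaranteed to match whatever Google actually sends back.

```php
<?php
// Hypothetical helper (not from the script above): pulls href values out of
// an HTML string so a pattern can be tried offline.
function extract_hrefs( $html )
{
    // Group 1 is the opening quote character, group 2 is the URL between the quotes
    preg_match_all( '~<a\s[^>]*?href=(["\'])(.*?)\1[^>]*>~i', $html, $matches );

    // Drop duplicates and reindex from 0
    return array_values( array_unique( $matches[2] ) );
}

// Made-up sample markup, only here to exercise the pattern
$sample = '<a title="One" href="http://example.com/a">One</a>'
        . '<a title="Two" href="http://example.org/b">Two</a>'
        . '<a title="One again" href="http://example.com/a">One</a>';

print_r( extract_hrefs( $sample ) );
```

preg_match_all() fills $matches with one array per capture group, so $matches[0] holds the full matches and $matches[2] holds just the URLs here. Once the pattern does what you want on the sample, you can swap it into the function above.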