preg_match_all help

Discussion in 'PHP' started by Web Directory, May 25, 2007.

  1. #1
    Can someone show me how to use preg_match_all() to parse the URLs from the following Google Search:

    http://www.google.com/ie?q=keyword&hl=en&btnG=Search
    http://www.google.com/search?q=keyword

    Thanks!
     
    Web Directory, May 25, 2007
  2. projectshifter

    #2
    If you can give me a better idea of what you're wanting, I'd be glad to help you out. It just seems like you want to grab the titles or links or something?
     
    projectshifter, May 26, 2007
  3. Web Directory

    #3
    I want to get all the URLs from the search result.
     
    Web Directory, May 26, 2007
  4. projectshifter

    #4
    Just the URLs, or the URLs and the link titles?
     
    projectshifter, May 26, 2007
  5. Web Directory

    #5
    Just the URLs from the search result excluding paid ads.
     
    Web Directory, May 26, 2007
  6. krakjoe

    #6
    
    <?php
    function grab_links_from_google( $keyword, $maxpages = null )
    {
    	// Result offset of the first page
    	$start = 0 ;
    	// Pages parsed so far
    	$parsed = 0 ;
    	// array_merge() complains if this is not an array
    	$return = array( ) ;
    	do // Grab pages from google until they run out or $maxpages is reached
    	{
    		// Fetch one page; urlencode() keeps spaces and symbols in the keyword from breaking the URL
    		$html = file_get_contents( sprintf( 'http://www.google.com/ie?q=%s&hl=en&btnG=Search&start=%d', urlencode( $keyword ), $start ) );
    		// Stop if the request failed
    		if( $html === false ) break ;
    		// Match links away from google, one page at a time
    		preg_match_all( "~<a title=\".*?\" href=(.*?)>.*?</a>~", $html, $pages );
    		// Start page for next loop
    		$start += 10 ;
    		// Keep the data we wanted
    		$return = array_merge( $return, $pages[1] );
    		// Increment the counter for pages parsed
    		$parsed++;
    	}
    	// Keep looping while the page had more than 2 matches and the page limit is not reached
    	while( count( $pages[0] ) > 2 and $maxpages != $parsed );
    	// Return unique data, renumbered from 0 so the output counter has no gaps
    	return array_values( array_unique( $return ) ) ;
    }
    // !empty() avoids notices when a parameter is missing from the query string
    $keyword = !empty( $_GET['keyword'] ) ? $_GET['keyword'] : 'krakjoe' ; # Setup keyword
    $max     = !empty( $_GET['max'] ) ? (int) $_GET['max'] : null ; # Maximum amount of pages to get
    foreach( grab_links_from_google( $keyword, $max ) as $num => $link )
    {
    	printf("Link Number %d : %s<br />\n", $num + 1, $link );
    }
    
    Dunno about excluding paid ads, I got bored, sorry ....

    script.php?keyword=mykeyword
    script.php?keyword=mykeyword&max=4

    would be examples of how to use it. Not specifying a max gets every page that has links away from google in it .... you might wanna play with the regex and the array_merge part depending on how you plan to use it.
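
    To play with the regex without hitting Google on every run, it can help to test preg_match_all() against a saved chunk of HTML first. This is only a minimal sketch: the sample markup and the example.com / example.org URLs are made up, and it assumes the page uses the same `<a title=... href=...>` shape the pattern above expects.

```php
<?php
// Hypothetical sample markup; real result pages will differ
$html = '<a title="One" href=http://example.com/a>One</a> '
      . '<a title="Two" href=http://example.org/b>Two</a> '
      . '<a title="One again" href=http://example.com/a>One again</a>';

$pattern = '~<a title=".*?" href=(.*?)>.*?</a>~';

// Default flag is PREG_PATTERN_ORDER:
// $m[0] holds the full matches, $m[1] holds the first capture group (the href)
$count = preg_match_all( $pattern, $html, $m );

// array_unique() keeps the original keys, so array_values() renumbers them from 0
$links = array_values( array_unique( $m[1] ) );

// PREG_SET_ORDER groups the result per match instead: $sets[0][1] is the first href
preg_match_all( $pattern, $html, $sets, PREG_SET_ORDER );

print_r( $links );
```

    The return value of preg_match_all() is the number of full matches, which is handy for a quick check that the pattern still fits the page.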

    Google also has an API for making your own search site, which might be far better suited to your needs, but the above is how you do what you asked.
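
    On the paid ads: one rough option is to filter the returned list by host after the fact. This is a sketch under a big assumption, namely that sponsored and internal links point at a Google-owned host while normal results point elsewhere. That has not been checked against the actual page, and filter_external_links() is a made-up helper name, not part of the script above.

```php
<?php
// Hypothetical helper: keep only absolute links whose host is not in $blocked
function filter_external_links( array $links, array $blocked = array( 'google.com' ) )
{
	$keep = array( ) ;
	foreach( $links as $link )
	{
		// Strip quotes the regex may have captured around the href value
		$url  = trim( $link, "\"' " );
		$host = parse_url( $url, PHP_URL_HOST );
		// Relative links and malformed URLs have no host; skip them
		if( !$host ) continue ;
		$host = strtolower( $host );
		$skip = false ;
		foreach( $blocked as $domain )
		{
			// Match the domain itself or any subdomain of it
			$suffix = '.' . $domain ;
			if( $host === $domain or substr( $host, -strlen( $suffix ) ) === $suffix )
			{
				$skip = true ;
				break ;
			}
		}
		if( !$skip ) $keep[] = $url ;
	}
	return $keep ;
}

// Usage with made-up links
$links = filter_external_links( array(
	'http://example.com/page',
	'http://www.google.com/aclk?x=1',
	'/search?q=more',
) );
print_r( $links );
```

    If the assumption does not hold for the page you get back, the $blocked list is the part to adjust.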
     
    krakjoe, May 26, 2007
  7. Web Directory

    #7
    Thanks very much!
     
    Web Directory, May 27, 2007