PHP error with getAttribute() on a non-object

Discussion in 'PHP' started by tintumon, May 23, 2010.

  1. #1
    Hi,
    I am new to PHP but have JavaScript experience. I am trying to use cURL to scrape a page (google.com in this example) and extract a link. However, I am getting the errors below:

    Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /home/tintus/public_html/PHP test.php on line 11

    Fatal error: Call to a member function getAttribute() on a non-object in /home/seriouss/public_html/PHP test.php on line 30

    The code I am using is below:

    In the final script I will be using a more specific XPath to get a specific URL from a different site, but I am using this one for testing. I have Googled it a bit, but nothing came up that seemed relevant or that I could make much sense of. The echo is also just for testing purposes; I will use the URL further on in the script.
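    Regarding the first warning: if the host will not allow CURLOPT_FOLLOWLOCATION (because of safe_mode or open_basedir), one common workaround is to follow redirects manually by reading the Location header yourself. A minimal sketch — the function names here are my own invention, not part of any library:

```php
<?php
// Hypothetical helper: pull the Location header out of a raw HTTP
// response header block. Returns null when no redirect target exists.
function extract_location($headers) {
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Manual redirect loop for hosts where CURLOPT_FOLLOWLOCATION is
// blocked: fetch the page, inspect the status code, and re-request
// the Location target ourselves, up to $maxHops times.
function fetch_following_redirects($url, $maxHops = 5) {
    for ($i = 0; $i < $maxHops; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, true); // keep headers in the output
        $response = curl_exec($ch);
        $code       = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        curl_close($ch);

        $headers = substr($response, 0, $headerSize);
        $body    = substr($response, $headerSize);

        if ($code < 300 || $code >= 400) {
            return $body;                    // not a redirect: done
        }
        $next = extract_location($headers);
        if ($next === null) {
            return $body;                    // redirect with no Location header
        }
        $url = $next;                        // follow the redirect manually
    }
    return null;                             // gave up: too many hops
}
```

    This avoids setting CURLOPT_FOLLOWLOCATION at all, so it works even with open_basedir restrictions in place.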
     
    tintumon, May 23, 2010 IP
  2. sarahk

    #2
    Safe mode is a pain, isn't it?

    I could only get some results if I commented out

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    PHP:
    However after looking at www.php.net I found some sample code that let me build a test page: http://www.itamer.com/muck/curltest...com/greasemonkey-catches-cookie-stuffers/594/

    It seems to handle the redirects that Google throws, so it's worth testing on your real examples:
    <form method="GET">
    URL: <input type="text" name='url'>
    </form>
    <hr>
    <?php
    $target_url = "http://www.google.com/";
    $userAgent = 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
    
    if (isset($_GET['url']))
    {
    	$target_url = $_GET['url'];	
    }
    
    
    function get_links($url) {
     
        // Create a new DOM Document to hold our webpage structure
        $xml = new DOMDocument();
    
     
        // Load the url's contents into the DOM
        @$xml->loadHTMLFile($url);
    
        // Empty array to hold all links to return
        $links = array();
     
        //Loop through each <a> tag in the dom and add it to the link array
        foreach($xml->getElementsByTagName('a') as $link) {
            $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
        }
     
        //Return the links
        return $links;
    } 
    
    $links = get_links($target_url);
    echo "<ul>";
    foreach($links as $k => $v)
    {
    	$url = $v['url'];
    	$text = empty($v['text']) ? $url : $v['text'];
    	echo "<li><a href='{$url}'>{$text}</a></li>\n";
    }
    echo '</ul>';
    ?>
    PHP:
    In your code you have the @ when you load the DOM. That was suppressing useful errors.
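    Rather than dropping the @ entirely (which spams warnings on real-world HTML), you can have libxml collect the parse problems so you can log them yourself. A small sketch, using deliberately invalid markup (the `<foo>` tag is just there to trigger an error):

```php
<?php
// Collect parse errors instead of suppressing them with @.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// <foo> is not a valid HTML tag, so the parser records an error
// but still builds a usable DOM tree.
$doc->loadHTML('<html><body><foo><a href="/test">link</a></foo></body></html>');

foreach (libxml_get_errors() as $error) {
    // Each error carries ->message, ->line, ->column, etc.
    error_log(trim($error->message));
}
libxml_clear_errors();
```

    The document still parses, so the link extraction keeps working, but you now see why the parser complained instead of hiding it.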
     
    sarahk, May 23, 2010 IP
  3. JDevereux

    #3
    I usually scrape links this way:

    $dom = new DOMDocument();
    
    @$dom->loadHTML($html);
    
    $links = $dom->getElementsByTagName('a');
    
    foreach ($links as $link) {
        echo $link->getAttribute('href');
    }
    
    PHP:
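    Worth noting for the original poster: getElementsByTagName() always returns a (possibly empty) node list, so a loop like the one above can never hit the "non-object" fatal. That error usually appears when an XPath lookup such as ->item(0) returns null because nothing matched. A guard sketch — the markup and XPath expression here are placeholders, not from the thread:

```php
<?php
// When a query can come back empty, check the node before using it.
$doc = new DOMDocument();
@$doc->loadHTML('<html><body><a href="/first">one</a></body></html>');

$xpath = new DOMXPath($doc);
$node  = $xpath->query('//a')->item(0);  // null if the XPath matches nothing

if ($node instanceof DOMElement) {
    echo $node->getAttribute('href');    // safe: we know it's an element
} else {
    echo 'no matching link found';       // the fatal error happens here otherwise
}
```

    With the guard in place, a page layout change produces a readable message instead of a fatal error on line 30.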
     
    JDevereux, May 23, 2010 IP