Hi, I'm new to PHP but have JavaScript experience. I'm trying to use cURL to scrape a page (google.com in this example) and extract a link. However, I am getting the errors below:

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /home/tintus/public_html/PHP test.php on line 11

Fatal error: Call to a member function getAttribute() on a non-object in /home/seriouss/public_html/PHP test.php on line 30

The code I am using is below. In the final script I will be using a more specific XPath to get a specific URL from a different site, but I am using this for testing. I have googled it a bit, but nothing came up that seemed relevant or that I could make much sense of. The echo is also just for testing purposes; I will use the URL further on in the script later.
Safe mode is a pain, isn't it? I could only get some results if I commented out this line:

PHP:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

However, after looking at www.php.net I found some sample code that let me build a test page: http://www.itamer.com/muck/curltest...com/greasemonkey-catches-cookie-stuffers/594/

It seems to handle the redirects that Google throws, so it's worth testing on your real examples.

PHP:
<form method="GET">
URL: <input type="text" name="url">
</form>
<hr>
<?php
$target_url = "http://www.google.com/";
$userAgent = 'Firefox (WindowsXP) – Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';

if (isset($_GET['url'])) {
    $target_url = $_GET['url'];
}

function get_links($url) {
    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    @$xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <a> tag in the DOM and add it to the link array
    foreach ($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }

    // Return the links
    return $links;
}

$links = get_links($target_url);

echo "<ul>";
foreach ($links as $k => $v) {
    $url  = $v['url'];
    $text = (empty($v['text'])) ? $url : $v['text'];
    echo "<li><a href='{$url}'>{$text}</a></li>\n";
}
echo '</ul>';
?>

In your code you have the @ when you load the DOM. That was suppressing useful errors.
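If you can't turn CURLOPT_FOLLOWLOCATION on because of safe_mode/open_basedir, one workaround is to follow the redirects yourself: read the HTTP status and the Location header, then re-request. Here is a rough sketch of that idea; the function name curl_get_with_redirects is just something I made up, and you'd want to harden it (relative Location values, cookies, etc.) before using it for real:

```php
<?php
// Sketch: follow redirects manually when safe_mode/open_basedir forbids
// CURLOPT_FOLLOWLOCATION. Not production code - just the general shape.
function curl_get_with_redirects($url, $maxRedirects = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // keep headers so we can read Location

    while ($maxRedirects-- > 0) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $response = curl_exec($ch);
        if ($response === false) {
            curl_close($ch);
            return false; // request failed
        }

        $code       = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $headers    = substr($response, 0, $headerSize);

        // 3xx with a Location header: follow it ourselves instead of cURL
        if ($code >= 300 && $code < 400
            && preg_match('/^Location:\s*(\S+)/mi', $headers, $m)) {
            $url = trim($m[1]);
            continue;
        }

        curl_close($ch);
        return substr($response, $headerSize); // strip headers, return the body
    }

    curl_close($ch);
    return false; // gave up: too many redirects
}
```

You'd then call get_links() on the returned HTML (via loadHTML instead of loadHTMLFile), so the cURL options never need FOLLOWLOCATION at all.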
I usually scrape links this way:

PHP:
$dom = new DOMDocument();
@$dom->loadHTML($html);

$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    echo $link->getAttribute('href');
}
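On the second error in the original post: "Call to a member function getAttribute() on a non-object" usually means a DOM lookup matched nothing, so you're calling getAttribute() on NULL. Checking the node before dereferencing avoids the fatal error. A small self-contained sketch (the HTML string and XPath here are just for illustration):

```php
<?php
// Sketch: guard against a DOM lookup that matches nothing, which is what
// triggers "Call to a member function getAttribute() on a non-object".
$html = '<html><body><a href="http://example.com/">Example</a></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ only hides parser warnings for sloppy markup

$xpath = new DOMXPath($dom);
$node  = $xpath->query('//a')->item(0); // item() returns NULL when nothing matched

if ($node !== null) {
    echo $node->getAttribute('href'), "\n";
} else {
    echo "No matching link found\n";
}
```

The same check applies to a foreach over getElementsByTagName(): the loop body only runs for real nodes, but any one-off ->item(0) or XPath result should be tested for NULL first.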