First of all, thanks for looking =). Here is a function I wrote.

    function getResultFromURL($URL) {
        $ch = curl_init($URL);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false); // no need to get header information
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        $result = curl_exec($ch);
        curl_close($ch);
        return $result; // the page source as a string, or false on error
    }

This works for any other URL I give it and returns the page correctly. However, for http://www.alibaba.com it doesn't work! It shows me a cPanel page instead. How can I make cURL go to the real site that is supposed to be there? I'm highly confused, as I thought CURLOPT_FOLLOWLOCATION was supposed to do this for me. [I'm not in safe mode, and I've already tried the CURLOPT_MAXREDIRS option, no luck =( ]

Willing to pay for assistance.

Best,
- Joe~
Here is my script in action and what it outputs: http://dnfinder.net/rentacoder/alibabaScrape.php?url=http://www.alibaba.com
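When a fetch lands on an unexpected page like this, it can help to ask cURL what it actually did before changing any options. The sketch below is only an illustration: `describeFetch` and the keys of its report array are hypothetical names, not part of the original script. It uses `curl_getinfo` to report the status code, the final URL after redirects, and the number of redirects followed.

```php
<?php
// Hypothetical diagnostic helper (not from the original post): fetch a URL
// and report what cURL actually did, so a redirect or block page can be
// told apart from a transport error.
function describeFetch($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $body = curl_exec($ch);

    // Collect the details while the handle is still available.
    return array(
        'ok'        => $body !== false,                               // false means a transport error
        'error'     => curl_error($ch),                               // empty string when there was none
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),         // 0 if no response arrived
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),     // URL after any redirects
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),    // redirects actually followed
        'length'    => $body === false ? 0 : strlen($body),
    );
}
```

Calling `print_r(describeFetch('http://www.alibaba.com'));` would show whether the odd page came from a redirect (a changed `final_url`) or was served directly at the original URL, which is a hint that the server is choosing its response based on the request headers.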
Hey, good news, I figured out the answer on my own >.<. I don't get why this works, but perhaps that site has some kind of security check, so I changed the request headers to make the request look like it comes from something else.

    function getResultFromURL($url) {
        $curl = curl_init();
        $header = array();
        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: "; // browsers keep this blank.
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
        curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
        curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
        curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
        curl_setopt($curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_TIMEOUT, 10);
        $html = curl_exec($curl); // execute the curl command
        curl_close($curl); // close the connection
        return $html; // and finally, return $html
    }

I found this at php.net and it worked! =) I'm not so sure why, but I guess there is something sneaky going on with that site.

-joe
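A likely explanation, though nothing in the thread confirms it, is that the server picks its response based on the request headers: as far as I know, PHP's cURL extension sends no User-Agent header at all unless CURLOPT_USERAGENT is set. One way to find out which change matters is to add them one at a time. The sketch below assumes that approach; `buildFetchOptions`, `fetchWithOptions` and `compareVariants` are hypothetical names, and comparing body sizes is only a crude signal.

```php
<?php
// Hypothetical helpers (not from the original post) for narrowing down
// which header change makes the site respond normally.

// Build a cURL option array; the User-Agent and extra headers are optional.
function buildFetchOptions($url, $userAgent = null, array $headers = array())
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    );
    if ($userAgent !== null) {
        $options[CURLOPT_USERAGENT] = $userAgent;
    }
    if (count($headers) > 0) {
        $options[CURLOPT_HTTPHEADER] = $headers;
    }
    return $options;
}

// Run one request; returns the body as a string, or null on a cURL error.
function fetchWithOptions(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Fetch the same URL with progressively more headers and report the size
// of each response, so differing responses stand out.
function compareVariants($url)
{
    $agent = 'Googlebot/2.1 (+http://www.google.com/bot.html)';
    $variants = array(
        'no user agent'       => buildFetchOptions($url),
        'user agent only'     => buildFetchOptions($url, $agent),
        'user agent + accept' => buildFetchOptions($url, $agent, array('Accept: text/html;q=0.9,*/*;q=0.5')),
    );
    $sizes = array();
    foreach ($variants as $label => $options) {
        $body = fetchWithOptions($options);
        $sizes[$label] = $body === null ? null : strlen($body);
    }
    return $sizes;
}
```

Running `print_r(compareVariants('http://www.alibaba.com'));` and looking for the first variant whose size jumps would suggest which header the site cares about. If the User-Agent alone is enough, the rest of the header list may not be needed.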