I'm not editing my page; I'm leaving it as is: http://dnfinder.net/rentacoder/test...alibaba.com/archives/company/0/companies.html

Let it run, and after about 10 seconds you will see the bad request error at the very bottom, where it should be continuing. I've been looking into what it might be, and people say cookies, but I have no idea where the cookies could be coming from. Here is the code for the script; I appreciate any help and will pay for your time! I need this for a job. Hopefully it's an easy fix!

PHP:
<?php
//Functions----------------------------------------------------------------------------------------

//Utility Functions ##################################################################################
function set_timeout_time($time)
{
    if($time<>0)
    {
        set_time_limit($time);
    }
    else
    {
        //ToDo: Put the time limit back to normal [800]
        set_time_limit(70);//so your server doesn't burn and crash and die, but will still go a while.
        ignore_user_abort(true);//so the user can't disconnect and stop the script, whole reason I'm re-scripting is because of this.
    }
    echo '<br>Time Limit:'.ini_get('max_execution_time');
}

function getResultsFromURLArray($urls)
{
    $mh = curl_multi_init();
    foreach ($urls as $i => $url)
    {
        $conn[$i]=curl_init($url);

        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        if($_GET['javascript']==1)
        {
            $header[] = "content-type: application/x-javascript";
        }
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: "; // browsers keep this blank.
        curl_setopt($conn[$i], CURLOPT_URL, $url);
        curl_setopt($conn[$i], CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
        curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);
        curl_setopt($conn[$i], CURLOPT_REFERER, 'http://www.google.com');
        curl_setopt($conn[$i], CURLOPT_ENCODING, 'gzip,deflate');
        curl_setopt($conn[$i], CURLOPT_AUTOREFERER, true);
        curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($conn[$i], CURLOPT_TIMEOUT, 20);

        curl_multi_add_handle ($mh,$conn[$i]);
    }

    do
    {
        $n=curl_multi_exec($mh,$active);
    } while ($active);

    foreach ($urls as $i => $url)
    {
        $res[$i]=curl_multi_getcontent($conn[$i]);
        curl_multi_remove_handle($mh,$conn[$i]);
        curl_close($conn[$i]);
    }
    curl_multi_close($mh);

    return($res);//returns the array of result data, we will analyse this next
}
//END Utility Functions ##############################################################################

//Program Functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
function getURLsFromSectionMatch($result_array,$Host,$Schema,$message,$sectionRegEx)
{
    $debug = $_GET['debug'];
    echo '<b>'.$message.'</b>';
    foreach($result_array as $data)
    {
        //There needs to be a killCheck here~
        preg_match($sectionRegEx,$data,$sectionMatch);
        //print_r($result_array);
        if($sectionMatch)
        {
            echo '<br>Got Section...';
            $section = $sectionMatch[1];
            preg_match_all('/<a href="(.+?)"/si',$section,$linkMatch,PREG_SET_ORDER);
            foreach($linkMatch as $val)
            {
                if($Schema<>'' and $Host<>'')
                {
                    $url = $Schema."://".$Host.$val[1];
                }
                else
                {
                    $url =$val[1];
                }
                if($debug==1)
                {
                    echo '<br>url ='.$url;
                }
                $arrayOfLinks[]=$url;//append to the array the link
            }
        }
        else
        {
            die("<br><b>Error, grabbing Section Match in the data:<br>$data");
        }
    }
    return $arrayOfLinks;
}
//End Program Functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

//Phase Functions @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
function phase1()
{
}

function phase2()
{
}

function phase3()
{
    $first_url[0]=$_GET['url'];//makes it a 1 index array~
    $result_array = getResultsFromURLArray($first_url);//in this case array is only 1 item

    $parts = parse_url($first_url[0]);//must not be a relative link for the initial one~
    $Host = $parts['host'];
    $Schema = $parts['scheme'];

    $category_links = getURLsFromSectionMatch($result_array,'','',"<br>Getting Category Links...",'/<h2>Company List - Company List<\/h2>(.+?)<\/div>/si');
    $category_results = getResultsFromURLArray($category_links);//fills the result array
    $company_list_links = getURLsFromSectionMatch($category_results,$Host,$Schema,"<br>Getting Category List Links...",'/<div id="scontent">.+?<\/h2>(.+?)<\/div>/si');
}
//End Phase Functions @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
//End of Functions -----------------------------------------------------------------------------------

//Main Code-------------------------------------------------------------------------------------------
header('Content-Type: text/html; charset=utf-8');//so it prints all characters
set_timeout_time($_Get['time']);//to prevent time outs

switch ($_GET['phase'])
{
    case 1:
        echo "<br>Script Running for Phase 1";
        phase1();
        break;
    case 2:
        echo "<br>Script Running for Phase 2";
        phase2();
        break;
    case 3:
        echo "<br>Script Running for Phase 3";
        phase3();
        break;
    default:
        echo "<br>Unrecognizable Phase Number";
}
//End Main Code-----------------------------------------------------------------------------------------

echo '<br><b>End of Script</b>';
?>
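Edit: in case it helps with reproducing this, here is a small diagnostic sketch that is not part of the script above; I think it could be dropped into getResultsFromURLArray() to print the status code and the exact request headers each handle ends up sending (the cURL options are standard, the placement is my assumption):

PHP:
// Inside the first foreach, before curl_multi_add_handle():
// ask cURL to record the request line and headers it actually sends.
curl_setopt($conn[$i], CURLINFO_HEADER_OUT, true);

// After the curl_multi_exec() loop, before curl_close():
// dump the HTTP status and the outgoing headers for each handle.
foreach ($urls as $i => $url)
{
    echo '<br>HTTP '.curl_getinfo($conn[$i], CURLINFO_HTTP_CODE).' for '.$url;
    echo '<pre>'.htmlspecialchars(curl_getinfo($conn[$i], CURLINFO_HEADER_OUT)).'</pre>';
}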
It simply means you are exceeding the allowed number of requests; once the number of requests goes over the limit, the site sends this error. It is the Alibaba site that is not allowing you to do this. There is nothing wrong with your program.

Regards,
Alex
That is incorrect; you need to adjust your code. $header is never unset, so each request ADDS MORE HEADERS: every pass through the loop appends to the same array, and the header list keeps growing. Setting $header[0] will not clear the $header variable.

Adjust your code from:

PHP:
curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);

to:

PHP:
curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);
unset($header);

Hope this helps,
Jay
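P.S. If you want to see the growth for yourself, here is a standalone sketch (not your script, just the same array pattern) showing how the count climbs on every pass and stays flat once the array is cleared:

PHP:
<?php
// Same pattern as your loop: $header[0] is overwritten each time, but the
// entries appended on earlier passes are never removed.
for ($i = 1; $i <= 3; $i++) {
    $header[0] = "Accept: text/html";        // overwrites index 0 only
    $header[]  = "Cache-Control: max-age=0"; // appends a new element each pass
    $header[]  = "Connection: keep-alive";   // appends a new element each pass
    echo "Request $i would send " . count($header) . " headers\n"; // prints 3, 5, 7
    // Clearing the array here keeps every request at 3 headers:
    // unset($header);           // the fix above
    // $header = array();        // equivalent alternative
}
?>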
You deserve money, jayshah, it works now lol. I'm dumb; I can't believe I didn't see that. I kept thinking it was something I was doing client side, not what the server was doing as it made the requests. Let me know if you want something for helping, I really appreciate it.
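For anyone who lands on this thread with the same bad request error, this is roughly what the relevant part of getResultsFromURLArray() looks like with the fix applied (only the changed lines matter; everything else is as posted above):

PHP:
foreach ($urls as $i => $url)
{
    $conn[$i]=curl_init($url);

    // header array is (re)built here exactly as before
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    // ... remaining $header[] lines unchanged ...

    // ... other curl_setopt() calls unchanged ...
    curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);
    unset($header);// jayshah's fix: the next iteration starts with an empty array

    curl_multi_add_handle ($mh,$conn[$i]);
}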