Help! $10, HTTP BAD Request 400 The number of request header fields exceeds ....

Discussion in 'PHP' started by x11joex11, Jan 18, 2008.

  1. #1
    I'm not editing my page; I'm leaving it as is.

    http://dnfinder.net/rentacoder/test...alibaba.com/archives/company/0/companies.html

    Let it run, and after about 10 seconds you will see the bad request error at the very bottom, where the script should be continuing. I've been looking into what it might be, and people say cookies, but I have no idea where the cookies could be coming from.

    Here is the code for the script. I appreciate any help and will pay for your time! I need this for a job. Hopefully it's an easy fix!
    
    <?php
    
    //Functions---------------------------------------------------------------------------------------------------------------------------------------
    
    //Utility Functions ##############################################################################################################################
    function set_timeout_time($time)
    {
    	if($time<>0)
    	{
    		set_time_limit($time);
    	}
    	else
    	{
    		//ToDo: Put the time limit back to normal [800]
    		set_time_limit(70);//so your server doesn't burn and crash and die, but will still go a while.
    		ignore_user_abort(true);//so the user can't disconnect and stop the script, whole reason I'm re-scripting is because of this.
    	}
    	echo '<br>Time Limit:'.ini_get('max_execution_time');
    }
    
    function getResultsFromURLArray($urls)
    {
    	$mh = curl_multi_init();
    	
    	foreach ($urls as $i => $url) 
    	{
    		$conn[$i]=curl_init($url);
    		   
    		$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    		$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    		if($_GET['javascript']==1)
    		{
    			$header[] = "content-type: application/x-javascript";
    		}
    		$header[] = "Cache-Control: max-age=0";
    		$header[] = "Connection: keep-alive";
    		$header[] = "Keep-Alive: 300";
    		$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    		$header[] = "Accept-Language: en-us,en;q=0.5";
    		$header[] = "Pragma: "; // browsers keep this blank.
    		
    		curl_setopt($conn[$i], CURLOPT_URL, $url);
    		curl_setopt($conn[$i], CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    		curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);
    		curl_setopt($conn[$i], CURLOPT_REFERER, 'http://www.google.com');
    		curl_setopt($conn[$i], CURLOPT_ENCODING, 'gzip,deflate');
    		curl_setopt($conn[$i], CURLOPT_AUTOREFERER, true);
    		curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
    		curl_setopt($conn[$i], CURLOPT_TIMEOUT, 20);
    		
    		curl_multi_add_handle ($mh,$conn[$i]);
    	}
    	
    	do { $n=curl_multi_exec($mh,$active); } while ($active);
    	
    	foreach ($urls as $i => $url) {
    		   $res[$i]=curl_multi_getcontent($conn[$i]);
    		   curl_multi_remove_handle($mh,$conn[$i]);
    		   curl_close($conn[$i]);
    	}
    	curl_multi_close($mh);
    	return($res);//returns the array of result data, we will analyse this next
    }
    //END Utility Functions ##########################################################################################################################
    //Program Functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    function getURLsFromSectionMatch($result_array,$Host,$Schema,$message,$sectionRegEx)
    {
    	$debug = $_GET['debug'];
    	
    	echo '<b>'.$message.'</b>';
    	
    	foreach($result_array as $data)
    	{
    		//There needs to be a killCheck here~
    		preg_match($sectionRegEx,$data,$sectionMatch);
    		//print_r($result_array);
    		if($sectionMatch)
    		{
    			echo '<br>Got Section...';
    			$section = $sectionMatch[1];
    			preg_match_all('/<a href="(.+?)"/si',$section,$linkMatch,PREG_SET_ORDER);
    			foreach($linkMatch as $val)
    			{
    				if($Schema<>'' and $Host<>'')
    				{
    					$url = $Schema."://".$Host.$val[1];
    				}
    				else
    				{
    					$url =$val[1];
    				}
    				if($debug==1)
    				{
    					echo '<br>url ='.$url;
    				}
    				$arrayOfLinks[]=$url;//append to the array the link
    			}
    		}
    		else
    		{
    			die("<br><b>Error, grabbing Section Match in the data:<br>$data");
    		}
    	}
    	
    	return $arrayOfLinks;
    }
    
    //End Program Functions //Program Functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //Phase Functions @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    function phase1()
    {
    
    }
    
    function phase2()
    {
    
    }
    
    function phase3()
    {
    	$first_url[0]=$_GET['url'];//makes it a 1 index array~
    	$result_array = getResultsFromURLArray($first_url);//in this case array is only 1 item
    	
    	$parts = parse_url($first_url[0]);//must not be a relative link for the initial one~
    	$Host = $parts['host'];
    	$Schema = $parts['scheme'];
    	
    	$category_links = getURLsFromSectionMatch($result_array,'','',"<br>Getting Category Links...",'/<h2>Company List - Company List<\/h2>(.+?)<\/div>/si');
    	$category_results = getResultsFromURLArray($category_links);//fills the result array
    	$company_list_links = getURLsFromSectionMatch($category_results,$Host,$Schema,"<br>Getting Category List Links...",'/<div id="scontent">.+?<\/h2>(.+?)<\/div>/si');
    }
    
    //End Phase Functions @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    //End of Functions -------------------------------------------------------------------------------------------------------------------------------
    //Main Code---------------------------------------------------------------------------------------------------------------------------------------
    header('Content-Type: text/html; charset=utf-8');//so it prints all characters
    set_timeout_time($_GET['time']);//to prevent time outs ($_GET is case-sensitive; $_Get is always undefined)
    
    switch ($_GET['phase'])
    {
    	case 1:
    		echo "<br>Script Running for Phase 1";
    		phase1();
    		break;
    	case 2:
    		echo "<br>Script Running for Phase 2";
    		phase2();
    		break;
    	case 3:
    		echo "<br>Script Running for Phase 3";
    		phase3();
    		break;
    	default:
    		echo "<br>Unrecognizable Phase Number";
    }
    //End Main Code------------------------------------------------------------------------------------------------------------------------------------
    echo '<br><b>End of Script</b>';
    ?>
    
    PHP:
     
    x11joex11, Jan 18, 2008 IP
  2. kmap

    kmap Well-Known Member

    #2
    It simply means the number of requests is exceeding the limit, and when that happens the site sends this error.

    It is the Alibaba site that is not allowing you to do this.

    There is nothing wrong with the program.

    Regards

    Alex
     
    kmap, Jan 18, 2008 IP
  3. jayshah

    jayshah Peon

    #3
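    That is incorrect; you need to adjust your code.

    $header isn't being unset between loop iterations, so each request ADDS MORE HEADERS to the ones left over from the previous request, and you end up sending too many. Setting $header[0] only overwrites the first element; it does not clear the rest of the $header array. Adjust your code: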

    From:
    
            curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);
    
    PHP:
    To:
    
            curl_setopt($conn[$i], CURLOPT_HTTPHEADER, $header);
            unset($header);
    
    PHP:
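    For anyone hitting the same error, here is a minimal sketch of why the header count grows. It is not taken from the original script: countHeadersBuggy() and buildRequestHeaders() are hypothetical names, and no cURL calls are made, so it only counts the lines that would be handed to CURLOPT_HTTPHEADER. The first function repeats the original pattern (assign $header[0], then append with $header[]); the second builds a fresh array on every call, which should have the same effect as the unset($header) fix above.

```php
<?php
// Hypothetical demo, not part of the original script.

// Repeats the loop pattern from getResultsFromURLArray(): $header is never
// reset, so every pass appends six more lines to the same array.
function countHeadersBuggy($requests)
{
    $counts = array();
    for ($i = 0; $i < $requests; $i++) {
        $header[0] = "Accept: text/html;q=0.9,*/*;q=0.5"; // overwrites slot 0 only
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: ";
        $counts[] = count($header); // lines this request would send
    }
    return $counts;
}

// Builds a brand-new header list on every call, so nothing carries over
// from one request to the next.
function buildRequestHeaders($javascript = false)
{
    $header = array();
    $header[] = "Accept: text/html;q=0.9,*/*;q=0.5";
    if ($javascript) {
        $header[] = "Content-Type: application/x-javascript";
    }
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: ";
    return $header;
}

echo implode(', ', countHeadersBuggy(3)), "\n"; // prints "7, 13, 19"
echo count(buildRequestHeaders()), "\n";        // prints "7"
```

    With a helper like this, the line inside the loop could become curl_setopt($conn[$i], CURLOPT_HTTPHEADER, buildRequestHeaders($_GET['javascript'] == 1)); so every handle gets the same short list no matter how many URLs are queued.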
    Hope this helps,

    Jay
     
    jayshah, Jan 18, 2008 IP
  4. x11joex11

    x11joex11 Peon

    #4
    You deserve the money, jayshah. Now it works, lol. I'm dumb; I can't believe I didn't see that. I kept thinking it was something I was doing client side, not what the server was doing as it made requests. Let me know if you want it for helping. I really appreciate it.
     
    x11joex11, Jan 18, 2008 IP
  5. jayshah

    jayshah Peon

    #5
    True to your word. Thanks!

    Jay
     
    jayshah, Jan 18, 2008 IP
  6. me4you

    me4you Well-Known Member

    #6
    Is the problem solved, or are you still looking for someone?
     
    me4you, Jan 18, 2008 IP
  7. jayshah

    jayshah Peon

    #7
    The problem is solved. Sorry if I took your place :D
     
    jayshah, Jan 18, 2008 IP