Best way to verify site exists?

Discussion in 'PHP' started by bobby9101, Mar 25, 2007.

  1. #1
    How would you all recommend verifying that a site exists?
    Should I call get_headers() and then search through the headers for "200" as the status code? I also want to allow 301 and 302.
    What should I do?
     
    bobby9101, Mar 25, 2007 IP
  2. krakjoe

    krakjoe Well-Known Member

    #2
    Use fsockopen() on port 80 and read the headers sent back, to make sure your code isn't being tricked.
     
    krakjoe, Mar 25, 2007 IP
  3. bobby9101

    bobby9101 Peon

    #3
    thanks, will check it out
     
    bobby9101, Mar 25, 2007 IP
  4. bobby9101

    bobby9101 Peon

    #4
    Hmm, it doesn't seem to make much sense to me. How do I read the headers?
     
    bobby9101, Mar 25, 2007 IP
  5. nico_swd

    nico_swd Prominent Member

    #5
    get_headers() should be easier to use.

    
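    function url_exists($url)
    {
    	// get_headers() returns false on failure (e.g. the host cannot be reached)
    	$headers = @get_headers($url);
    	if ($headers === false) {
    		return false;
    	}
    	// $headers[0] is the status line of the first response, e.g. "HTTP/1.1 200 OK"
    	return (bool) preg_match('/^HTTP\/\d\.\d\s+(200|301|302)/', $headers[0]);
    }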
    
    
    PHP:
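
    By default get_headers() follows redirects and returns the header lines of every response in one flat array, so $headers[0] is the status of the first response only. A minimal sketch of pulling out all the status codes, assuming the default (non-associative) array format; status_codes() is a hypothetical name, not part of PHP:

```php
<?php
// Hypothetical helper: collect the status code of every response found in
// an array of header lines as returned by get_headers($url).
function status_codes(array $headers)
{
    $codes = array();
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently"
        if (is_string($line) && preg_match('/^HTTP\/\d(?:\.\d)?\s+(\d{3})/', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}
```

    For a URL that redirects once, status_codes(get_headers($url)) would hold the redirect code first and the final code last.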
     
    nico_swd, Mar 25, 2007 IP
  6. krakjoe

    krakjoe Well-Known Member

    #6
    nico is right: if you have get_headers(), then you should probably use that, although nico's function doesn't return the response code, just true or false.

    
    <?php
    function httpCode( $url )
    {
    	// Strip the scheme: fsockopen() expects a bare host name
    	$host = preg_replace("/^https?:\\/\\//i", "", $url );
    	
    	$sock = fsockopen( $host, 80, $errno, $errstr );
    	
    	if( !$sock ) 
    		return false;
    	
    	$out = "GET / HTTP/1.1\r\n";
    	$out .= "Host: $host\r\n";
    	$out .= "Connection: Close\r\n\r\n";
    	
    	fwrite( $sock, $out );
    	
    	// Read until the status line is found, e.g. "HTTP/1.1 200 OK"
    	$code = false;
    	while( !feof($sock) )
    		if( preg_match("/^HTTP\\/1\\.[0-9] ([0-9]+)/i", fgets( $sock, 128 ), $http ) )
    		{
    			$code = (int) $http[1];
    			break;
    		}
    	
    	// Always close the socket before returning
    	fclose($sock);
    	return $code;
    }
    
    printf("Server exited with response code %d", httpCode( "digitalpoint.com" ) );
    
    PHP:
    That's how I would have done it before PHP 5.
     
    krakjoe, Mar 25, 2007 IP
  7. bobby9101

    bobby9101 Peon

    #7
    thanks both, I will rep if I can
     
    bobby9101, Mar 25, 2007 IP
  8. nico_swd

    nico_swd Prominent Member

    #8
    I thought he just wanted to verify if the site exists, and not the actual status.
     
    nico_swd, Mar 25, 2007 IP
  9. sea otter

    sea otter Peon

    #9
    Good point about not knowing if OP wants to check existence vs. status, so I figured I'd throw my own curl-based code into the mix (see bottom of post).

    It's well commented and easy to use. Run
    
    $header_array = check_url('SOME_URL',true);
    
    Code (markup):
    To get all the headers for a site, INCLUDING redirects. So, for example, if you use 'http://google.com' as the URL in the call, you'll get back TWO headers: the first a '301 moved' and then a '200 ok'.

    If you call
    
    $header_array = check_url('SOME_URL',false);
    
    Code (markup):
    You get back only ONE header, because the false flag tells curl not to follow redirects.

    Save the code below into a file and run it; the output is very clear.

    
    <?php
    	function parse_HTTP_header($HTTPheader)
    	{
    		/* 
    			First, see how *many* headers we have.  For example, in the case of
    		 	a 3xx redirect, we'll have TWO headers: a 3xx redirect, and then
    		 	the header (hopefully a 200) we got redirected to.
    		*/
    		$headers = preg_split('/\r\n\r\n/', $HTTPheader,-1,PREG_SPLIT_NO_EMPTY); 
    		
    		/*
    			ok, now we have an array containing a number of headers.  For each one,
    			create a map containing the header information.
    			
    			The first line of a header contains: 
    					<server version> <response code> <description> \r\n
    			
    			Every other line contains:
    			 		<attribute> : <value> \r\n
    		
    			So we need to split out each long header line into separate lines (which are terminated by \r\n)
    			and then split each of THOSE lines into constituent data or attribute/value pairs
    			as described above.
    		*/
    
    		$header_blocks = array();
    		
    		foreach($headers as $header) {
    			$lines = preg_split('/\r\n/',$header,-1,PREG_SPLIT_NO_EMPTY);
    
    			/* 
    				now we have each line of the header in an array called $lines
    			 	So now we split *that* up as follows:
    			 	
    				The *first* entry in the array contains response code data in this form:
    					<server version> <response code> <description> (e.g. HTTP/1.1 200 OK)
    				
    				Every subsequent entry contains attribute/value pairs in this form:
    					<attribute> : <value> (Content-Type : text/html)
    			 		
    				So we need to treat the first entry of the array differently from the rest.
    			*/
    			$data = array();
    
    			$response = preg_split('/\s/',$lines[0],3,PREG_SPLIT_NO_EMPTY);
    			$data['header_server'] = $response[0];
    			$data['header_code'] = $response[1];
    			// the description may be absent, e.g. in an "HTTP/2 200" status line
    			$data['header_description'] = isset($response[2]) ? $response[2] : '';
    			
    			// now do each attribute/value pair
    			for ($i=1; $i < count($lines); $i++) {
    				// skip lines without a colon, and allow empty values such as "X-Foo:"
    				if (strpos($lines[$i], ':') === false) continue;
    				list($attr,$value) = explode(':',$lines[$i],2);
    				$data[$attr] = trim($value);
    			}
    
    			// now attach all the data to the master array of headers we return to the caller
    			$header_blocks[] = $data; 
    		}
    		
    		return $header_blocks;
    	}
    
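    /*
    	The following function fetches the headers for $url with curl.
    	When $redirect is true, redirects are followed and the headers of every
    	response are returned; otherwise only the first response's headers are.
    */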
    
    function check_url($url,$redirect=true)
    {
    	$ch = curl_init($url);
    	curl_setopt($ch, CURLOPT_NOBODY, 1);
    	curl_setopt($ch, CURLOPT_HEADER, 1);
    	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirect === true ? 1 : 0);
    	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    	$result=curl_exec ($ch);
    	curl_close ($ch);
    	return parse_HTTP_header($result);
    }
    
    /*------------------------------------------------------------------------------------
    									TEST CALL
    Rather than explain what comes back in the $headers array, just run the code below
    and look at the output onscreen.  It's all quite clear.
    ------------------------------------------------------------------------------------*/
    
    	echo '<pre>';
    	$headers = check_url('http://google.com',true);
    	print_r($headers);
    	echo '</pre>';
    ?>
    
    PHP:
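
    If only a yes/no answer is needed, the array returned by check_url() can be reduced to one. A minimal sketch, assuming the array layout produced by parse_HTTP_header() above (one entry per response, each with a 'header_code' key); final_status_code() and site_exists() are hypothetical names, and the default list of accepted codes is simply the 200/301/302 set mentioned in the first post:

```php
<?php
// Hypothetical helper: status code of the LAST response in the array
// returned by check_url(), or null if there were no responses at all.
function final_status_code(array $header_blocks)
{
    if (empty($header_blocks)) {
        return null;
    }
    $last = end($header_blocks);
    return isset($last['header_code']) ? (int) $last['header_code'] : null;
}

// Hypothetical helper: true when the final status code is one we accept.
function site_exists(array $header_blocks, array $allowed = array(200, 301, 302))
{
    return in_array(final_status_code($header_blocks), $allowed, true);
}
```

    Usage would be along the lines of site_exists(check_url('SOME_URL', true)).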
     
    sea otter, Mar 25, 2007 IP
  10. bobby9101

    bobby9101 Peon

    #10
    You were right, I didn't need the status; I was just using it to make sure the site exists lol
     
    bobby9101, Mar 25, 2007 IP
  11. jitesh

    jitesh Peon

    #11
    function url_exists($url)
    {
    	// parse_url() only reports a host when the URL includes a scheme
    	$result = parse_url($url);
    	if (!isset($result['host'])) {
    		return false;
    	}

    	// gethostbynamel() returns false when the host name cannot be resolved
    	return (bool) gethostbynamel($result['host']);
    }
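
    Two things to keep in mind with this approach: it only confirms that the host name resolves in DNS, not that a web server answers, and parse_url() finds no host in input such as 'digitalpoint.com' because it has no scheme. A minimal sketch of normalising the input first; ensure_scheme() and host_of() are hypothetical names, and defaulting to "http://" is an assumption:

```php
<?php
// Hypothetical helper: prepend "http://" when the input has no scheme,
// so that parse_url() can find the host part.
function ensure_scheme($url)
{
    return preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $url) ? $url : 'http://' . $url;
}

// Hypothetical helper: return the host part of a URL, or null if there is none.
function host_of($url)
{
    $host = parse_url(ensure_scheme($url), PHP_URL_HOST);
    return (is_string($host) && $host !== '') ? $host : null;
}
```

    The result of host_of() could then be passed to gethostbynamel() in place of $result['host'].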
     
    jitesh, Mar 25, 2007 IP
  12. krakjoe

    krakjoe Well-Known Member

    #12
    oops, sucker for not reading properly lol....
     
    krakjoe, Mar 26, 2007 IP