How would you all recommend checking that a site exists? Should I call get_headers() and then search through the headers for "200" as the status code? I also want to allow 301 or 302. What should I do?
get_headers() should be easier to use.

function url_exists($url)
{
    // get_headers() returns false if the request fails
    $headers = @get_headers($url);
    return $headers !== false
        && preg_match('/^HTTP\/\d\.\d\s+(200|301|302)/', $headers[0]) === 1;
}
PHP:
nico is right: if you have get_headers(), you should probably use it, although nico's function doesn't return the response code, just true or false.

<?php
function httpCode($url)
{
    // Strip the scheme so only the host name is passed to fsockopen()
    $host = preg_replace("/^https?:\\/\\//si", "", $url);

    $sock = fsockopen($host, 80, $errno, $errstr);
    if (!$sock) {
        return false;
    }

    $out  = "GET / HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($sock, $out);

    // Read until the status line turns up, then close the socket before returning
    $code = false;
    while (!feof($sock)) {
        if (preg_match("/HTTP\\/1\\.[0-9] ([0-9]+)/si", fgets($sock, 128), $http)) {
            $code = $http[1];
            break;
        }
    }
    fclose($sock);
    return $code;
}

printf("Server exited with response code %d", httpCode("digitalpoint.com"));
PHP:

That's how I would have done it before PHP 5.
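For anyone on PHP 5 or later who wants the code itself rather than true/false, here is a minimal sketch built on get_headers(). The names status_from_line() and http_status() are made up for this example, and it only looks at the first status line, so for a redirecting URL it reports the 301 or 302 rather than the final response.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a status line
// such as "HTTP/1.1 301 Moved Permanently". Returns null if the line does
// not look like an HTTP status line.
function status_from_line($line)
{
    if (preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical wrapper: status code of the first response, or false if the
// request failed or the first header line could not be parsed.
function http_status($url)
{
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    $code = status_from_line($headers[0]);
    return $code === null ? false : $code;
}
```

Usage would be something like in_array(http_status($url), array(200, 301, 302), true), which covers the codes the original question asked about.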
Good point about not knowing whether the OP wants to check existence or status, so I figured I'd throw my own curl-based code into the mix (see the bottom of this post). It's well commented and easy to use. Run

$header_array = check_url('SOME_URL',true);
Code (markup):

to get all the headers for a site, INCLUDING redirects. So, for example, if you use 'http://google.com' as the URL in the call, you'll get back TWO headers: first a '301 moved' and then a '200 ok'. If you call

$header_array = check_url('SOME_URL',false);
Code (markup):

you get back only ONE header, because the false flag tells curl not to follow redirects. Save the code below into a file and run it; the output is very clear.

<?php
function parse_HTTP_header($HTTPheader)
{
    /*
      First, see how *many* headers we have. For example, in the case of a
      3xx redirect, we'll have TWO headers: a 3xx redirect, and then the
      header (hopefully a 200) we got redirected to.
    */
    $headers = preg_split('/\r\n\r\n/', $HTTPheader, -1, PREG_SPLIT_NO_EMPTY);

    /*
      OK, now we have an array containing a number of headers. For each one,
      create a map containing the header information.

      The first line of a header contains:
          <HTTP version> <response code> <description> \r\n
      Every other line contains:
          <attribute> : <value> \r\n

      So we need to split each long header into separate lines (which are
      terminated by \r\n) and then split each of THOSE lines into constituent
      data or attribute/value pairs as described above.
    */
    $header_blocks = array();
    foreach ($headers as $header) {
        $lines = preg_split('/\r\n/', $header, -1, PREG_SPLIT_NO_EMPTY);

        /*
          Now we have each line of the header in an array called $lines.
          So now we split *that* up as follows:

          The *first* entry in the array contains response code data in this form:
              <HTTP version> <response code> <description>  (e.g. HTTP/1.1 200 OK)
          Every subsequent entry contains attribute/value pairs in this form:
              <attribute> : <value>  (Content-Type : text/html)

          So we need to treat the first entry of the array differently from the rest.
        */
        $data = array();
        $response = preg_split('/\s/', $lines[0], 3, PREG_SPLIT_NO_EMPTY);
        $data['header_server'] = $response[0];
        $data['header_code'] = $response[1];
        // Some responses carry no description after the code
        $data['header_description'] = isset($response[2]) ? $response[2] : '';

        // now do each attribute/value pair, skipping any line without a colon
        for ($i = 1; $i < count($lines); $i++) {
            $parts = explode(':', $lines[$i], 2);
            if (count($parts) === 2) {
                $data[trim($parts[0])] = trim($parts[1]);
            }
        }

        // now attach all the data to the master array of headers we return to the caller
        $header_blocks[] = $data;
    }
    return $header_blocks;
}

/* The following */
function check_url($url, $redirect = true)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirect === true ? 1 : 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);

    // curl_exec() returns false on failure; report that as "no headers"
    if ($result === false) {
        return array();
    }
    return parse_HTTP_header($result);
}

/*------------------------------------------------------------------------------------
  TEST CALL
  Rather than explain what comes back in the $headers array, just run the code
  below and look at the output onscreen. It's all quite clear.
------------------------------------------------------------------------------------*/
echo '<pre>';
$headers = check_url('http://google.com', true);
print_r($headers);
echo '</pre>';
?>
PHP:
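If all you need from the check_url() result is the status of the last hop, something like the sketch below should do it. final_status_code() is a name invented for this example, and the sample array is hand-written data in the shape check_url() returns (the values are made up, not real output).

```php
<?php
// Hypothetical helper: given the array returned by check_url(), return the
// status code of the last response in the chain, or null if there is none.
function final_status_code($header_blocks)
{
    if (!is_array($header_blocks) || count($header_blocks) === 0) {
        return null;
    }
    $last = $header_blocks[count($header_blocks) - 1];
    return isset($last['header_code']) ? (int) $last['header_code'] : null;
}

// Hand-written sample shaped like the result for a redirecting URL.
$sample = array(
    array(
        'header_server'      => 'HTTP/1.1',
        'header_code'        => '301',
        'header_description' => 'Moved Permanently',
        'Location'           => 'http://www.example.com/',
    ),
    array(
        'header_server'      => 'HTTP/1.1',
        'header_code'        => '200',
        'header_description' => 'OK',
    ),
);

echo final_status_code($sample), "\n"; // prints 200
```

With redirects turned off there is only one block, so the same helper returns the 301 or 302 itself.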
function url_exists($url)
{
    // Only checks that the host name resolves in DNS; no HTTP request is made
    $result = parse_url($url);
    if (!isset($result['host'])) {
        return false;
    }
    // gethostbynamel() returns false when the host cannot be resolved
    return (bool) gethostbynamel($result['host']);
}
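One thing to watch with the parse_url() approach: for a bare name like "digitalpoint.com" with no scheme, parse_url() puts the whole string in 'path' and sets no 'host', so the function above returns false. A rough sketch of one way around that, assuming it is acceptable to treat scheme-less input as http (host_from_url() and host_resolves() are names invented here):

```php
<?php
// Hypothetical helper: extract the host name from a URL, prepending
// "http://" when no scheme is given so parse_url() can find the host.
function host_from_url($url)
{
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return (is_string($host) && $host !== '') ? $host : null;
}

// Hypothetical wrapper: true if the host name resolves in DNS. This says
// nothing about whether a web server actually answers on that host.
function host_resolves($url)
{
    $host = host_from_url($url);
    return $host !== null && gethostbynamel($host) !== false;
}
```

This still only tells you the name resolves, so pair it with one of the header checks earlier in the thread if you care about the 200/301/302 part.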