PHP help, please

Discussion in 'PHP' started by SearchBliss, Apr 7, 2010.

  1. #1
    I have developed a broken link checker that works great, unless the URLs are relative (i.e. they don't include the http:// and domain part).
    For example:
    If the links look like ...href="http://www.somesite.com/somepage.html"..., it works great.
    But if they look like ...href="somepage.html"..., ...href="/somepage.html"..., or ...href="./somepage.html"..., it ignores them.

    Here's the problem code:
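      // ... (start of the enclosing function not shown)
      $matches = array();
      // Capture the value of every href attribute (quotes optional)
      preg_match_all("|href\=\"?'?`?([[:alnum:]:?=&@/;._-]+)\"?'?`?|i", $html, $matches);
      $links = array();
      $ret = $matches[1];
      for ($i = 0; isset($ret[$i]); $i++) {
          if (preg_match("|^http://(.*)|i", $ret[$i])) {
              // Absolute http:// link: keep it as-is
              $links[] = $ret[$i];
          } elseif (preg_match("|^(.*)|i", $ret[$i])) {
              // Anything else: prefix it with the host ($info is presumably set earlier in the function)
              $links[] = "http://" . $info["host"] . "" . $ret[$i];
          }
      }
      return $links;
    }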
    I thought this part:
      } elseif(preg_match("|^(.*)|i", $ret[$i])) {
          $links[] = "http://".$info["host"]."". $ret[$i];
    would have taken care of it. Please help!
    Many thanks.
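    A quick way to see what goes wrong is to run the same concatenation the elseif branch uses on each of the three href forms. This is a minimal sketch; it assumes $info holds the parse_url() result for the page being checked, which the snippet above doesn't show, and the host www.somesite.com is just the example from the post.

```php
<?php
// Reproduces the string the original elseif branch builds for each href form.
// Assumption: $info is the parse_url() result for the page being checked
// (the original snippet does not show where $info is set).
$info = parse_url('http://www.somesite.com/index.html');

$hrefs = array('somepage.html', '/somepage.html', './somepage.html');
foreach ($hrefs as $href) {
    echo "http://" . $info["host"] . "" . $href, "\n";
}
// Prints:
// http://www.somesite.comsomepage.html
// http://www.somesite.com/somepage.html
// http://www.somesite.com./somepage.html
```

    Only the root-relative form (/somepage.html) produces a usable URL; the other two run the file name into the host name. Also note that |^(.*)| matches any string at all, so that branch catches every href the first branch misses, including https:// links, which come out as http://www.somesite.comhttps://.... If $info is not actually in scope inside the function, the host part is empty and the links come out as http://somepage.html and similar.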
     
    SearchBliss, Apr 7, 2010
  2. Cloud Computing Forum

    #2
    The problem here is that regex is hard to debug just by looking at it. Can I suggest you use the standard string functions wherever possible? For example, to check whether a URL starts with http:// you could use strpos().
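    Here's a small sketch of that strpos() check. The catch is that strpos() returns 0 when the match is at the very start of the string, and 0 is falsy, so the result has to be compared with === rather than used as a plain boolean. The example URL is hypothetical.

```php
<?php
// strpos() returns the position of the first match, or false if there is none.
// A URL that *starts* with "http://" gives position 0, which is falsy,
// so compare with === 0 rather than testing the result as a boolean.
$url = 'http://www.somesite.com/somepage.html';

var_dump(strpos($url, 'http://'));                   // int(0)
var_dump((bool) strpos($url, 'http://'));            // bool(false): the naive test fails
var_dump(strpos($url, 'http://') === 0);             // bool(true): starts with http://
var_dump(strpos('somepage.html', 'http://') === 0);  // bool(false): relative link
```

    Use stripos() instead if you want the check to be case-insensitive, like the /i flag on the original regex.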

    Here is a simple (untested) function which should cover most URLs. URLs that start with ./, however, are relative to the folder of the current page, which may be a subfolder below the root, so how can you work out their full URL from the domain alone?

    
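    function format_url($url, $domain_url){
        /* Leave absolute URLs (starting with http:// or https://) unchanged */
        if( strpos($url, 'http://') === 0 || strpos($url, 'https://') === 0 ) {
            return $url;
        }

        /* Remove first slash from url if present */
        if( substr($url, 0, 1) == '/' ) {
            $url = substr($url, 1);
        }

        /* Append URL to domain URL ($domain_url is expected to end with "/") */
        return $domain_url . $url;
    }
    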
    Dunno if this helps.
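    As for the ./ question above: one way is to resolve each href against the full URL of the page it was found on, rather than just the domain. Below is a minimal sketch of that idea, loosely following the reference-resolution rules in RFC 3986. The function name resolve_url() and the example URLs are hypothetical, and it assumes the page URL is absolute (has a scheme and host).

```php
<?php
// Sketch: resolve an href against the URL of the page it was found on.
// Simplified: does not handle protocol-relative ("//host/...") links,
// non-HTTP hrefs such as mailto:, or every RFC 3986 edge case
// (e.g. "sub/.." gives "/dir" rather than "/dir/").
function resolve_url($href, $page_url)
{
    // Absolute links are returned unchanged
    if (stripos($href, 'http://') === 0 || stripos($href, 'https://') === 0) {
        return $href;
    }

    // Assumes $page_url is absolute, e.g. http://www.somesite.com/dir/index.html
    $parts = parse_url($page_url);
    $base  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path  = isset($parts['path']) ? $parts['path'] : '/';

    // Keep any ?query or #fragment aside so it is not split on "/"
    $cut    = strcspn($href, '?#');
    $suffix = substr($href, $cut);
    $href   = substr($href, 0, $cut);

    if ($href === '') {
        $full = $path;   // "?x" or "#x": same page
    } elseif ($href[0] === '/') {
        $full = $href;   // root-relative: "/page.html"
    } else {
        // document-relative: "page.html", "./page.html", "../page.html"
        $full = substr($path, 0, strrpos($path, '/') + 1) . $href;
    }

    // Drop "." segments and let ".." remove the previous segment
    $segments = array();
    foreach (explode('/', $full) as $seg) {
        if ($seg === '.') {
            continue;
        }
        if ($seg === '..') {
            if (count($segments) > 1) {   // never pop the leading empty segment
                array_pop($segments);
            }
            continue;
        }
        $segments[] = $seg;
    }

    return $base . implode('/', $segments) . $suffix;
}

$page = 'http://www.somesite.com/dir/index.html';
echo resolve_url('somepage.html', $page), "\n";    // http://www.somesite.com/dir/somepage.html
echo resolve_url('/somepage.html', $page), "\n";   // http://www.somesite.com/somepage.html
echo resolve_url('./somepage.html', $page), "\n";  // http://www.somesite.com/dir/somepage.html
echo resolve_url('../somepage.html', $page), "\n"; // http://www.somesite.com/somepage.html
```

    Because it works from the page URL rather than the domain, the same function covers plain somepage.html and ../ links found in subfolders, not just ./ ones.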
     
    Cloud Computing Forum, Apr 7, 2010