Counting links

Discussion in 'PHP' started by caykoylu, Dec 10, 2008.

  1. #1
    Hello,

    I would like to count all:

    1. Internal Links

    and all:

    2. Outbound Links

    where

    $url="mypage.com/about.html";
    $pageHTML = file_get_contents('http://'.$url);

    For example:
    http://whois.domaintools.com/digitalpoint.com
    Links 7 (Internal: 7, Outbound: 0)

    <a href="contact.html">contact us</a> is an internal link.
    <a href="http://www.mypage.com/info.html">contact us</a> is an internal link.
    <a href="http://mypage.com/hello.html">hello world</a> is an internal link.
    <a href="https://secure.mypage.com/hello.html">hello world</a> is an internal link.
    <a href="http://www.google.com/">google</a> is an external link.
    <a href="https://secure.google.com/">google</a> is an external link.
    <a href="javascript:dosomething();">JS</a> is NOT a link.


    Any ideas?

    Thanks!
     
    caykoylu, Dec 10, 2008
  2. rene7705

    #2
    Read up on the preg_match_all function on php.net
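
As a rough illustration of that suggestion, here is a minimal sketch of preg_match_all pulling href values out of markup. The sample string and the pattern are assumptions made for the example, not something from the original post, and a regex like this will not cope with every kind of HTML.

```php
<?php
// Sample markup standing in for the fetched page (hypothetical).
$pageHTML = '<a href="contact.html">contact us</a> '
          . '<a href="http://www.google.com/">google</a>';

// Capture group 1 holds the href value of each <a> tag.
$count = preg_match_all('/<a\s[^>]*?href\s*=\s*["\']?([^\s>"\']+)/i', $pageHTML, $matches);

echo $count . "\n";     // number of <a href=...> tags found
print_r($matches[1]);   // the captured href values
```

preg_match_all returns the number of full matches, and $matches[1] holds the captured URLs, which can then be sorted into internal and outbound.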
     
    rene7705, Dec 10, 2008
  3. seregaw

    #3
    Try something like this:
    
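    <?php
    // Fetch a page and split its links into internal and outbound lists.
    function parseLinks( $url ) {
    	$int = array();
    	$out = array();
    	$content = @file_get_contents( $url );
    	if( $content ) {
    		// Host of the page being scanned, without a leading "www."
    		$urlParts = @parse_url( $url );
    		$curHost = preg_replace("/^www\./i", '', $urlParts['host']);
    		// Capture the href value of every <a> tag
    		preg_match_all("/<a\s[^>]*?href\s*=\s*['\"]?([^\s>'\"]+)['\"]?/is", $content, $matches, PREG_PATTERN_ORDER );
    		foreach ( $matches[1] as $link ) {
    			$link = trim( $link );
    			// Skip in-page anchors
    			if( $link[0] == '#' ) {
    				continue;
    			}
    			$urlParts = @parse_url( $link );
    			// Skip non-HTTP schemes such as javascript: or mailto:
    			if( !empty( $urlParts['scheme'] ) && stripos( $urlParts['scheme'], 'http' ) === false ) {
    				continue;
    			}
    			if( $urlParts ) {
    				// A link whose host does not contain the page's host is outbound;
    				// relative links and subdomains count as internal
    				if ( !empty( $urlParts['host'] ) && stripos( $urlParts['host'], $curHost ) === false ) {
    					$out[] = $link;
    				} else {
    					$int[] = $link;
    				}
    			}
    		}
    		return array( 'int' => $int, 'out' => $out );
    	}
    	return false;
    }
    
    
    $res = parseLinks('http://forums.digitalpoint.com/showthread.php?p=10034063&posted=1#post10034063');
    
    // parseLinks() returns false when the page could not be fetched
    if ( $res !== false ) {
    	echo "Internal: ".count( $res['int'] )."<br />";
    	echo "Outbound: ".count( $res['out'] )."<br />";
    }
    ?>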
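
One caveat about the host check above: stripos() is a substring match, so a host such as notmypage.com would also count as internal for mypage.com. A stricter comparison could look roughly like this. The helper name isSameSite is hypothetical, and this is only a sketch that treats the page's host and its subdomains as internal.

```php
<?php
// Hypothetical helper: true when $linkHost is $pageHost or one of its subdomains.
function isSameSite( $linkHost, $pageHost ) {
	$linkHost = strtolower( $linkHost );
	// Compare against the page's host without a leading "www."
	$pageHost = preg_replace( '/^www\./', '', strtolower( $pageHost ) );
	if ( $linkHost === $pageHost ) {
		return true;
	}
	// Otherwise the link host must end in ".<pageHost>"
	$suffix = '.' . $pageHost;
	return substr( $linkHost, -strlen( $suffix ) ) === $suffix;
}

var_dump( isSameSite( 'www.mypage.com', 'mypage.com' ) );    // bool(true)
var_dump( isSameSite( 'secure.mypage.com', 'mypage.com' ) ); // bool(true)
var_dump( isSameSite( 'notmypage.com', 'mypage.com' ) );     // bool(false)
var_dump( isSameSite( 'www.google.com', 'mypage.com' ) );    // bool(false)
```

Inside parseLinks() this would replace the stripos() test on $urlParts['host'].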
    
     
    seregaw, Dec 10, 2008
  4. caykoylu

    #4
    Thank you, it's working.
    Perfect solution.
    But I don't want the Google AdSense advertisement links to be counted.

    Thank you.
     
    caykoylu, Dec 10, 2008
  5. seregaw

    #5
    caykoylu,


    Google AdSense links are generated by JavaScript.
    So, you can't see them in the HTML source code.
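
To illustrate the point, here is a small sketch with made-up page source: one ordinary link plus a script tag of the kind an ad network might use (the ads.example.com URL is hypothetical). A pattern run over the fetched HTML only sees the static anchor, because any links the script would create exist only after a browser runs it.

```php
<?php
// Hypothetical page source: one static link and one ad script tag.
$html = '<a href="about.html">About</a>'
      . '<script src="http://ads.example.com/show_ads.js"></script>';

// Count <a ... href> tags present in the raw source.
$found = preg_match_all('/<a\s[^>]*?href/i', $html, $m);

echo $found . "\n";   // only the static anchor is counted
```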
     
    seregaw, Dec 10, 2008