How to preg_match a word in a string

Discussion in 'PHP' started by deriklogov, Oct 15, 2009.

  1. #1
    Hey, how can I use preg_match to get the base domain in this example?
    "http://www.caasco.com/automotive/";

    I need to extract caasco.com from it.
     
    deriklogov, Oct 15, 2009
  2. kbluhm

  3. sodevrom

    #3
    Hello,
    I'm not sure whether this will work in PHP. Here is a regex string in C#:

    (?:"http://www\.)(.*)(?:\.com/automotive/")

    Hope it helps. Run a few tests until you get it right.
    The idea is that ?: makes a group non-capturing, so its text is not returned as a separate captured group in the final match.
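
    A minimal PHP sketch of the same non-capturing idea, assuming PCRE through preg_match(). The pattern and the helper name extractHost() are illustrative and not taken from the post above. The optional "www." sits in a non-capturing group, so the only captured group is the host itself:

```php
<?php
// extractHost() is a hypothetical helper name used only for this sketch.
function extractHost(string $url): ?string
{
    // (?:www\.)? is non-capturing: it takes part in the match
    // but gets no entry of its own in $matches.
    // ([^/:?#]+) is the only capturing group, so the host lands in $matches[1].
    $pattern = '~^https?://(?:www\.)?([^/:?#]+)~i';

    if (preg_match($pattern, $url, $matches) === 1) {
        return $matches[1];
    }
    return null; // the input did not look like an http(s) URL
}

echo extractHost('http://www.caasco.com/automotive/'), PHP_EOL; // caasco.com
```

    Note that this only strips the scheme and a leading "www."; it does not know where the registrable domain ends for hosts such as sub.example.co.uk.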
     
    sodevrom, Oct 16, 2009
  4. mgutt

    #4
    parse_url() combined with str_replace('www.', '', $parse['host']) is much faster than using preg_match.
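
    A short sketch of that approach, assuming the input is a full URL with a scheme. The helper name hostWithoutWww() is hypothetical. It swaps str_replace() for a prefix check, because str_replace() would also remove a "www." that appears elsewhere in the host:

```php
<?php
// hostWithoutWww() is a hypothetical helper name used only for this sketch.
function hostWithoutWww(string $url): ?string
{
    // PHP_URL_HOST makes parse_url() return just the host component.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host found, e.g. a relative URL such as "bigger.html"
    }
    // Strip only a leading "www." (case-insensitive).
    if (strncasecmp($host, 'www.', 4) === 0) {
        $host = substr($host, 4);
    }
    return $host;
}

echo hostWithoutWww('http://www.caasco.com/automotive/'), PHP_EOL; // caasco.com
```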

    But no solution works with all TLDs without a complex whitelist. For example, you aren't able to find the domain:
    In most cases, I use this:
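    function getdomain($url) {
    	// add a scheme if missing, so parse_url() can find the host
    	if (strpos($url, '://') === false) {
    		$url = 'http://' . $url;
    	}
    	// filter host
    	$host = parse_url($url, PHP_URL_HOST);
    	if (!is_string($host)) {
    		return ''; // no host could be parsed
    	}
    	// strip a leading "www." (the $host{0} curly-brace offsets were removed in PHP 8)
    	if (strpos($host, 'www.') === 0) {
    		$host = substr($host, 4);
    	}
    	// filter domain
    	$p = explode('.', $host);
    	$cp = count($p);
    	if ($cp < 3) {
    		return $host; // nothing to trim for one- or two-label hosts
    	}
    	// keep three labels for suffixes such as co.uk or com.au, otherwise two
    	$three = $p[$cp-1] == 'uk' || $p[$cp-2] == 'com' || $p[$cp-2] == 'co' || $p[$cp-1] == 'pro';
    	return $three ? ($p[$cp-3] . '.' . $p[$cp-2] . '.' . $p[$cp-1]) : ($p[$cp-2] . '.' . $p[$cp-1]);
    }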
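
    The whitelist idea mentioned earlier in this post could look roughly like the sketch below. The function name registrableDomain() and the tiny $suffixes array are hypothetical; real code would need a complete suffix list (for example the Public Suffix List) instead of four hand-picked entries:

```php
<?php
// registrableDomain() and $suffixes are hypothetical names for this sketch.
function registrableDomain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);

    // Try the longest candidate suffix first:
    // for a.b.c.d this checks "b.c.d", then "c.d", then "d".
    for ($i = 1; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            // The domain is the suffix plus the one label to its left.
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return null; // no known suffix, or the host is a bare suffix
}

// Deliberately incomplete list, for illustration only.
$suffixes = ['com', 'uk', 'co.uk', 'pro'];

echo registrableDomain('www.caasco.com', $suffixes), PHP_EOL;    // caasco.com
echo registrableDomain('www.example.co.uk', $suffixes), PHP_EOL; // example.co.uk
```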
    If you only want the domain of your own host, you are better off with this:
    $domain = strtolower(str_replace(array('www.', 'ww.', ':80'), '', $_SERVER['SERVER_NAME']));
    // drop a single trailing dot, if present
    $domain = substr($domain, -1) != '.' ? $domain : substr($domain, 0, -1);
     
    mgutt, Oct 16, 2009
  5. deriklogov

    #5
    Thank you very much for all your help, but I still need some more.

    I am writing a script that counts the number of external links on a page:
    1) I parse the page
    2) I get all the links on the page
    3) The problem I hit is that I cannot figure out how to tell apart URLs like:
    bigger.html
    bigger.com

    At first I was thinking of using preg_match on the domain extension, like ".com", but then I realized there are so many domain extensions that it gets too crazy.
    What can you suggest?
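
    One possible way to approach step 3 without matching on extensions, sketched under a few assumptions: inside an href, a value with no scheme and no leading "//" (such as "bigger.html", or even a bare "bigger.com") is treated as a relative link, the way a browser would resolve it, and a link counts as external when its host differs from the page's host. The function name isExternalLink() and the $pageHost parameter are hypothetical:

```php
<?php
// isExternalLink() and $pageHost are hypothetical names for this sketch.
function isExternalLink(string $href, string $pageHost): bool
{
    // parse_url() only reports a host when the link has a scheme
    // ("http://bigger.com/") or is scheme-relative ("//bigger.com/").
    $host = parse_url(trim($href), PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // relative link: same site
    }
    // Strict host comparison: "www.caasco.com" and "caasco.com" count as different.
    return strcasecmp($host, $pageHost) !== 0;
}

var_dump(isExternalLink('bigger.html', 'www.caasco.com'));        // bool(false)
var_dump(isExternalLink('http://bigger.com/', 'www.caasco.com')); // bool(true)
```

    Counting external links is then a matter of calling this check on every href collected in step 2.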
     
    deriklogov, Oct 16, 2009