Hey, how can I use preg_match to get the base domain in this example: "http://www.caasco.com/automotive/"? I need to extract caasco.com from it.
Hello, I don't know for sure if this will work with PHP. Here is a regex string in C#: (?:"http://www.)(.*)(?:.com/automotive/") Hope it helps. Do a few tests until you get it right. The idea is that ?: makes a group non-capturing, so that text won't be stored as a captured group in the final match.
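For PHP, a rough preg_match equivalent might look like the sketch below. The function name extractBaseDomain is made up for illustration, and the pattern only strips the scheme and a leading "www."; any other subdomain stays in the result.

```php
<?php
// Hypothetical helper (not from the thread): returns the host without
// the scheme and without a leading "www.", or null if nothing matches.
function extractBaseDomain(string $url): ?string
{
    // (?:...) groups are non-capturing; only the host part lands in $m[1].
    // The host ends at the first "/", ":", "?" or "#".
    if (preg_match('~^(?:https?://)?(?:www\.)?([^/:?#]+)~i', $url, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

echo extractBaseDomain('http://www.caasco.com/automotive/'), "\n"; // caasco.com
```

Unlike the C# pattern, this does not hard-code ".com/automotive/", so it should also cope with other paths and extensions.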
parse_url() and str_replace('www.', '', $parse['host']) is much faster than using preg_match. But there is no solution that works with all TLDs without using a complex whitelist, so for some hosts you aren't able to find the domain. For most cases, I'm using this:

    function getdomain($url) {
        // add a scheme if missing, so parse_url() can find the host
        if (strpos($url, '://') === false) {
            $url = 'http://' . $url;
        }
        // filter host
        $p = parse_url($url);
        $host = $p['host'];
        if (strpos($host, 'www.') === 0) {
            $host = substr($host, 4);
        }
        // filter domain
        $p = explode('.', $host);
        $cp = count($p);
        if ($cp < 2) {
            return $host; // single label, e.g. "localhost"
        }
        // keep three labels (e.g. example.co.uk), but only if three exist
        $three = $cp >= 3 && ($p[$cp-1] == 'uk' || $p[$cp-2] == 'com' || $p[$cp-2] == 'co' || $p[$cp-1] == 'pro');
        return $three
            ? ($p[$cp-3] . '.' . $p[$cp-2] . '.' . $p[$cp-1])
            : ($p[$cp-2] . '.' . $p[$cp-1]);
    }

If you only want the domain of your own host, you are better off with this:

    $domain = strtolower(str_replace(array('www.', 'ww.', ':80'), '', $_SERVER['SERVER_NAME']));
    // drop a trailing dot, if there is one
    $domain = substr($domain, -1) != '.' ? $domain : substr($domain, 0, -1);
Thank you very much for all your help, but I still need some more. I am creating a script which counts the number of external links on a page:
1) I parse the page
2) I get all the links on the page
3) The problem I hit is that I can't figure out how to tell the difference between URLs like: bigger.html bigger.com
At first I was thinking of using preg_match on the domain extension, like ".com", but then I realized there are so many domain extensions that it would be too crazy. What can you suggest?
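One possible way around matching extensions at all: inside an href, a link with no scheme and no host is relative, so "bigger.com" written without "http://" behaves just like "bigger.html". A minimal sketch using parse_url(), assuming you already have the list of href values; the names normalizeHost and isExternalLink and the sample links are made up for illustration:

```php
<?php
// Hypothetical helper: lowercase a host and drop a leading "www.".
function normalizeHost(string $host): string
{
    return preg_replace('/^www\./', '', strtolower($host));
}

// Hypothetical helper: a link counts as external only when it carries
// a host component that differs from the host of the page itself.
function isExternalLink(string $href, string $pageHost): bool
{
    $host = parse_url($href, PHP_URL_HOST);
    // Relative links such as "bigger.html" or "/about" have no host.
    if (!is_string($host) || $host === '') {
        return false;
    }
    return normalizeHost($host) !== normalizeHost($pageHost);
}

// Sample data, assumed for the example only.
$links = ['bigger.html', 'http://bigger.com/', '/about', 'http://www.caasco.com/automotive/'];
$external = 0;
foreach ($links as $link) {
    if (isExternalLink($link, 'www.caasco.com')) {
        $external++;
    }
}
echo $external, "\n"; // 1
```

Subdomains other than "www." are treated as different hosts here; whether those should count as external depends on what you want to measure.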