Hey folks, I have an array that contains fully qualified URLs:

```php
$domains = array(
    "http:www.domain.com", "http:www.domain.com/path/",
    "http:www.domain.com/foo", "http:www.domain.com/boo",
    "http:www.someother.com", "http:www.different.com",
    "http:www.different.com/stuff", "http:www.boo.com",
    "http:www.doo.com", "http:www.goo.com"
);
```

I run it through a sort and would ideally like to remove every element whose domain duplicates another's. I've been scratching my head trying to use parse_url() and compare on the [host] element, but I'm a bit lost after a few hours. It sounds simple enough, but for some reason no matter how I iterate through it I'm coming up dry. Any help returning an array that contains only unique domains would really be appreciated. Ideally the root-level URL (e.g. http://www.google.com) would be kept while all other duplicates with paths are discarded.
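A minimal sketch of the parse_url() approach, assuming the URLs carry the usual "//" (e.g. http://www.domain.com/foo); parse_url() finds no host in the slash-less "http:www.domain.com" form. The function name uniqueByHostPreferRoot() is made up for illustration. For each host it keeps the URL with the shortest path, so a root-level URL wins over its subpages when both are present.

```php
<?php
// Sketch: keep one URL per host, preferring the one with the shortest path,
// so http://www.domain.com wins over http://www.domain.com/foo.
// Assumes URLs include "//" (e.g. "http://www.domain.com/foo").
function uniqueByHostPreferRoot(array $urls): array
{
    $best = []; // host => [path length, original URL]
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if ($host === null || $host === false) {
            continue; // skip strings parse_url() cannot split into a host
        }
        $host = strtolower($host);
        // A missing path and "/" both count as the root (length 0).
        $path = rtrim((string) parse_url($url, PHP_URL_PATH), '/');
        $len = strlen($path);
        if (!isset($best[$host]) || $len < $best[$host][0]) {
            $best[$host] = [$len, $url];
        }
    }
    $result = array_column($best, 1); // keep just the URLs
    sort($result);
    return $result;
}
```

If a host only appears with subpages, the shortest one is kept, e.g. http://a.com/x rather than http://a.com/long/path. Hosts are compared case-insensitively, but www.domain.com and domain.com still count as different hosts.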
Try this:

```php
$domains = array(
    "http:www.domain.com", "http:www.domain.com/path/",
    "http:www.domain.com/foo", "http:www.domain.com/boo",
    "http:www.someother.com", "http:www.different.com",
    "http:www.different.com/stuff", "http:www.boo.com",
    "http:www.doo.com", "http:www.goo.com"
);

// Keep only the part before the first "/" (the domain, in this URL format).
// explode() is used because split() was removed in PHP 7.
foreach ($domains as $key => $val) {
    list($domains[$key]) = explode("/", $val, 2);
}
$domains = array_unique($domains);
sort($domains);

foreach ($domains as $key => $val) {
    echo "domains[" . $key . "] = " . $val . "\n";
}
```
Oops, forgot to mention that the array elements are actually in the format http://www.domain.com/foo (the board wouldn't let me post the double slash, which is why it's missing above).
Maybe this helps explain my problem:

domain[126] = http://members.tripod.com/~Maija_Murphy
domain[127] = http://members.tripod.com/~Maija_Murphy/page2.html

Here I would only want to keep domain[126].
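Because several people can share one host (members.tripod.com here), comparing on [host] alone would collapse different sites into one. One way to express "keep domain[126], drop domain[127]" is to drop any URL that starts with another listed URL followed by a "/". A rough sketch, using a hypothetical dropSubpages() helper:

```php
<?php
// Sketch: drop any URL that lives "under" another URL in the same list,
// e.g. .../~Maija_Murphy/page2.html is dropped because .../~Maija_Murphy is present.
// This is a plain string-prefix test at a "/" boundary, not a parse_url() comparison.
function dropSubpages(array $urls): array
{
    $urls = array_values(array_unique($urls));
    // Shortest first, so every possible parent is kept before its children are checked.
    usort($urls, fn(string $a, string $b): int => strlen($a) <=> strlen($b));
    $kept = [];
    foreach ($urls as $url) {
        $isChild = false;
        foreach ($kept as $parent) {
            $prefix = rtrim($parent, '/') . '/';
            if (strncmp($url, $prefix, strlen($prefix)) === 0) {
                $isChild = true;
                break;
            }
        }
        if (!$isChild) {
            $kept[] = $url;
        }
    }
    sort($kept);
    return $kept;
}
```

The "/" boundary keeps ~Maija_Murphy2 from being treated as a subpage of ~Maija_Murphy, and the same rule also drops http://www.domain.com/foo when http://www.domain.com is in the list. The nested loop compares each URL with every kept URL, so it is O(n²); fine for modest lists, slow for very large ones.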
Obviously, these URLs are coming from somewhere. When they are collected, you should reduce each URL to the domain name only, allowing for hosts such as members.tripod.com where each user's site lives in its own directory. Populate your array as you go, then run it through the array_unique() and sort() functions.
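A sketch of that collect-time reduction, with hypothetical siteKey() and collectSites() helpers. Treating a leading "~name" path segment as part of the site is an assumption based on the tripod example; other hosts may mark per-user sites differently.

```php
<?php
// Reduce a URL to scheme://host, plus the first path segment when it looks
// like a personal directory ("~name"). The "~" rule is an assumption.
function siteKey(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not an absolute URL we can reduce
    }
    // Lowercase scheme and host only; the user directory keeps its case.
    $key = strtolower(($parts['scheme'] ?? 'http') . '://' . $parts['host']);
    $segments = explode('/', ltrim($parts['path'] ?? '', '/'));
    if (substr($segments[0], 0, 1) === '~') {
        $key .= '/' . $segments[0];
    }
    return $key;
}

// Reduce every URL as it is collected, then de-duplicate and sort.
function collectSites(array $urls): array
{
    $sites = [];
    foreach ($urls as $url) {
        $key = siteKey($url);
        if ($key !== null) {
            $sites[] = $key;
        }
    }
    $sites = array_unique($sites);
    sort($sites);
    return $sites;
}
```

Note that this returns the reduced keys (e.g. http://www.domain.com), not the original URLs, which is what reducing at collection time implies.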
This is what I ended up doing to get the result I wanted.

$domain_array (the output of parse_url() on each URL):

```
Array ( [scheme] => http [host] => www.domain.com [path] => / )
Array ( [scheme] => http [host] => www.domain.com [path] => /fooo )
Array ( [scheme] => http [host] => www.domain2.com [path] => /booo )
Array ( [scheme] => http [host] => www.domain2.com [path] => / )
Array ( [scheme] => http [host] => www.domain3.com [path] => / )
Array ( [scheme] => http [host] => www.domain.com [path] => /yadda/yadda )
```

```php
// Keep the first sub-array seen for each distinct value of $sub_key.
function uma($array, $sub_key) {
    $target = array();
    $existing_sub_key_values = array();
    foreach ($array as $key => $sub_array) {
        if (!in_array($sub_array[$sub_key], $existing_sub_key_values)) {
            $existing_sub_key_values[] = $sub_array[$sub_key];
            $target[$key] = $sub_array;
        }
    }
    return $target;
}

$parsed_array = array();
$domain_array = uma($domain_array, 'host'); // de-duplicate on the host element

// Rebuild each URL string from its parse_url() parts.
foreach ($domain_array as $parts) {
    $uri = ''; // reset on every pass
    if (isset($parts['host']))     $uri .= $parts['host'];
    if (isset($parts['port']))     $uri .= ':' . $parts['port'];
    if (isset($parts['path']))     $uri .= $parts['path'];
    if (isset($parts['query']))    $uri .= '?' . $parts['query'];
    if (isset($parts['fragment'])) $uri .= '#' . $parts['fragment'];
    $parsed_array[] = 'http://' . $uri;
}
```

Spits out:

http://www.domain.com
http://www.domain2.com/booo
http://www.domain3.com

This runs parse_url() over an array of URLs, which explodes each one into a multidimensional array of elements. I then run uma() on that to remove duplicates based on an element name (here host), and finally loop through the result to reconstruct the original URL format I need. Granted, it is a bit brute force, but it gets the job done, and I didn't want to rewrite 30+ other functions to support the raw output of uma(). Might help someone else.
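For anyone reusing this: the same first-per-host behaviour can be had in one pass by keying an array on the host, which replaces the in_array() scan in uma() with an isset() lookup. A sketch, where firstUrlPerHost() and buildUrl() are illustrative names; unlike the loop above it keeps each URL's own scheme instead of hard-coding http://.

```php
<?php
// Keep the FIRST URL seen for each host (same rule as uma()), in one pass.
function firstUrlPerHost(array $urls): array
{
    $byHost = [];
    foreach ($urls as $url) {
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['host'])) {
            continue; // skip anything without a host
        }
        $host = strtolower($parts['host']);
        if (isset($byHost[$host])) {
            continue; // a URL for this host was already kept
        }
        $byHost[$host] = buildUrl($parts);
    }
    return array_values($byHost);
}

// Rebuild a URL from parse_url() parts (user/pass are omitted in this sketch).
function buildUrl(array $p): string
{
    $url = ($p['scheme'] ?? 'http') . '://' . $p['host'];
    if (isset($p['port']))     { $url .= ':' . $p['port']; }
    $url .= $p['path'] ?? '';
    if (isset($p['query']))    { $url .= '?' . $p['query']; }
    if (isset($p['fragment'])) { $url .= '#' . $p['fragment']; }
    return $url;
}
```

Like uma(), it keeps whichever URL for a host appears first, which is why www.domain2.com comes out as /booo. If the root URL should win, pick the candidate with the shortest path instead of the first one.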