Remove Duplicate String From Array?

Discussion in 'PHP' started by mrkryz, Aug 30, 2006.

  1. #1
    Hey folks,

    I have an array that contains fully qualified URLs:
    
    domains["http:www.domain.com","http:www.domain.com/path/",
    "http:www.domain.com/foo","http:www.domain.com/boo", 
    "http:www.someother.com","http:www.different.com","http:www.different.com/stuff",
    "http:www.boo.com","http:www.doo.com","http:www.goo.com",]
    
    I run it through a sort and would ideally like to remove all elements that duplicate a domain. I've been scratching my head trying to use parse_url and compare on the [host] element, but I'm a bit lost after a few hours. It sounds simple enough, but for some reason, no matter how I iterate through it, I'm coming up dry. Any help returning an array that contains only unique domains would really be appreciated. Ideally the root-level URL (e.g. http://www.google.com) would be kept while all other duplicates with paths are discarded.
     
    mrkryz, Aug 30, 2006 IP
  2. clancey

    #2
    Try this:

    
    $domains = array( "http:www.domain.com", "http:www.domain.com/path/",
     "http:www.domain.com/foo", "http:www.domain.com/boo",
     "http:www.someother.com", "http:www.different.com",
     "http:www.different.com/stuff", "http:www.boo.com",
     "http:www.doo.com", "http:www.goo.com" );
    
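    // Keep only the part of each URL before the first "/".
    // explode() is used here because split() was removed in PHP 7.
    foreach ($domains as $key => $val)
       { list($domains[$key]) = explode( "/" , $val, 2 ); }
    // Drop duplicates, then sort (sort() also reindexes the keys).
    $domains = array_unique( $domains);
    sort($domains);
    foreach ($domains as $key => $val)
       { echo "domains[" . $key . "] = " . $val . "\n"; }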
    
     
    clancey, Aug 30, 2006 IP
  3. mrkryz

    #3
    Returns:

    domain[0] = http: domain[1] = https:
     
    mrkryz, Aug 30, 2006 IP
  4. mrkryz

    #4
    Oops, I forgot to mention that the array elements are actually in the format
    http:slashslashwww.domain.com/foo
    where "slashslash" is //, but the board won't let me post them, lol.
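
A minimal sketch of why splitting on "/" returns only "http:" once the double slash is present, and how parse_url() avoids that. The URL below is illustrative, not taken from the poster's real data.

```php
<?php
// Illustrative URL (an assumption, not from the poster's real data set).
$url = "http://www.domain.com/foo";

// Splitting on the first "/" stops inside the "//" that follows the scheme.
$parts = explode("/", $url, 2);
echo $parts[0], "\n";   // http:

// parse_url() separates scheme, host and path, so the host can be read directly.
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n";       // www.domain.com
```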
     
    mrkryz, Aug 30, 2006 IP
  5. mrkryz

    #5
    Maybe this will help explain my problem:

    domain[126] = http: //members.tripod.com/~Maija_Murphy
    domain[127] = http: //members.tripod.com/~Maija_Murphy/page2.html

    Here I would only want to keep domain[126].
     
    mrkryz, Aug 30, 2006 IP
  6. clancey

    #6
    Obviously, these URLs are coming from somewhere. When they are collected, you should reduce the URL to the domain name only, taking into account the possibility of domain names which also point into an individual's directory. As you are doing this, populate your array, and then run it through the array_unique() and sort() functions.
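
One way the advice above could look in practice, as a rough sketch rather than clancey's own code. The helper name site_key() and the rule that a first path segment starting with "~" counts as a personal directory are both assumptions made for illustration.

```php
<?php
// Hypothetical helper: reduce a URL to the "site" it belongs to.
// Assumption: a first path segment starting with "~" is a personal
// directory and is kept as part of the site.
function site_key(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not an absolute URL we can classify
    }
    $site = strtolower($parts['host']);
    $path = $parts['path'] ?? '';
    if (preg_match('#^/(~[^/]+)#', $path, $m)) {
        $site .= '/' . $m[1];
    }
    return $site;
}

// Sample input (illustrative), reduced as it is collected.
$collected = [
    "http://members.tripod.com/~Maija_Murphy",
    "http://members.tripod.com/~Maija_Murphy/page2.html",
    "http://www.domain.com/foo",
    "http://www.domain.com/",
];

$sites = [];
foreach ($collected as $url) {
    $key = site_key($url);
    if ($key !== null) {
        $sites[] = $key;
    }
}

// Remove duplicates, then sort (sort() also reindexes the array).
$sites = array_unique($sites);
sort($sites);
print_r($sites);
```

With this input the two tripod pages collapse into one entry and the two www.domain.com URLs into another.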
     
    clancey, Aug 30, 2006 IP
  7. mrkryz

    #7
    This is what I ended up doing to get the result I wanted:

    
    
    $domain_array:
    ================================================================
    Array ( [scheme] => http [host] => www.domain.com [path] => / )
    Array ( [scheme] => http [host] => www.domain.com [path] => /fooo)
    Array ( [scheme] => http [host] => www.domain2.com [path] => /booo)
    Array ( [scheme] => http [host] => www.domain2.com [path] => / )
    Array ( [scheme] => http [host] => www.domain3.com [path] => / )
    Array ( [scheme] => http [host] => www.domain.com [path] => /yadda/yadda )
    =================================================================
    
    
    
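    $parsed_array = array();
    
    // Keep only the first sub-array seen for each distinct value of $sub_key.
    function uma($array, $sub_key) {
       $target = array();
       $existing_sub_key_values = array();
       foreach ($array as $key => $sub_array) {
           if (!in_array($sub_array[$sub_key], $existing_sub_key_values)) {
               $existing_sub_key_values[] = $sub_array[$sub_key];
               $target[$key] = $sub_array;
           }
       }
       return $target;
    }
    // Dedupe on the 'host' element of each parse_url() result.
    $domain_array = uma($domain_array, 'host');
    
    // Rebuild a URL string from each remaining parse_url() result.
    // Array keys are quoted: bare words such as [host] are errors in PHP 8.
    foreach ($domain_array as $parts) {
            $uri = ''; // reset so nothing carries over from the previous entry
            if (isset($parts['host']))     $uri  = $parts['host'];
            if (isset($parts['port']))     $uri .= ":" . $parts['port'];
            if (isset($parts['path']))     $uri .= $parts['path'];
            if (isset($parts['query']))    $uri .= "?" . $parts['query'];
            if (isset($parts['fragment'])) $uri .= "#" . $parts['fragment'];
    
            $nurl = 'http://' . $uri;
            array_push($parsed_array, $nurl);
    }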
    
    
    Spits out:

    http: //www.domain.com
    http: //www.domain2.com/booo
    http: //www.domain3.com


    This takes the output of parse_url() on an array of URLs, which explodes each one into its elements and gives a multidimensional array. I then run uma() on that to remove dupes based on an element name, and run the result through a loop to reconstruct the original array format I need. Granted, it is a bit brute force, but it gets the job done, as I didn't want to rewrite 30+ other functions to support the raw output of uma(). Might help someone else.
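
For comparison, a shorter sketch of the same parse, dedupe, and rebuild idea that works on the URL strings directly and prefers the root-level URL when one exists, as the first post asked. It is not the poster's code: the function name unique_by_host() and the "shortest path wins" rule are assumptions made for illustration.

```php
<?php
// Hypothetical helper: keep one URL per host, preferring the shortest path
// so that a root-level URL wins over deeper pages on the same host.
function unique_by_host(array $urls): array
{
    $best = []; // host => ['url' => original URL, 'len' => path length]
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            continue; // skip anything parse_url() finds no host in
        }
        $host = strtolower($host);
        $path = parse_url($url, PHP_URL_PATH);
        // Ignore a trailing slash so "/" counts as the empty (root) path.
        $len = strlen(rtrim((string) $path, '/'));
        if (!isset($best[$host]) || $len < $best[$host]['len']) {
            $best[$host] = ['url' => $url, 'len' => $len];
        }
    }
    // Return just the URLs, in the order each host was first seen.
    return array_column($best, 'url');
}

// Sample input modelled on the $domain_array dump above.
$urls = [
    "http://www.domain.com/fooo",
    "http://www.domain.com/",
    "http://www.domain2.com/booo",
    "http://www.domain3.com/",
    "http://www.domain.com/yadda/yadda",
];
$unique = unique_by_host($urls);
print_r($unique);
```

Because the original strings are returned unchanged, no rebuild loop is needed, and a host that only ever appears with a path (www.domain2.com here) still keeps its one entry.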
     
    mrkryz, Aug 30, 2006 IP