Help me find a better way to do this function.

Discussion in 'PHP' started by exodus, Dec 4, 2008.

  1. #1
    I have the function below. It takes text, removes the most common and unwanted words, and turns whatever is left into clickable keywords for that piece of media. I'm looking for a faster, or at least cleaner, way to do this per media entry: remove the words I don't want and create a list of clickable keywords.


    
      function clean($str)
      {
         $str = strip_tags($str);
         $str = ereg_replace("[^a-zA-Z ]", " ", $str);
         $str = eregi_replace(" +", " ", $str);
         $str = strtolower($str);
         return $str;  
      }
      
      function getkeywords($kw_list, $count)
      {
          global $SITEURL;
          
          if (trim($kw_list[0]) == '') { return false; }
          
          $kw_list = array_map("clean", $kw_list);
          
          //remove any single not needed words.
          $string = "me,very,any,cc,their,thier,plz,ld,ok,okay,wouldn,since,soon,the,of,and,a,to,in,is,you,that,it,he,was,for,on,are,as,with,his,they,i,at,be,this,have,from,or,one,had,by,word,but,not,what,all,were,we,when,your,can,said,there,use,an,each,which,she,do,how,their,if,will,up,other,about,out,many,then,them,these,so,some,her,would,make,like,him,into,time,has,look,two,more,write,go,see,number,no,way,could,people,my,than,first,water,been,call,who,oil,its,now,find,long,down,day,did,get,come,made,may,part,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9";
          $remove = explode(',', $string);
          
          $f = array();
          foreach ($kw_list as $word)
          {
              if (!in_array("$word", $remove) && !in_array("$word", $f)) { $f[] = $word; }
          }
          
          
          if (trim($f[0]) == '') { return false; }
          
          shuffle($f);
          if (count($f) <= ($count - 1)) { $count = count($f) - 1; }
          for ($i = 0; $i < ($count - 1); $i++)
          {
            $kw_return .= '<a href="' . $SITEURL . '/index.php?search_query=' . trim($f[$i]) . '" title="Find media with ' . $f[$i] . ' in them.">' . trim(ucfirst($f[$i])) . '</a>, ';
          }
          $kw_return .= '<a href="' . $SITEURL . '/index.php?search_query=' . trim($f[$count]) . '" title="Find media with ' . $f[$count] . ' in them.">' . trim(ucfirst($f[$count])) . '</a>';
          
          return $kw_return;
      }
    
    exodus, Dec 4, 2008
  2. xrvel

    #2
    Here is a new one. I replaced ereg with preg and made some modifications. Not sure if this is faster.
    
    <?php
    
    $junk = 'money sweet sugar car            , script, me';
    
    echo getkeywords(explode(' ', $junk), 100);
    
    function clean($str) {
    	$str = strip_tags($str);
    	$str = preg_replace('/([^a-z ]+)/i', ' ', $str);
    	$str = preg_replace('/([ ]+)/', ' ', $str);
    	$str = strtolower($str);
    	return $str;  
    }
    
    function getkeywords($kw_list, $count) {
    	global $SITEURL;
    
    	if (trim($kw_list[0]) == '') {
    		return false;
    	}
    
    	$kw_list = array_map('clean', $kw_list);
    
    	// mod, just use the same variable
    	$remove = 'me,very,any,cc,their,thier,plz,ld,ok,okay,wouldn,since,soon,the,of,and,a,to,in,is,you,that,it,he,was,for,on,are,as,with,his,they,i,at,be,this,have,from,or,one,had,by,word,but,not,what,all,were,we,when,your,can,said,there,use,an,each,which,she,do,how,their,if,will,up,other,about,out,many,then,them,these,so,some,her,would,make,like,him,into,time,has,look,two,more,write,go,see,number,no,way,could,people,my,than,first,water,been,call,who,oil,its,now,find,long,down,day,did,get,come,made,may,part,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9';
    	$remove = explode(',', $remove);
    
    	$f = array();
    	foreach ($kw_list as $word) {
    		$word = trim($word);
    		if (!in_array($word, $remove) && !in_array($word, $f) && $word != '') {
    			$f[] = $word;
    		}
    	}
    	unset($remove);// mod
    
    	if ($f == array()) {// mod
    		return false;
    	}
    
    	shuffle($f);
    
    	while (isset($f[$count])) {// while have more than $count
    		array_pop($f);// remove them
    	}
    
    	$kw_return = '';
    	foreach ($f as $f2) {// i modified for into foreach
    		$kw_return .= '<a href="' . $SITEURL . '/index.php?search_query=' . $f2 . '" title="Find media with ' . $f2 . ' in them.">' . ucfirst($f2) . '</a>, ';
    	}
    	unset($f);
    
    	return rtrim($kw_return, ', ');// mod
    }
    ?>
    
    xrvel, Dec 4, 2008
  3. Barti1987

    #3
    If you already had a list of keywords that you know should be clickable, it would save a lot of time. That is more like the way inline ads work.

    Imagine you have a very long article: your algorithm will take too much time.

    Peace,
     
    Barti1987, Dec 4, 2008
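
As a rough illustration of the whitelist idea above, the sketch below keeps only words that appear in a known list of clickable keywords. The `$clickable` list and the `known_keywords` function are hypothetical names made up for this example; nothing in the thread defines such a list.

```php
<?php
// Hypothetical whitelist; in practice it might come from a database or config file.
$clickable = array_flip(array('sugar', 'car', 'money'));

// Hypothetical helper: returns the whitelisted words found in $text, without repeats.
function known_keywords(string $text, array $clickable): array
{
    // Split on anything that is not a letter and drop empty pieces.
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $found = array();
    foreach ($words as $word) {
        if (isset($clickable[$word])) {
            $found[$word] = true; // keyed by word, so repeats collapse
        }
    }

    return array_keys($found);
}

print_r(known_keywords('Money, sweet sugar and more SUGAR', $clickable));
```

With this approach the cost per word is a single key lookup, and no stop list is needed at all, but it only works if such a whitelist exists.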
  4. exodus

    #4
    I don't have that list. The idea is that in a short description (under 200 words), every word that isn't filtered out becomes a hot word (keyword).

    ---

    Thank you xrvel.

     
    exodus, Dec 4, 2008
  5. pharmboy

    #5
    I'll be honest, I didn't read through your code thoroughly, but it seems to me you're searching arrays too much here, where hash table lookups would be more efficient. Wherever you're calling in_array, you could look the word up by key in a hash table instead, which would be faster.
     
    pharmboy, Dec 4, 2008
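
A minimal sketch of the lookup-by-key suggestion above: PHP arrays are hash tables, so flipping the stop list with `array_flip` lets `isset` replace `in_array`, and using each kept word as a key also removes duplicates. The `filter_keywords` name and the shortened stop list are assumptions for illustration only.

```php
<?php
// Hypothetical helper: the name and the shortened stop list are illustrative.
function filter_keywords(array $words, array $stopWords): array
{
    // array_flip turns values into keys, so isset() becomes a hash lookup.
    $stop = array_flip($stopWords);
    $seen = array();

    foreach ($words as $word) {
        $word = trim($word);
        if ($word === '' || isset($stop[$word])) {
            continue;
        }
        // Using the word as a key removes duplicates without in_array().
        $seen[$word] = true;
    }

    // PHP converts numeric-looking keys to integers; cast back to strings.
    return array_map('strval', array_keys($seen));
}

$kept = filter_keywords(
    array('the', 'sugar', 'car', 'sugar', ' me '),
    array('the', 'me', 'of')
);
print_r($kept);
```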
  6. wmtips

    #6
    Yeah, searching big arrays with in_array is more expensive than looking values up by key. Also, to improve performance when the function is called several times, it may be reasonable to cache the ignore-words array in a global variable. So you can speed up the code this way:

    
    ...
        //ignore words cache
        if (!isset($GLOBALS['ignorewords']))
        {
         $rem = 'me,very,any,cc,their,thier,plz,ld,ok,okay,wouldn,since,soon,the,of,and,a,to,in,is,you,that,it,he,was,for,on,are,as,with,his,they,i,at,be,this,have,from,or,one,had,by,word,but,not,what,all,were,we,when,your,can,said,there,use,an,each,which,she,do,how,their,if,will,up,other,about,out,many,then,them,these,so,some,her,would,make,like,him,into,time,has,look,two,more,write,go,see,number,no,way,could,people,my,than,first,water,been,call,who,oil,its,now,find,long,down,day,did,get,come,made,may,part,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9';
         $rem = explode(',', $rem);
    
         //convert values into keys
         $remove = array();
         foreach ($rem as $v)
          $remove[$v] = 1;
    
         $GLOBALS['ignorewords'] = $remove;
        }
        else
         $remove = $GLOBALS['ignorewords'];
    
        $f = array();
        foreach ($kw_list as $word) {
            $word = trim($word);
            if (!isset($remove[$word]) ....
    
    wmtips, Dec 4, 2008
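
One possible variation on the caching idea above, assuming you would rather not use `$GLOBALS`: a `static` variable inside a small function is also built only once per request. The `ignore_words` function name is hypothetical, and the stop list is shortened for the example.

```php
<?php
// Hypothetical helper; the stop list is shortened for the example.
function ignore_words(): array
{
    // A static variable keeps its value between calls, so the lookup
    // table is built once per request without touching $GLOBALS.
    static $lookup = null;

    if ($lookup === null) {
        $list = 'me,very,any,the,of,and,a,to,in,is';
        // Values become keys; isset() works because no flipped value is null.
        $lookup = array_flip(explode(',', $list));
    }

    return $lookup;
}

$remove = ignore_words();
var_dump(isset($remove['the']));   // a stop word
var_dump(isset($remove['sugar'])); // not a stop word
```

The trade-off is the same as with the global: the table lives for the duration of the request, which only pays off if the function is called more than once.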
  7. joebert

    #7
    I don't believe preg vs. ereg will make a difference, assuming the $kw_list argument is an array of short phrases and words.

    I tried a few tricks, and while I can come up with shorter code, it's still a coin flip over which one will be faster at 50,000 iterations with a $kw_list size of 750 array elements.

    I'd be surprised if you can get anything to consistently execute faster than what you've got there.
     
    joebert, Dec 5, 2008
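
For anyone who wants to repeat this kind of comparison, a rough timing harness might look like the sketch below. It compares the `in_array` and `isset` filters discussed earlier rather than the regex functions. The shortened stop list, the generated 750-word input, and the iteration count are all arbitrary assumptions, and the timings will vary by machine and PHP version.

```php
<?php
// Illustrative timing harness: the shortened stop list, the generated
// word list, and the iteration count are arbitrary assumptions.
$stopWords  = explode(',', 'me,very,any,the,of,and,a,to,in,is,you,that,it,he,was,for,on');
$stopLookup = array_flip($stopWords);

// Build 750 test words; every third one is a stop word.
$words = array();
for ($i = 0; $i < 750; $i++) {
    $words[] = ($i % 3 === 0) ? 'the' : 'word' . $i;
}

$iterations = 200;

// Variant A: linear search with in_array().
$start = microtime(true);
for ($n = 0; $n < $iterations; $n++) {
    $keptA = array();
    foreach ($words as $w) {
        if (!in_array($w, $stopWords)) {
            $keptA[] = $w;
        }
    }
}
$timeInArray = microtime(true) - $start;

// Variant B: key lookup with isset().
$start = microtime(true);
for ($n = 0; $n < $iterations; $n++) {
    $keptB = array();
    foreach ($words as $w) {
        if (!isset($stopLookup[$w])) {
            $keptB[] = $w;
        }
    }
}
$timeIsset = microtime(true) - $start;

printf("in_array: %.4fs, isset: %.4fs\n", $timeInArray, $timeIsset);
```

Both variants should keep exactly the same words, so any difference in the printed times comes from the lookup method alone.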
  8. wmtips

    #8
    I've tested ereg_replace and preg_replace performance; in my tests, ereg seems to be faster than preg.
     
    wmtips, Dec 6, 2008
  9. xrvel

    #9
    According to PHP.net, in most cases "preg" is faster than "ereg", but I have not tested it myself.
     
    xrvel, Dec 6, 2008
  10. exodus

    #10
    Rep given. Thank you for all the replies. I am going to use a mix of xrvel's version and the one I posted. For the most part it runs only once per entry, when I am editing the media entry, so as it stands it suits my needs. I was just wondering whether something simpler and neater to read could be produced. Thank you again for your replies.
     
    exodus, Dec 6, 2008
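
On the "simpler and neater to read" point, one possible tidy-up of the link-building step is sketched below. This is not exodus's final code, which was never posted in the thread. The `keyword_links` function and its `$siteUrl` parameter (standing in for the global `$SITEURL`) are hypothetical, and the `urlencode` and `htmlspecialchars` calls are an added precaution that the posted versions do not have.

```php
<?php
// Hypothetical helper: $siteUrl stands in for the global $SITEURL.
function keyword_links(array $words, int $count, string $siteUrl): string
{
    shuffle($words);
    // array_slice caps the list at $count without manual index arithmetic.
    $words = array_slice($words, 0, max(0, $count));

    $links = array();
    foreach ($words as $word) {
        $links[] = '<a href="' . $siteUrl . '/index.php?search_query=' . urlencode($word)
            . '" title="Find media with ' . htmlspecialchars($word) . ' in them.">'
            . htmlspecialchars(ucfirst($word)) . '</a>';
    }

    // implode adds separators only between items, so there is no trailing comma to trim.
    return implode(', ', $links);
}

echo keyword_links(array('money', 'sweet', 'sugar', 'car'), 3, 'http://example.com'), "\n";
```

Unlike the posted versions, this sketch returns an empty string rather than `false` when there are no words, so callers that test for `false` would need adjusting.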
  11. baris22

    #11
    Can you post your final code here? I need something like this as well.

    thanks
     
    baris22, Dec 6, 2008