I have the function below. It takes text, removes the most common words and other unwanted words, and turns what is left into clickable keywords for that piece of media. I'm looking for a faster way to do this per media entry, or a cleaner and better way to do it: remove the words I don't want and create a list of clickable keywords.

PHP:
function clean($str)
{
    $str = strip_tags($str);
    $str = ereg_replace("[^a-zA-Z ]", " ", $str);
    $str = eregi_replace(" +", " ", $str);
    $str = strtolower($str);
    return $str;
}

function getkeywords($kw_list, $count)
{
    global $SITEURL;
    if (trim($kw_list[0]) == '') {
        return false;
    }
    $kw_list = array_map("clean", $kw_list);
    //remove any single not needed words.
    $string = "me,very,any,cc,their,thier,plz,ld,ok,okay,wouldn,since,soon,the,of,and,a,to,in,is,you,that,it,he,was,for,on,are,as,with,his,they,i,at,be,this,have,from,or,one,had,by,word,but,not,what,all,were,we,when,your,can,said,there,use,an,each,which,she,do,how,their,if,will,up,other,about,out,many,then,them,these,so,some,her,would,make,like,him,into,time,has,look,two,more,write,go,see,number,no,way,could,people,my,than,first,water,been,call,who,oil,its,now,find,long,down,day,did,get,come,made,may,part,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9";
    $remove = explode(',', $string);
    $f = array();
    foreach ($kw_list as $word) {
        if (!in_array("$word", $remove) && !in_array("$word", $f)) {
            $f[] = $word;
        }
    }
    if (trim($f[0]) == '') {
        return false;
    }
    shuffle($f);
    if (count($f) <= ($count - 1)) {
        $count = count($f) - 1;
    }
    for ($i = 0; $i < ($count - 1); $i++) {
        $kw_return .= '<a href="' . $SITEURL . '/index.php?search_query=' . trim($f[$i]) . '" title="Find media with ' . $f[$i] . ' in them.">' . trim(ucfirst($f[$i])) . '</a>, ';
    }
    $kw_return .= '<a href="' . $SITEURL . '/index.php?search_query=' . trim($f[$count]) . '" title="Find media with ' . $f[$count] . ' in them.">' . trim(ucfirst($f[$count])) . '</a>';
    return $kw_return;
}
Here is a new one. I replaced ereg with preg and made some modifications. I'm not sure whether this is faster.

PHP:
<?php
$junk = 'money sweet sugar car , script, me';
echo getkeywords(explode(' ', $junk), 100);

function clean($str)
{
    $str = strip_tags($str);
    $str = preg_replace('/([^a-z ]+)/i', ' ', $str);
    $str = preg_replace('/([ ]+)/', ' ', $str);
    $str = strtolower($str);
    return $str;
}

function getkeywords($kw_list, $count)
{
    global $SITEURL;
    if (trim($kw_list[0]) == '') {
        return false;
    }
    $kw_list = array_map('clean', $kw_list);
    // mod, just use the same variable
    $remove = 'me,very,any,cc,their,thier,plz,ld,ok,okay,wouldn,since,soon,the,of,and,a,to,in,is,you,that,it,he,was,for,on,are,as,with,his,they,i,at,be,this,have,from,or,one,had,by,word,but,not,what,all,were,we,when,your,can,said,there,use,an,each,which,she,do,how,their,if,will,up,other,about,out,many,then,them,these,so,some,her,would,make,like,him,into,time,has,look,two,more,write,go,see,number,no,way,could,people,my,than,first,water,been,call,who,oil,its,now,find,long,down,day,did,get,come,made,may,part,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9';
    $remove = explode(',', $remove);
    $f = array();
    foreach ($kw_list as $word) {
        $word = trim($word);
        if (!in_array($word, $remove) && !in_array($word, $f) && $word != '') {
            $f[] = $word;
        }
    }
    unset($remove); // mod
    if ($f == array()) { // mod
        return false;
    }
    shuffle($f);
    while (isset($f[$count])) { // while have more than $count
        array_pop($f); // remove them
    }
    $kw_return = '';
    foreach ($f as $f2) { // i modified for into foreach
        $kw_return .= '<a href="' . $SITEURL . '/index.php?search_query=' . $f2 . '" title="Find media with ' . $f2 . ' in them.">' . ucfirst($f2) . '</a>, ';
    }
    unset($f);
    return rtrim($kw_return, ', '); // mod
}
?>
If you had a list of the keywords that you know should be clickable, it would save a lot of time. That is more like the way inline ads work. With a very long article, your current algorithm will take too much time. Peace,
I don't have that list. What I am going for is that roughly every other word in a short description (under 200 words) becomes a hot word (keyword). --- Thank you, xrvel.
I'll be honest, I didn't read through your code thoroughly, but it seems to me you're using arrays too much here, where hashtables would be more efficient. Wherever you're checking in_array, it could be checking in a hashtable instead, which would be faster.
Yeah, searching big arrays with in_array is more expensive than looking a value up by its hash key. Also, to improve performance when the function is called several times, it may be reasonable to cache the ignore-words array in a global variable. So you can improve the code's speed this way:

PHP:
...
//ignore words cache
if (!isset($GLOBALS['ignorewords'])) {
    $rem = 'me,very,any,cc,their,thier,plz,ld,ok,okay,wouldn,since,soon,the,of,and,a,to,in,is,you,that,it,he,was,for,on,are,as,with,his,they,i,at,be,this,have,from,or,one,had,by,word,but,not,what,all,were,we,when,your,can,said,there,use,an,each,which,she,do,how,their,if,will,up,other,about,out,many,then,them,these,so,some,her,would,make,like,him,into,time,has,look,two,more,write,go,see,number,no,way,could,people,my,than,first,water,been,call,who,oil,its,now,find,long,down,day,did,get,come,made,may,part,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,0,1,2,3,4,5,6,7,8,9';
    $rem = explode(',', $rem);
    //convert values into keys
    $remove = array();
    foreach ($rem as $v) {
        $remove[$v] = 1;
    }
    $GLOBALS['ignorewords'] = $remove;
} else {
    $remove = $GLOBALS['ignorewords'];
}
$f = array();
foreach ($kw_list as $word) {
    $word = trim($word);
    if (!isset($remove[$word]) ....
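For anyone who wants the keyed-lookup idea as a self-contained piece, here is a minimal sketch. It swaps the $GLOBALS cache for a function-level static variable, which is my own choice and not part of the snippet above. The helper names stopword_set() and filter_keywords() are made up for illustration, the ignore list is shortened, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the ignore list as a set (word => true).
// The word list is shortened here; in practice it would be the full list.
function stopword_set(): array
{
    static $set = null; // built on the first call, reused on later calls
    if ($set === null) {
        $words = 'me,very,any,the,of,and,a,to,in,is,you,that,it';
        $set = array_fill_keys(explode(',', $words), true);
    }
    return $set;
}

// Hypothetical helper: keeps unique, non-empty words that are not ignored.
function filter_keywords(array $words): array
{
    $remove = stopword_set();
    $seen = array();
    foreach ($words as $word) {
        $word = trim($word);
        if ($word !== '' && !isset($remove[$word])) {
            $seen[$word] = true; // array keys are unique, so duplicates collapse
        }
    }
    // PHP turns numeric string keys into integers, so cast back to strings.
    return array_map('strval', array_keys($seen));
}

print_r(filter_keywords(array('money', 'the', 'sweet', 'money', ' car ', '')));
```

Both the ignore check and the duplicate check become key lookups here, so neither one scans an array with in_array.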
I don't believe preg vs. ereg will make a difference, assuming the $kw_list argument is an array of short phrases and words. I tried a few tricks, and while I can come up with shorter code, it's still a coin flip over which one will be faster at 50,000 iterations with a $kw_list of 750 array elements. I'd be surprised if you can get anything to consistently execute faster than what you've got there.
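If you want to measure this on your own machine, a rough timing harness along these lines might help. It is only a sketch: the word lists are synthetic, the function names are made up, and the iteration count is kept far below 50,000 so it finishes quickly. Results will vary with PHP version and data, so treat the printed numbers as indicative at best.

```php
<?php
// Synthetic data (an assumption, not real content): 200 ignore words
// and 750 input words, half of which are on the ignore list.
$ignoreList = array();
for ($i = 0; $i < 200; $i++) {
    $ignoreList[] = 'stop' . $i;
}
$ignoreSet = array_flip($ignoreList); // word => index, for isset() lookups

$input = array();
for ($i = 0; $i < 750; $i++) {
    $input[] = ($i % 2 === 0) ? 'stop' . ($i % 200) : 'word' . $i;
}

// Linear scan of the ignore list for every input word.
function keep_with_in_array(array $input, array $ignoreList): array
{
    $kept = array();
    foreach ($input as $word) {
        if (!in_array($word, $ignoreList)) {
            $kept[] = $word;
        }
    }
    return $kept;
}

// Key lookup in the flipped ignore list for every input word.
function keep_with_isset(array $input, array $ignoreSet): array
{
    $kept = array();
    foreach ($input as $word) {
        if (!isset($ignoreSet[$word])) {
            $kept[] = $word;
        }
    }
    return $kept;
}

$iterations = 100;

$start = microtime(true);
for ($n = 0; $n < $iterations; $n++) {
    $a = keep_with_in_array($input, $ignoreList);
}
$inArraySeconds = microtime(true) - $start;

$start = microtime(true);
for ($n = 0; $n < $iterations; $n++) {
    $b = keep_with_isset($input, $ignoreSet);
}
$issetSeconds = microtime(true) - $start;

printf("in_array: %.4f s, isset: %.4f s\n", $inArraySeconds, $issetSeconds);
```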
Rep given. Thank you for all the replies. I am going to use a mix of xrvel's version and the one I posted. For the most part it is run only once per entry, and when I am editing the media entry. So as it stands, it suits my needs. I was just wondering if something could be produced that was simpler and neater to read. Thank you again for your replies.
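On the "simpler and neater to read" point, one possible shape is sketched below using PHP's built-in array functions. This is not anyone's posted solution. The function name keyword_links() is made up, it takes the raw description text instead of a pre-split array, and it receives the ignore list and site URL as parameters (assumptions on my part) rather than reading a global. It also returns an empty string rather than false when nothing is left.

```php
<?php
// Hypothetical rewrite: text in, comma-separated keyword links out.
function keyword_links(string $text, array $ignore, int $count, string $siteUrl): string
{
    // Strip tags, lowercase, and split on anything that is not a letter.
    $words = preg_split('/[^a-z]+/', strtolower(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);

    // Drop ignored words and duplicates; array_values() reindexes the result.
    $words = array_values(array_unique(array_diff($words, $ignore)));

    shuffle($words);
    $words = array_slice($words, 0, $count); // keep at most $count keywords

    $links = array();
    foreach ($words as $word) {
        $links[] = '<a href="' . $siteUrl . '/index.php?search_query=' . urlencode($word)
            . '" title="Find media with ' . $word . ' in them.">' . ucfirst($word) . '</a>';
    }
    return implode(', ', $links);
}

// http://example.com is a placeholder, not the real site URL.
echo keyword_links('money sweet sugar car , script, me', array('me'), 100, 'http://example.com');
```

Building the links into an array and joining with implode() avoids the trailing-comma cleanup, and array_slice() replaces the manual count handling.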