I'm creating a function for my site that generates search results and cuts each result off after 200 characters. The problem is that some of the results contain HTML, and they get cut off before a tag is finished, which messes up the site. I wrote a function to close the open tags, and it works fine, but sometimes a result gets cut off in the middle of a tag, like "blah blah blah</str". The end tag then gets appended, giving "blah blah blah</str</strong>", so the element never actually gets closed. Does anyone have a suggested fix for this? Here is the code I'm using:

    function closeTags($html) {
        // $result[0] holds every complete tag, $result[1] the matching tag names.
        preg_match_all("/<\/?(\w+)((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/i", $html, $result);
        $tags = &$result[0];
        $closeCnt = 0;
        // Walk the tags from last to first, skipping self-closing ones.
        for ($i = count($tags) - 1; $i >= 0; $i--) {
            if ($tags[$i][strlen($tags[$i]) - 2] != '/') {
                if ($tags[$i][1] != '/') {
                    // Opening tag: append its closing tag unless one was already counted.
                    if (!$closeCnt) $html .= '</' . $result[1][$i] . '>';
                    else $closeCnt--;
                } elseif ($i > 0 && $result[1][$i] == $result[1][$i - 1]) $closeCnt++;
            }
        }
        return $html;
    }
I would suggest stripping out all existing scripting and HTML, unless it's essential that the results keep the markup intact. That would mostly matter for embedded URLs, but those won't work either if the result gets cut in the middle of one. If the markup needs to stay, maybe you can adapt the cutting (does it absolutely have to be 200 characters?): either cut before the opening < of the last tag, or extend the cut so that it falls after that tag's closing >.
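As a rough illustration of the "cut before the <" idea, here is a minimal sketch. The function name cutOutsideTag and its $limit parameter are made up for this example, and it assumes that every < and > in the text belongs to a tag (a literal "a < b" in the content, or a > inside an attribute value, would confuse it). It also counts bytes, not characters, so multibyte text would need mb_substr and friends instead.

```php
<?php
// Hypothetical helper (not from the thread): cut $html to at most $limit
// bytes, then back up so the cut never lands inside a tag.
function cutOutsideTag(string $html, int $limit = 200): string
{
    if (strlen($html) <= $limit) {
        return $html;
    }
    $cut = substr($html, 0, $limit);
    $lastOpen  = strrpos($cut, '<');
    $lastClose = strrpos($cut, '>');
    // A '<' with no '>' after it means the cut landed inside a tag,
    // so drop everything from that '<' onward.
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen)) {
        $cut = substr($cut, 0, $lastOpen);
    }
    return $cut;
}
```

With this, a result that would have ended in "</str" is trimmed back to the text before that partial tag. It does not close tags that are still open; that is a separate step.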
It doesn't have to be exactly 200 characters. Honestly the HTML isn't absolutely necessary. How should I strip out all the html completely? Is there a simple way of doing it? I haven't been coding for a couple years so I've lost a lot of my knowledge. Thanks!
You can have a look here: http://php.net/strip_tags (both the built-in PHP-function, and also some of the suggestions in the comments)
If the HTML isn't required, simply use:

    $value = trim(strip_tags($value));
    $value = substr($value, 0, 200);   // cut before encoding so an entity like &amp; is never split
    $value = htmlentities($value);

You don't need the preg_match_all at all. Alternatively, if you need the HTML, you can explode the string at spaces and check whether the last element of the array contains '</'. If so, remove that element from the array and implode it again.

    function closeTags($s, $limit = 200) {
        $s = substr($s, 0, $limit);
        $s = explode(' ', $s);
        $l = count($s) - 1;
        // Drop the last word if it contains (part of) a closing tag.
        if (strpos($s[$l], '</') !== false) {
            unset($s[$l]);
        }
        return implode(' ', $s);
    }

    // Example with a short limit:
    echo closeTags('this is <b>some</b> text I want to <i>remove</i> html <u>from</u>', 65);

The drawback is that an opening tag such as <u> can still be left without its closing </u>, and then everything after it on the page gets underlined. The same goes for a dangling <b>, <i>, <em>, <strong>, <h1>, or a cut-off <a or <img tag. So the safest choice is strip_tags followed by substr. Take care.
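If the HTML does have to stay, the leftover-opening-tag problem described above could be handled by tracking open tags on a stack and appending the missing closers. This is only a sketch: the name closeOpenTags is made up for the example, it assumes the string does not end in the middle of a tag (drop a partial tag first), and a regex is not a real HTML parser, so comments, scripts, or unquoted attribute values ending in a slash can trip it up.

```php
<?php
// Hypothetical helper (not from the thread): append closing tags for any
// elements still open at the end of $html.
function closeOpenTags(string $html): string
{
    // Elements that never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];
    $open = [];

    // Group 1: '/' on a closing tag; group 2: tag name; group 3: '/' on a self-closing tag.
    preg_match_all('~<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>~i', $html, $m, PREG_SET_ORDER);
    foreach ($m as [, $slash, $name, $selfClose]) {
        $name = strtolower($name);
        if ($selfClose === '/' || in_array($name, $void, true)) {
            continue;
        }
        if ($slash === '') {
            $open[] = $name;   // opening tag: push onto the stack
        } else {
            // Closing tag: pop back to the most recent matching opener, if any.
            $pos = array_search($name, array_reverse($open, true), true);
            if ($pos !== false) {
                $open = array_slice($open, 0, $pos);
            }
        }
    }
    // Close whatever is still open, innermost first.
    foreach (array_reverse($open) as $name) {
        $html .= "</$name>";
    }
    return $html;
}
```

The stack keeps the closers in the right nesting order, which matters once a cut leaves more than one element open. For anything beyond simple inline markup, stripping the tags is still the simpler route.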