Fixing my Shorten Function...

Discussion in 'PHP' started by ApacheTCP, Oct 14, 2008.

  1. #1
    Hey guys,

    I use a function to shorten a string to under 300 characters for on-screen output without breaking words. It works wonderfully, except when there is a carriage return (line break) within the first 300 characters. I have a feeling my regular expression is at fault for not allowing carriage returns, and I was hoping someone here with some regex knowledge could point me to the modification I need to make:

    My Shorten Function:

    # Shortens string to n number of characters, yet does not break words:
    function shorten($str, $n, $delim='...')
    {
        $len = strlen($str);
        if ($len > $n) {
            preg_match('/(.{' . $n . '}.*?)\b/', $str, $matches);
            return rtrim($matches[1]) . $delim;
        }
        else {
            return $str;
        }
    }
    How I call the function:

    echo shorten($longString, 300);
    Thank you for any help. Much appreciated.
     
    ApacheTCP, Oct 14, 2008
  2. deathshadow

    #2
    Dot means any character EXCEPT a newline, so a newline inside the first 300 characters stops .{300} from matching at the start of the string. PCRE can say 'all characters', either with the s (DOTALL) modifier or with a class such as [\s\S], but your pattern uses neither. You also have the problem that the pattern isn't anchored to the start of the string: when it can't match there, preg_match just tries again from later positions, so you run the risk of chopping off the BEGINNING.

    For example, if we pass it "Hi\nthis is a test" (a newline after 'Hi') and tell it to chop to 8 characters, there is no 8-character run before the newline, so the match starts at 'this' and the function returns 'this is...' with 'Hi' silently dropped. And if no single line is long enough (nothing 300 characters long between newlines), preg_match finds no match at all, $matches[1] is never set, and you get back just the '...' delimiter, which I think is what's happening when a newline is passed.
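    If you'd rather keep the preg_match approach, here is a minimal sketch of the smallest fix (not something posted in this thread): add the s modifier so dot also matches newlines, anchor the pattern with \A so it cannot skip the start of the string, and check preg_match's return value before touching $matches. The name shortenDotAll and the hard-cut fallback are assumptions for illustration.

```php
<?php
// Hypothetical variant of the question's shorten(): same idea, but the `s`
// modifier lets `.` match "\n" and "\r", and \A pins the match to offset 0.
function shortenDotAll($str, $n, $delim = '...')
{
    if (strlen($str) <= $n) {
        return $str;
    }
    if (preg_match('/\A(.{' . $n . '}.*?)\b/s', $str, $matches) !== 1) {
        // Assumed fallback: no word boundary after $n characters, so hard-cut.
        return rtrim(substr($str, 0, $n)) . $delim;
    }
    return rtrim($matches[1]) . $delim;
}

echo shortenDotAll("Hi\nthis is a test", 8); // "Hi", a newline, then "this..."
```

    This keeps the leading 'Hi', but like the original it still reads past $n up to the next word boundary, so it does not address the length issue.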

    You also have the problem that I'm assuming you want 300 characters or less: your function returns 300 characters PLUS everything up to the next word boundary (and then the delimiter), so the result normally runs over 300 rather than under.
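    To make the overshoot concrete, the sketch below runs the question's shorten() function (copied as posted, only reindented) on a string without newlines. With $n = 6, .{6} ends in the middle of 'is', so the lazy .*? has to extend to the end of that word before \b matches.

```php
<?php
// The question's shorten() function, unchanged apart from indentation.
function shorten($str, $n, $delim = '...')
{
    $len = strlen($str);
    if ($len > $n) {
        preg_match('/(.{' . $n . '}.*?)\b/', $str, $matches);
        return rtrim($matches[1]) . $delim;
    }
    else {
        return $str;
    }
}

$out = shorten('this is a test', 6); // 'this is...'
// Characters before the '...' delimiter: 7, one more than $n.
echo strlen($out) - strlen('...');
```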

    I would approach this from a different angle: use substr to cut the string to length, then use preg_replace to chop back to the nearest word border. This avoids using a regex to do the actual chopping, avoids the headache of reading the result out of a match array, and returns a string no LONGER than the maximum length you pass (not counting the suffix).

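    function trimToWordBoundary($inString, $trimLength, $suffix='...') {
        if (strlen($inString) > $trimLength) {
            // Cut to the maximum length first, then strip the trailing partial word:
            // the last whitespace character plus the non-whitespace run after it.
            $chopped = substr($inString, 0, $trimLength);
            return rtrim(preg_replace('/\s(\S+)$/', '', $chopped)) . $suffix;
        }
        return $inString;
    }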
    You'll notice my trimming replace uses whitespace (\s) and non-whitespace (\S) instead of a word boundary (\b); I've had nothing but headaches trying to use \b next to a 'run' of non-whitespace characters. \s(\S+)$ spells out exactly what I want: the last whitespace character and the partial word after it. We still need the rtrim, because if the last character after chopping is a space, the regex won't match (there is no non-whitespace left at the end to remove).
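    A few example calls, assuming the trimToWordBoundary() function above (repeated here so the snippet runs on its own). The results in the comments follow from the substr cut and the \s(\S+)$ replace just described.

```php
<?php
// trimToWordBoundary() as posted above, reindented.
function trimToWordBoundary($inString, $trimLength, $suffix = '...') {
    if (strlen($inString) > $trimLength) {
        $chopped = substr($inString, 0, $trimLength);
        return rtrim(preg_replace('/\s(\S+)$/', '', $chopped)) . $suffix;
    }
    return $inString;
}

// Cut lands mid-word ("this is a te"): the regex removes " te".
echo trimToWordBoundary('this is a test', 12), "\n"; // this is a...
// Cut ends on a space ("Hi\nthis "): no regex match, rtrim drops the space.
echo trimToWordBoundary("Hi\nthis is a test", 8), "\n"; // Hi, newline, this...
// Cut ends exactly after a word ("this is a"): that whole word is removed too.
echo trimToWordBoundary('this is a test', 9), "\n"; // this is...
```

    The last call shows a side effect: when the cut happens to fall exactly at the end of a word, the regex still treats that word as partial and drops it. If that matters, one option would be to check whether the character at offset $trimLength is whitespace before applying the replace.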

    It should also run faster on longer strings, since the regex only ever sees the already-shortened string.

    Another possibility would be to replace all whitespace characters (newlines included) with spaces before processing, though I'm not certain whether you care about preserving newlines in your output.
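    A minimal sketch of that whitespace-normalizing idea. normalizeWhitespace is a hypothetical helper name, and collapsing each run of whitespace into a single space (rather than one space per character) is an assumption about what you'd want.

```php
<?php
// Hypothetical helper: collapse every run of whitespace (spaces, tabs, \r, \n)
// into a single space and trim both ends.
function normalizeWhitespace($str) {
    return trim(preg_replace('/\s+/', ' ', $str));
}

echo normalizeWhitespace("Hi\r\nthis  is\ta test"); // Hi this is a test
```

    You would then pass the result to your trimming function, e.g. trimToWordBoundary(normalizeWhitespace($longString), 300). Line breaks do not survive this step, so it only fits output where they don't matter.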
     
    deathshadow, Oct 14, 2008