Need to replace Unicode

Discussion in 'PHP' started by vOlLvEriNe, Aug 15, 2015.

  1. vOlLvEriNe
    Hello guys, I need to add a class around Unicode text. For example, I have this text:
    and I need output like this:
    I'm using this:
    echo preg_replace('/[\x80-\xff]+/', '<span class="unicode">$0</span>', $str);
    But it shows:
    Please fix my code.
     
    Solved!
    vOlLvEriNe, Aug 15, 2015
  2. ThePHPMaster

    It would be great if you could follow up on your topics once you get answers. For example, this one:

    https://forums.digitalpoint.com/threads/fetchcol-prob.2761217/

    Once an issue is solved, picking an accepted answer and following up on the topic will attract more people to help you out.

    To answer your question, you need to include whitespace (\s) in the character class:

    
    /[\x80-\xff|\s]+/
    
     
    ThePHPMaster, Aug 15, 2015
  3. vOlLvEriNe

    In my last post I didn't get the right answer :(
     
    vOlLvEriNe, Aug 15, 2015
  4. vOlLvEriNe

    
    /[\x80-\xff|\s]+/
    
    This shows me something like this:
     
    vOlLvEriNe, Aug 15, 2015
  5. qwikad.com

    Do this instead:

    /[\x80-\xff\s]+/
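    Why dropping the | matters: inside a character class [...], the pipe is not alternation but an ordinary character, so /[\x80-\xff|\s]+/ also treats a literal | in the text as "Unicode". A minimal check (the variable names are only for illustration):

```php
<?php
// Inside [...] the pipe is a literal character, not "or".
$withPipe    = '/[\x80-\xff|\s]+/';
$withoutPipe = '/[\x80-\xff\s]+/';

$text = 'a|b';

// Wraps the "|" even though it is plain ASCII.
echo preg_replace($withPipe, '<span class="unicode">$0</span>', $text), "\n";

// Leaves the ASCII text untouched.
echo preg_replace($withoutPipe, '<span class="unicode">$0</span>', $text), "\n";
```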
     
    qwikad.com, Aug 15, 2015
  6. vOlLvEriNe

    Well, this helps
     
    vOlLvEriNe, Aug 15, 2015
  7. deathshadow

    Uhm... since UTF-8 can be anywhere from one to four bytes per character, how would 0x80 to 0xFF actually detect it and work? Doesn't that need to be 0x80..0xFFFFFFFF or something? I'm not sure a regex can actually detect UTF-8 or UTF-16 characters on a per-character or run-of-characters basis... particularly since bit 6 is OFF on the continuation bytes. Remember, it's:

    0xxxxxxx for 7-bit ASCII
    110xxxxx 10xxxxxx for two-byte sequences
    1110xxxx 10xxxxxx 10xxxxxx for three-byte sequences

    .. and so forth, up to four bytes (11110xxx followed by three 10xxxxxx bytes). Honestly, I'm a little surprised it's even able to pull the single characters for matches... though... shouldn't /u be used to match by code point instead? If it's not in code points 0..127 (0x00..0x7F), then it's a non-ASCII character, right?
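    To check the byte-level question, here is a small sketch (utf8_bits is a made-up helper name) that prints every byte of a UTF-8 string in binary. Both the lead byte and the continuation bytes of a multi-byte character have the top bit set, so they all fall in 0x80-0xFF; without /u, [\x80-\xff]+ therefore picks up whole runs of non-ASCII characters, and with /u, [^\x00-\x7F]+ matches the same run by code point. This assumes the PHP source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper: each byte of $s as an 8-digit binary string.
function utf8_bits(string $s): array
{
    $bits = [];
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $bits[] = sprintf('%08b', ord($s[$i]));
    }
    return $bits;
}

// 'a' is 1 byte, 'é' (U+00E9) and 'ب' (U+0628) are 2 bytes, '€' (U+20AC) is 3 bytes.
foreach (['a', 'é', 'ب', '€'] as $ch) {
    echo $ch, ': ', implode(' ', utf8_bits($ch)), "\n";
}

// Byte mode (no /u) and UTF-8 mode (/u) find the same non-ASCII run.
$s = 'abc ب xyz';
preg_match('/[\x80-\xff]+/', $s, $byteMatch);
preg_match('/[^\x00-\x7F]+/u', $s, $charMatch);
echo $byteMatch[0] === $charMatch[0] ? "same run\n" : "different\n";
```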

    Not that I'm following why you'd "need" to do that on a page in the first place, unless you're using some goofy webfont on your text that doesn't support those characters (which is yet ANOTHER reason why I'd never use webfonts on flow text). What's the usage scenario?

    --- EDIT ---

    Uhm, you want * not +. Duh, painfully obvious once I took a good look.

    /[^\x00-\x7F]*/
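    A quick way to compare the two quantifiers in preg_replace (a sketch; wrap_plus and wrap_star are illustrative names, and UTF-8 input with the /u modifier is assumed). Because * also accepts zero characters, the pattern matches the empty string at every position where no non-ASCII character starts, so preg_replace inserts empty spans throughout the text; + only matches real runs of non-ASCII characters, which suggests + is the quantifier to keep when wrapping:

```php
<?php
// Assumes UTF-8 input. With /u, [^\x00-\x7F] matches one whole non-ASCII
// character (any code point above U+007F) rather than a single byte.
const WRAP = '<span class="unicode">$0</span>';

// "+" needs at least one non-ASCII character, so only real runs match.
function wrap_plus(string $s): string
{
    return preg_replace('/[^\x00-\x7F]+/u', WRAP, $s);
}

// "*" also accepts zero characters, so it matches the empty string at
// every position where no non-ASCII character starts.
function wrap_star(string $s): string
{
    return preg_replace('/[^\x00-\x7F]*/u', WRAP, $s);
}

echo wrap_plus('ab ب ج'), "\n";
echo wrap_star('ab'), "\n";
```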
     
    Last edited: Aug 15, 2015
    deathshadow, Aug 15, 2015
  8. qwikad.com

    Doesn't \x80-\xFF also cover the Arabic and Syriac alphabets?

    I wondered myself why he would need to wrap it in a span. I figured he wants to make that text larger. When I tweet in Arabic, the font always looks smaller, so on a page it will look smaller (thinner) than the English text. To make it look comparable, it should probably be 1.8em when its English counterpart is just 1.2em.
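    On the Arabic question, the answer depends on whether the pattern runs in byte mode or UTF-8 mode. Without /u, every byte of an Arabic letter is in 0x80-0xFF, so it matches. With /u, \x80-\xff means the code points U+0080 to U+00FF (the Latin-1 Supplement), which does not include Arabic (U+0600 to U+06FF) or Syriac (U+0700 to U+074F). A small check with one Arabic letter:

```php
<?php
$arabic = 'ب'; // U+0628 ARABIC LETTER BEH, two bytes in UTF-8

$byteMode    = preg_match('/[\x80-\xff]/', $arabic);            // bytes 0x80-0xFF
$latin1Only  = preg_match('/[\x80-\xff]/u', $arabic);           // code points U+0080-U+00FF
$anyNonAscii = preg_match('/[^\x00-\x7F]/u', $arabic);          // any code point above U+007F
$arabicBlock = preg_match('/[\x{0600}-\x{06FF}]/u', $arabic);   // Arabic block only

printf("bytes: %d, /u Latin-1: %d, /u non-ASCII: %d, /u Arabic block: %d\n",
    $byteMode, $latin1Only, $anyNonAscii, $arabicBlock);
```

    So once /u is added, [^\x00-\x7F] (or an explicit \x{...} range) is the safer way to say "non-ASCII".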
     
    qwikad.com, Aug 16, 2015
  9. vOlLvEriNe

    @deathshadow, I picked this regex from Stack Overflow. I don't know much about regex, and it works, but I'm facing a problem again. I have content like this:
    and the regex works on this:
    Please fix it ;)
    @qwikad.com @deathshadow
     
    vOlLvEriNe, Aug 16, 2015
  10. lasersgopew

    I would quit fiddling with regex and just parse the string myself.

    
    function mb_tagger($string, $open, $close){
    
        $char    = preg_split('/(?<!^)(?!$)/u', $string);
        $buffer  = '';
        $capture = false;
    
        foreach ($char as $key => $value) {
            $next = (isset($char[$key+1])) ? ord($char[$key+1]) : null;
    
            if(ord($value) > 127
            && $capture === false)
            {
                $buffer .= $open;
                $capture = true;
            }
    
            $buffer .= $value;
    
            if($next <= 127
            && $capture === true)
            {
                $buffer .= $close;
                $capture = false;
            }
    
        }
    
        return $buffer;
    }
    
    
    $string = 'your text here';
    echo mb_tagger($string, '<strong>', '</strong>');
    
    The result is that every multi-byte character, or run of consecutive multi-byte characters, in $string is wrapped in those tags.
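    mb_tagger relies on two small facts that may not be obvious at first glance. A short sketch of both, assuming UTF-8 input:

```php
<?php
// 1. A zero-width split between every pair of characters. The lookbehind
//    (?<!^) skips the start and the lookahead (?!$) skips the end; with /u
//    PCRE only tries character boundaries, so multi-byte characters stay whole.
$chars = preg_split('/(?<!^)(?!$)/u', 'aب c');

// 2. ord() looks only at the first byte of its argument. In UTF-8 that lead
//    byte is above 127 for every non-ASCII character and at most 127 for ASCII.
$leadBytes = array_map('ord', $chars);

echo implode('|', $chars), "\n";
echo implode(',', $leadBytes), "\n";
```

    That is why a check like ord($value) > 127 is enough to decide where a tag opens or closes.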
     
    Last edited: Aug 16, 2015
    lasersgopew, Aug 16, 2015
  11. vOlLvEriNe

    It returns a result like this:
    I need:
     
    vOlLvEriNe, Aug 16, 2015
  12. lasersgopew

    That's because the multibyte characters are separated by ASCII spaces.

    
    function mb_tagger($string, $open, $close, $includeWhitespace = false){
    
        $char    = preg_split('/(?<!^)(?!$)/u', $string);
        $buffer  = '';
        $capture = false;
    
        foreach ($char as $key => $value) {
            $next = (isset($char[$key+1])) ? ord($char[$key+1]) : null;
    
            if(ord($value) > 127
            && $capture === false)
            {
                $buffer .= $open;
                $capture = true;
            }
    
            $buffer .= $value;
    
            if($includeWhitespace
            && $capture === true
            && $next !== null
            && $next <= 32)
            {
                continue;
            }
    
            if($next <= 127
            && $capture === true)
            {
    
                $buffer .= $close;
                $capture = false;
            }
    
        }
    
        return $buffer;
    }
    
    Now, when $includeWhitespace is true, the first 33 ASCII characters (codes 0 to 32: the control characters and the space) are allowed within the tags. This lets a tagged run continue across line breaks, spaces, tabs, null characters, etc.
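    If you would rather stay with a single regex, here is a sketch of an alternative to mb_tagger($string, $open, $close, true). It is not lasersgopew's approach, and the function name is made up: the pattern matches a run of non-ASCII characters followed by any number of further runs separated from it by ASCII whitespace (\s). UTF-8 input is assumed.

```php
<?php
// Hypothetical regex-only alternative: a non-ASCII run, then zero or more
// "whitespace + non-ASCII run" pairs, all wrapped in one pair of tags.
function wrap_runs_across_spaces(string $s, string $open, string $close): string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]+(?:\s+[^\x00-\x7F]+)*/u',
        function (array $m) use ($open, $close) {
            // A callback avoids treating "$" or "\" in $open/$close as backreferences.
            return $open . $m[0] . $close;
        },
        $s
    );
}

echo wrap_runs_across_spaces("blah ب ج\nد end", '<strong>', '</strong>'), "\n";
```

    One difference, if I trace the loop correctly: mb_tagger keeps whitespace that follows the last non-ASCII character inside the tag ('ب x' becomes '<strong>ب </strong>x'), while the regex stops at the last non-ASCII character ('<strong>ب</strong> x'). Also, \s covers ASCII whitespace such as space, tab and line breaks, not every control character from 0 to 32.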
     
    Last edited: Aug 16, 2015
    lasersgopew, Aug 16, 2015
  13. vOlLvEriNe

    Thanks @lasersgopew, it works like a charm. Can we allow dots and commas the same way as spaces?
     
    vOlLvEriNe, Aug 16, 2015
  14. lasersgopew
    
    function mb_tagger($string, $open, $close, $includeWhitespace = false, $include = []){
    
        $char    = preg_split('/(?<!^)(?!$)/u', $string);
        $buffer  = '';
        $capture = false;
        $include = (!empty($include)) ? array_flip($include) : [];
    
        foreach ($char as $key => $value) {
            $peek = (isset($char[$key+1])) ? $char[$key+1] : null;
            $next = ($peek !== null)       ? ord($peek)    : null;
    
            if(ord($value) > 127
            && $capture === false)
            {
                $buffer .= $open;
                $capture = true;
            }
    
            $buffer .= $value;
    
            if($peek !== null
            && isset($include[$peek])
            && $capture === true)
            {
                continue;
            }
    
            if($includeWhitespace
            && $capture === true
            && $next !== null
            && $next <= 32)
            {
                continue;
            }
    
            if($next <= 127
            && $capture === true)
            {
                $buffer .= $close;
                $capture = false;
            }
    
        }
    
        return $buffer;
    }
    
    Now you can pass an array of extra characters that are allowed within a sequence, like this:
    
    $string = 'blah ب W ج د';
    $allow  = ['W'];
    echo mb_tagger($string, '<strong>', '</strong>', true, $allow);
    
    Result:
    
    blah <strong>ب W ج د</strong>
    
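    If you would rather avoid the character loop, an allow-list also fits a single regex. A sketch, not the code above (wrap_runs_allowing is a hypothetical name): each allowed character is passed through preg_quote() so that characters such as . or - are taken literally inside the character class, and the result is used as extra "glue" between non-ASCII runs. UTF-8 input is assumed.

```php
<?php
// Hypothetical helper: wrap non-ASCII runs, letting whitespace and the
// characters in $allow join neighbouring runs into one tagged sequence.
function wrap_runs_allowing(string $s, string $open, string $close, array $allow = []): string
{
    $glue = '\s';
    foreach ($allow as $ch) {
        // Escape regex metacharacters ('.', '-', ']', ...) for use inside [...].
        $glue .= preg_quote($ch, '/');
    }
    $pattern = '/[^\x00-\x7F]+(?:[' . $glue . ']+[^\x00-\x7F]+)*/u';

    return preg_replace_callback($pattern, function (array $m) use ($open, $close) {
        return $open . $m[0] . $close;
    }, $s);
}

echo wrap_runs_allowing('blah ب W ج د', '<strong>', '</strong>', ['W']), "\n";
echo wrap_runs_allowing('x ب, ج. د y', '<strong>', '</strong>', [',', '.']), "\n";
```

    Unlike mb_tagger, a trailing allowed character (for example a final .) stays outside the tag unless more non-ASCII text follows it.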
     
    lasersgopew, Aug 16, 2015