Need to replace Unicode

vOlLvEriNe Member

#1

Hello guys, I need to wrap Unicode (non-ASCII) text in an element with a class. For example, I have this text:

Text ا ب ج د Text

And I need output like this:

Text <span class="unicode">ا ب ج د</span> Text

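I'm using this:
echo preg_replace('/[\x80-\xff]+/', '<span class="unicode">$0</span>', $str);
PHP:
But it shows:

Text <span class="unicode">ا</span> <span class="unicode">ب</span> <span class="unicode">ج</span> <span class="unicode">د</span> Text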

Please fix my code.

vOlLvEriNe, Aug 15, 2015 IP

ThePHPMaster Well-Known Member

#2

It would be great if you could follow up on your topics once you get answers. For example, this one:

https://forums.digitalpoint.com/threads/fetchcol-prob.2761217/

More people will be willing to help you if, once an issue is solved, you pick an accepted answer and follow up on the topic.

To answer your question, you will need to include the space character in your expression:
/[\x80-\xff|\s]+/
Code (markup):

ThePHPMaster, Aug 15, 2015 IP

vOlLvEriNe Member

#3

In my last post I didn't get the right answer

vOlLvEriNe, Aug 15, 2015 IP

vOlLvEriNe Member

#4

/[\x80-\xff|\s]+/
Code (markup):
This gives me something like this:

Text <span class="unicode">ا ب ج د Text</span>

vOlLvEriNe, Aug 15, 2015 IP

qwikad.com Illustrious Member Affiliate Manager

#5

Do this instead:
/[\x80-\xff\s]+/
Code (markup):
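
For what it's worth, the only difference between the two patterns is the `|`. Inside a character class it is not alternation but a literal pipe character, so the earlier pattern would also match any `|` in the text. A minimal sketch to check this, assuming a made-up sample string (`$sample` is not from the thread):

```php
<?php
// Hypothetical sample string containing a literal pipe and no whitespace.
$sample = "a|b";

// With "|" inside the class, the pipe itself counts as a match.
$withPipe = preg_match_all('/[\x80-\xff|\s]+/', $sample, $m1);

// Without it, only high bytes and whitespace match; "a|b" has neither.
$withoutPipe = preg_match_all('/[\x80-\xff\s]+/', $sample, $m2);

var_dump($withPipe);    // number of matches with the pipe in the class
var_dump($withoutPipe); // number of matches without it
```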

qwikad.com, Aug 15, 2015 IP

vOlLvEriNe Member

#6

qwikad.com said: ↑
Do this instead:
/[\x80-\xff\s]+/
Code (markup):
Well, this helps

vOlLvEriNe, Aug 15, 2015 IP

deathshadow Acclaimed Member

#7

Uhm... since UTF-8 can be anywhere from one to four bytes per character, how would 0x80 to 0xFF actually detect it and work? Doesn't that need to be 0x80..0xFFFFFFFF or something? I'm not sure a regex can actually detect UTF-8 or UTF-16 characters on a per-character or run-of-characters basis... particularly since bit 6 is OFF on the continuation bytes. Remember it's:

0b0xxxxxxx for 7-bit ASCII
0b110xxxxx 0b10xxxxxx for two-byte characters
0b1110xxxx 0b10xxxxxx 0b10xxxxxx for three-byte characters

... and so forth. Honestly I'm a little surprised it's even able to pull the single characters for matches... though... shouldn't /u be used to match by code point instead? If it's not in code points 0..127, then it's a non-ASCII character, right?
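
A quick way to check how the byte range behaves, as a sketch: in UTF-8 every byte of a multibyte sequence (lead byte and continuation bytes alike) has its high bit set, so each one falls in 0x80..0xFF, and a byte-wise class can match a whole character as a run of such bytes. The `/u` form below matches by code point instead. The sample string is the one from post #1 and is assumed to be UTF-8 encoded.

```php
<?php
// Sample from post #1; this source file is assumed to be saved as UTF-8.
$str = 'Text ا ب ج د Text';

// U+0627 (ا) is encoded as two bytes, both >= 0x80.
echo bin2hex('ا'), "\n"; // d8a7

// Byte-wise: runs of bytes in 0x80..0xFF (no /u modifier).
preg_match_all('/[\x80-\xff]+/', $str, $bytewise);

// Code-point-wise: runs of characters above U+007F (/u modifier).
preg_match_all('/[^\x00-\x7F]+/u', $str, $bycodepoint);

// Both approaches find the same four single-letter runs here.
var_dump($bytewise[0] === $bycodepoint[0]);
var_dump(count($bytewise[0]));
```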

Not that I'm following why you'd "need" to do that on a page in the first place, unless you're using some goofy webfont on your text that doesn't support those characters (which is yet ANOTHER reason why I'd never use webfonts on flow text). What's the usage scenario?

--- EDIT ---

Uhm, you want * not +. Duh, painfully obvious once I took a good look.
/[^\x00-\x7F]*/
Code (markup):
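
Before swapping quantifiers, it may be worth checking both on a small input, since `*` can also match the empty string. A rough sketch, using square brackets as a stand-in for the span markup and a two-character sample string (both are assumptions for readability):

```php
<?php
$str = 'aب'; // one ASCII letter followed by one Arabic letter

// "+" needs at least one non-ASCII byte, so only the Arabic letter is wrapped.
$plus = preg_replace('/[^\x00-\x7F]+/', '[$0]', $str);

// "*" is also satisfied by zero bytes, so empty matches get wrapped as well.
$star = preg_replace('/[^\x00-\x7F]*/', '[$0]', $str);

var_dump($plus);
var_dump($star);
```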

Last edited: Aug 15, 2015

deathshadow, Aug 15, 2015 IP

qwikad.com Illustrious Member Affiliate Manager

#8

Doesn't \x80-\xFF also cover the Arabic and Syriac alphabets?

I wondered myself why he would need to wrap it in a span. I figured he wants to make the font size larger. When I tweet in Arabic the font always looks smaller. So, on a page, it will look smaller (thinner) compared to the English font. To make it look comparable it should probably be 1.8em when its English counterpart is just 1.2em.

qwikad.com, Aug 16, 2015 IP

vOlLvEriNe Member

#9

@deathshadow, I picked this regex up from Stack Overflow and I don't know much about regex. It works, but I'm facing a problem again. I have content like this:

<article>hello, its urdu
 ا ب ج د
 etc etc</article>

but this regex only works on text like this:

Text ا ب ج د Text

Please fix it.
@qwikad.com @deathshadow

vOlLvEriNe, Aug 16, 2015 IP

lasersgopew Member

#10

I would quit fiddling with regex and just parse the string myself.


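function mb_tagger($string, $open, $close){

    // Split the UTF-8 string into an array of single characters.
    $char    = preg_split('/(?<!^)(?!$)/u', $string);
    $buffer  = '';
    $capture = false; // true while inside an open tag

    foreach ($char as $key => $value) {
        // First byte of the next character, or null at the end of the string.
        $next = (isset($char[$key+1])) ? ord($char[$key+1]) : null;

        // A first byte above 127 means a multibyte (non-ASCII) character: open the tag.
        if(ord($value) > 127
        && $capture === false)
        {
            $buffer .= $open;
            $capture = true;
        }

        $buffer .= $value;

        // Close the tag when the next character is ASCII or the string ends
        // (null <= 127 evaluates to true in PHP).
        if($next <= 127
        && $capture === true)
        {
            $buffer .= $close;
            $capture = false;
        }

    }

    return $buffer;
}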

PHP:


$string = 'your text here';
echo mb_tagger($string, '<strong>', '</strong>');

PHP:

The result is that any multibyte character, or run of consecutive multibyte characters, in $string is wrapped in those tags.
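
As a rough illustration of why the `ord($value) > 127` test is enough: `ord()` only looks at the first byte of the string it is given, and the first byte of any multibyte UTF-8 character is above 127. The two-character sample string below is made up for the demonstration.

```php
<?php
// Split a UTF-8 string into characters with the same pattern mb_tagger() uses.
$chars = preg_split('/(?<!^)(?!$)/u', 'aب');

foreach ($chars as $c) {
    // ord() reads only the first byte: 97 for "a", 216 (0xD8) for "ب".
    echo $c, ' => ', ord($c), "\n";
}
```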

Last edited: Aug 16, 2015

lasersgopew, Aug 16, 2015 IP

vOlLvEriNe Member

#11

It returns a result like this:

<article>hello, its urdu
 ا
ب
ج
 etc etc</article>

I need:

<article>hello, its urdu
 ا ب ج
 etc etc</article>

vOlLvEriNe, Aug 16, 2015 IP

lasersgopew Member

#12

That's because the multibyte characters are separated by ASCII spaces.


function mb_tagger($string, $open, $close, $includeWhitespace = false){

    $char    = preg_split('/(?<!^)(?!$)/u', $string);
    $buffer  = '';
    $capture = false;

    foreach ($char as $key => $value) {
        $next = (isset($char[$key+1])) ? ord($char[$key+1]) : null;

        if(ord($value) > 127
        && $capture === false)
        {
            $buffer .= $open;
            $capture = true;
        }

        $buffer .= $value;

        // Keep the tag open across whitespace and control characters (codes 0 to 32).
        if($includeWhitespace
        && $capture === true
        && $next !== null
        && $next <= 32)
        {
            continue;
        }

        if($next <= 127
        && $capture === true)
        {

            $buffer .= $close;
            $capture = false;
        }

    }

    return $buffer;
}

PHP:

Now, when $includeWhitespace is true, the first 33 ASCII characters (codes 0 through 32) are allowed within the tags once a multibyte run has started. This lets a run continue across line breaks, null characters, spaces, etc.

Last edited: Aug 16, 2015

lasersgopew, Aug 16, 2015 IP

vOlLvEriNe Member

#13

Thanks @lasersgopew, it works like a charm. Can we allow dots and commas the same way as spaces?

vOlLvEriNe, Aug 16, 2015 IP

lasersgopew Member Best Answer

#14


function mb_tagger($string, $open, $close, $includeWhitespace = false, $include = []){

    $char    = preg_split('/(?<!^)(?!$)/u', $string);
    $buffer  = '';
    $capture = false;
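    // Flip the list so allowed characters become keys for fast isset() lookups.
    $include = (!empty($include)) ? array_flip($include) : [];

    foreach ($char as $key => $value) {
        // $peek is the next character itself; $next is its first byte (null at the end).
        $peek = (isset($char[$key+1])) ? $char[$key+1] : null;
        $next = ($peek !== null)       ? ord($peek)    : null;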

        if(ord($value) > 127
        && $capture === false)
        {
            $buffer .= $open;
            $capture = true;
        }

        $buffer .= $value;

        // Keep the tag open if the next character is in the allowed list.
        if($peek !== null
        && isset($include[$peek])
        && $capture === true)
        {
            continue;
        }

        if($includeWhitespace
        && $capture === true
        && $next !== null
        && $next <= 32)
        {
            continue;
        }

        if($next <= 127
        && $capture === true)
        {
            $buffer .= $close;
            $capture = false;
        }

    }

    return $buffer;
}

PHP:

Now you can pass an array of characters that are allowed within a sequence, like this:


$string = 'blah ب W ج د';
$allow  = ['W'];
echo mb_tagger($string, '<strong>', '</strong>', true, $allow);

PHP:

Result:


blah <strong>ب W ج د</strong>

Code (markup):
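
For comparison only, here is a regex-based sketch that groups runs in a similar way. It is not the mb_tagger() function above but an alternative technique, and `wrap_non_ascii()` and its `$allowed` parameter are hypothetical names. It assumes UTF-8 input, and it differs in one respect: whitespace or punctuation after the last non-ASCII character is left outside the tags.

```php
<?php
// Alternative sketch, not the mb_tagger() function from this thread.
// wrap_non_ascii() is a hypothetical helper name.
function wrap_non_ascii($string, $open, $close, $allowed = '.,')
{
    // Characters allowed between runs: whitespace plus the $allowed list.
    $joiner = '[\s' . preg_quote($allowed, '/') . ']+';

    // One or more non-ASCII characters, optionally followed by
    // (joiner + more non-ASCII characters), repeated.
    $pattern = '/[^\x00-\x7F]+(?:' . $joiner . '[^\x00-\x7F]+)*/u';

    // A callback avoids "$" or "\" in $open/$close being read as references.
    return preg_replace_callback($pattern, function ($m) use ($open, $close) {
        return $open . $m[0] . $close;
    }, $string);
}

$html = "<article>hello, its urdu\n ا ب ج د\n etc etc</article>";
echo wrap_non_ascii($html, '<span class="unicode">', '</span>');
```

Because the joiner must be followed by another non-ASCII character, a run never swallows the whitespace that separates it from the surrounding ASCII text.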

lasersgopew, Aug 16, 2015 IP
