Hello guys, I need to add a class around Unicode text. For example, I have text and need output like this. I'm using this:

PHP:
    echo preg_replace('/[\x80-\xff]+/', '<span class="unicode">$0</span>', $str);

But it does not give the output I need. Please fix my code.
It would be great if you could follow up on your topics once you get answers. For example, this one: https://forums.digitalpoint.com/threads/fetchcol-prob.2761217/ More people will help you out if, once the issue is solved, you pick an accepted answer and follow up on the topic.

To answer your question, you will need to include whitespace in your character class:

Code (markup):
    /[\x80-\xff\s]+/
Uhm... since UTF-8 can be anywhere from one to four bytes per character, how would 0x80 to 0xFF actually detect it and work? Doesn't that need to be 0x80..0xFFFFFFFF or something? I'm not sure a regex can actually detect UTF-8 or UTF-16 characters on a per-character or run-of-characters basis... particularly since bit 6 is OFF on the continuation bytes. Remember it's:

    0b0xxxxxxx for 7-bit ASCII
    0b110xxxxx 0b10xxxxxx for two-byte characters
    0b1110xxxx 0b10xxxxxx 0b10xxxxxx for three-byte characters

... and so forth. Honestly I'm a little surprised it's even able to pull the single characters for matches... though... shouldn't /u be used to match by code point instead? If it's not in code points 0..127, then it's a non-ASCII character, right?

Not that I'm following why you'd "need" to do that on a page in the first place, unless you're using some goofy webfont on your text that doesn't support those characters (which is yet ANOTHER reason why I'd never use webfonts on flow text). What's the usage scenario?

--- EDIT ---
Uhm, just negate the ASCII range. Duh, painfully obvious once I took a good look. Keep the + rather than * though: * also matches the empty string, so you'd end up with empty spans all over the place.

Code (markup):
    /[^\x00-\x7F]+/
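For what it's worth, the byte-wise class can work because every byte of a multi-byte UTF-8 character, lead byte and continuation bytes alike, has its top bit set, so each one lands in 0x80..0xFF. Here is a minimal sketch, assuming PHP 7 or later (for the \u{...} escape) and a made-up sample string. It prints the bits of each byte, then compares the byte-wise pattern with a /u pattern that matches by code point:

```php
<?php
// Hypothetical sample: ASCII "a" followed by the Arabic letter U+0628.
$str = "a\u{0628}";

// Dump every byte in hex and binary. The ASCII byte starts with 0,
// the lead byte of the two-byte sequence with 110, the continuation byte with 10.
foreach (str_split($str) as $byte) {
    printf("0x%02X  %08b\n", ord($byte), ord($byte));
}

// Byte-wise: without /u the class is compared against raw bytes,
// and both bytes of U+0628 are >= 0x80.
$bytewise = preg_replace('/[\x80-\xff]+/', '<span class="unicode">$0</span>', $str);

// Character-wise: with /u the class is compared against code points.
$charwise = preg_replace('/[^\x00-\x7F]+/u', '<span class="unicode">$0</span>', $str);

var_dump($bytewise === $charwise);
```

On well-formed UTF-8 the two patterns should wrap the same text. The /u version also rejects invalid UTF-8 (preg_replace returns null) instead of wrapping stray bytes.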
Doesn't \x80-\xFF also cover the Arabic and Syriac alphabets? I wondered myself why he would need to wrap it in a span. I figured he wants to make that font's size larger. When I tweet in Arabic, the font always looks smaller. So, on a page, it will look smaller (thinner) compared to the English font. To make it look comparable, it should probably be 1.8em when its English counterpart is just 1.2em.
@deathshadow, I picked this regex from Stack Overflow and don't know much about regex. It works, but I'm facing a problem again. I have content like this, and this regex works on this. Please fix it. @qwikad.com @deathshadow
I would quit fiddling with regex and just parse the string myself.

PHP:
    function mb_tagger($string, $open, $close){
        // Split the UTF-8 string into an array of single characters.
        $char = preg_split('/(?<!^)(?!$)/u', $string);
        $buffer = '';
        $capture = false;
        foreach ($char as $key => $value) {
            // ord() returns the first byte, which is above 127 for any multi-byte character.
            $next = (isset($char[$key+1])) ? ord($char[$key+1]) : null;
            if(ord($value) > 127 && $capture === false) {
                $buffer .= $open;
                $capture = true;
            }
            $buffer .= $value;
            // Close the tag when the next character is ASCII or the string ends.
            if(($next === null || $next <= 127) && $capture === true) {
                $buffer .= $close;
                $capture = false;
            }
        }
        return $buffer;
    }

PHP:
    $string = 'your text here';
    echo mb_tagger($string, '<strong>', '</strong>');

The result is that any multi-byte character, or run of multi-byte characters, in $string is wrapped in those tags.
That's because the multibyte characters are separated by ASCII spaces.

PHP:
    function mb_tagger($string, $open, $close, $includeWhitespace = false){
        // Split the UTF-8 string into an array of single characters.
        $char = preg_split('/(?<!^)(?!$)/u', $string);
        $buffer = '';
        $capture = false;
        foreach ($char as $key => $value) {
            $next = (isset($char[$key+1])) ? ord($char[$key+1]) : null;
            if(ord($value) > 127 && $capture === false) {
                $buffer .= $open;
                $capture = true;
            }
            $buffer .= $value;
            // Stay inside the tag when the next character is whitespace or a control character.
            if($includeWhitespace && $capture === true && $next !== null && $next <= 32) {
                continue;
            }
            // Close the tag when the next character is ASCII or the string ends.
            if(($next === null || $next <= 127) && $capture === true) {
                $buffer .= $close;
                $capture = false;
            }
        }
        return $buffer;
    }

Now, when $includeWhitespace is true, the first 33 ASCII characters (codes 0 through 32) are allowed within the tags. This lets a tagged run continue across line breaks, null characters, spaces, etc.
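If you would rather stay with a regex, one possible alternative is to let whitespace count only when it sits between two multi-byte runs. This is a sketch, not the function above: wrap_non_ascii is a hypothetical name, and it assumes PHP 7 or later and valid UTF-8 input.

```php
<?php
// Hypothetical helper: wraps runs of non-ASCII characters, letting
// whitespace join two runs but never start or end one.
function wrap_non_ascii(string $str, string $open, string $close): string
{
    // A non-ASCII run, optionally followed by (whitespace + another run), repeated.
    $pattern = '/[^\x00-\x7F]+(?:\s+[^\x00-\x7F]+)*/u';

    // A callback is used so that "$" or "\" inside $open/$close is never
    // treated as a replacement reference.
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($open, $close) {
            return $open . $m[0] . $close;
        },
        $str
    );

    // preg_replace_callback() returns null on error (for example invalid UTF-8);
    // in that case the input is handed back unchanged.
    return $result ?? $str;
}

// Three Arabic letters separated by spaces, with ASCII text on both sides.
echo wrap_non_ascii("blah \u{0628} \u{062C} \u{062F} end", '<strong>', '</strong>');
```

Unlike the loop version, a space that is followed by ASCII text stays outside the tag.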
PHP:
    function mb_tagger($string, $open, $close, $includeWhitespace = false, $include = []){
        // Split the UTF-8 string into an array of single characters.
        $char = preg_split('/(?<!^)(?!$)/u', $string);
        $buffer = '';
        $capture = false;
        // Flip the list so allowed characters can be looked up by key.
        $include = (!empty($include)) ? array_flip($include) : [];
        foreach ($char as $key => $value) {
            $peek = (isset($char[$key+1])) ? $char[$key+1] : null;
            $next = ($peek !== null) ? ord($peek) : null;
            if(ord($value) > 127 && $capture === false) {
                $buffer .= $open;
                $capture = true;
            }
            $buffer .= $value;
            // Stay inside the tag when the next character is explicitly allowed.
            if($peek !== null && isset($include[$peek]) && $capture === true) {
                continue;
            }
            if($includeWhitespace && $capture === true && $next !== null && $next <= 32) {
                continue;
            }
            // Close the tag when the next character is ASCII or the string ends.
            if(($next === null || $next <= 127) && $capture === true) {
                $buffer .= $close;
                $capture = false;
            }
        }
        return $buffer;
    }

Now you can pass an array of characters that are allowed within a sequence, like this:

PHP:
    $string = 'blah ب W ج د';
    $allow = ['W'];
    echo mb_tagger($string, '<strong>', '</strong>', true, $allow);

Result:

Code (markup):
    blah <strong>ب W ج د</strong>
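The same allow-list idea could also be sketched as a regex by building a character class from the array. wrap_with_allowed is a hypothetical name; preg_quote() escapes each allowed character so it is safe inside the class. The sketch assumes PHP 7 or later, valid UTF-8 input, and single-character entries in $allow.

```php
<?php
// Hypothetical helper: wraps runs of non-ASCII characters; whitespace and
// any character listed in $allow may appear between two runs.
function wrap_with_allowed(string $str, string $open, string $close, array $allow = []): string
{
    // Escape every allowed character for use inside a character class.
    $extra = '';
    foreach ($allow as $char) {
        $extra .= preg_quote($char, '/');
    }

    // "Bridge" characters are only accepted when another non-ASCII run follows.
    $pattern = '/[^\x00-\x7F]+(?:[\s' . $extra . ']+[^\x00-\x7F]+)*/u';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($open, $close) {
            return $open . $m[0] . $close;
        },
        $str
    );

    // Hand the input back unchanged if the regex engine reports an error.
    return $result ?? $str;
}

// Same sample as in the post above: three Arabic letters with "W" in between.
echo wrap_with_allowed("blah \u{0628} W \u{062C} \u{062F}", '<strong>', '</strong>', ['W']);
```

For that sample string this should give the same result as the loop version. The difference is that an allowed character or space at the end of a run, followed by plain ASCII, is left outside the tag.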