Hi guys,

What is the best way to detect a text's encoding and fix it if it contains non-UTF-8 characters?

The approaches I've tried:
1) mb_detect_encoding - doesn't work properly
2) iconv("UTF-8", "UTF-8//IGNORE", $str) - doesn't work properly
3) preg_replace with different options...

The last code I used is the following:

function utf8replacer($captures) {
    // Compare against '' instead of using empty(), which treats the string "0" as empty.
    if ($captures[1] !== '') {
        // Valid byte sequence. Return unmodified.
        return $captures[1];
    } elseif (!empty($captures[2])) {
        // Invalid byte of the form 10xxxxxx.
        // Encode as 11000010 10xxxxxx.
        return "\xC2".$captures[2];
    } else {
        // Invalid byte of the form 11xxxxxx.
        // Encode as 11000011 10xxxxxx (clear the second-highest bit of the byte).
        return "\xC3".chr(ord($captures[3]) - 64);
    }
}

$regex = <<<'END'
/
  (
      [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
    | [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
    | [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
    | [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3
  )
| ( [\x80-\xBF] )                 # invalid byte in range 10000000 - 10111111
| ( [\xC0-\xFF] )                 # invalid byte in range 11000000 - 11111111
/x
END;

// preg_replace_callback returns the new string, so the result has to be assigned.
$txt = preg_replace_callback($regex, "utf8replacer", $txt);
Code (markup):

For example, this text was not saved correctly in the db:

digital point®
Code (markup):

On the final page it is displayed as:

digital point�
Code (markup):

The http://validator.w3.org validator shows the following:

Sorry! This document can not be checked. Sorry, I am unable to validate this document because on line 55 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: utf8 "\xAE" does not map to Unicode

When I open the page, line 55 doesn't contain any characters at all.
But I suppose the cause is the ® character, because if I remove it from the db, the validator shows no warnings. Also, if I save my PHP page as static HTML and run it through the same validator, it is always able to check it, even with the ® in it.

P.S. My php.ini mbstring settings, in case they are needed (PHP 5.3+, tried with 5.3.5 and 5.3.2):

Multibyte Support: enabled
Multibyte string engine: libmbfl
HTTP input encoding translation: disabled
mbstring.language: neutral
mbstring.strict_detection: Off
mbstring.substitute_character: no value
Code (markup):

My PHP page contains the following header:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Code (markup):
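
For reference, here is a minimal sketch of the "check first, convert only if needed" idea, as an alternative to patching bytes with a regex. It assumes the badly saved text is ISO-8859-1, which is only a guess based on the single 0xAE byte the validator reported. The function name ensure_utf8 and its $fallback parameter are hypothetical, not something from the code above.

```php
<?php
// Hypothetical helper: leaves valid UTF-8 untouched and converts anything
// else from an assumed single-byte source encoding.
function ensure_utf8($str, $fallback = 'ISO-8859-1')
{
    // mb_check_encoding() returns true only if $str is valid in the given encoding.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Assumption: invalid UTF-8 means the bytes are really in $fallback.
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}

$fromDb = "digital point\xAE";    // the registered sign stored as the single byte 0xAE
$fixed  = ensure_utf8($fromDb);

echo bin2hex(substr($fixed, -2)); // prints "c2ae", the UTF-8 bytes for U+00AE
```

Text that is already valid UTF-8 passes through unchanged, so running it over the whole column should not double-encode rows that were saved correctly. If the data actually came from a different legacy encoding, the fallback would have to be changed to match.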