Apple  Bat  Cat  Rat  Mat  Fat

I want to get rid of these illegal characters. Can you suggest a solution? I am using PHP. The characters appear when I paste something from MS Word into a textarea on an HTML page. I tried the methods below, and none of them worked:

1) $contents = preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', ' ', $contents);
2) $string = preg_replace('/[^(\x20-\x7F)]*/','', $string);
3) (WORST OPTION)
$retrievedAreaText = $_POST["textAreaId"];
$illegalChars = array("",); //others
$retrievedAreaText = str_replace($illegalChars,"",$retrievedAreaText);

Please help.
MS Word is a pain, which is why people use rich-text editors such as TinyMCE and CKEditor. The time and effort involved in writing something to parse Word content effectively just isn't worth it.
In your HTML, if you have not yet done so, try changing your charset to UTF-8 and see if that solves the problem:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
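If the stray characters still show up once the page is served as UTF-8, a server-side cleanup along these lines may help. This is only a rough sketch, not a complete Word filter: clean_word_paste is a made-up helper name, the replacement map covers just a handful of common "smart" punctuation marks, and the fallback assumes that input which is not valid UTF-8 is Windows-1252. Note the /u modifier on the pattern; without it, preg_replace works on single bytes and can cut multibyte UTF-8 characters in half.

```php
<?php
// Hypothetical helper: the name and the replacement map are illustrative only.
function clean_word_paste(string $text): string
{
    // Assumption: bytes that are not valid UTF-8 came in as Windows-1252.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }

    // Map common Word "smart" punctuation to plain ASCII equivalents.
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ];
    $text = strtr($text, $map);

    // Remove ASCII and C1 control characters, keeping tab, LF and CR.
    // The /u modifier makes PCRE treat subject and pattern as UTF-8.
    $cleaned = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F\x{0080}-\x{009F}]/u', '', $text);

    // preg_replace returns null on error; fall back to the mapped text.
    return $cleaned ?? $text;
}
```

With the field name from the question, it would be called as $contents = clean_word_paste($_POST["textAreaId"]);. Legitimate non-ASCII text such as accented letters is left alone, which is the main difference from stripping everything outside \x20-\x7F.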