Hi, I'm using the cURL library to grab the content of an Arabic website, but when I echo the data from that website, every character shows up as "?????". I checked the site's encoding and found that it uses windows-1256 (Arabic). So it looks as if cURL turns the data into unknown characters when it grabs the content, and when I print the data I get nothing but "????????". If I change the browser's encoding, though, the text displays correctly. So my question is: is there any way to convert windows-1256 to UTF-8 in PHP?
You just said that if you change your browser's encoding to UTF-8, it displays fine. That means the content you are receiving is in UTF-8, but the page you are viewing is being interpreted as windows-1256. What header() does is send an HTTP header line before you output any text, so your browser will know to display the page as UTF-8.
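For illustration, the header() call described above might look like the sketch below. The helper name content_type_header() is made up here, and the charset is an assumption: it has to match the bytes the script actually echoes (utf-8 if the content has been converted, windows-1256 if the raw bytes are passed straight through).

```php
<?php
// Hypothetical helper (not from the thread): build the Content-Type
// header line for a given charset.
function content_type_header($charset)
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before any output has been sent,
// so check headers_sent() first.
if (!headers_sent()) {
    header(content_type_header('utf-8'));
}
```

If the page still shows question marks after this, the declared charset probably does not match the bytes being sent.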
The website uses windows-1256, but when cURL grabs the content, the data gets treated as UTF-8. Originally it is windows-1256, so what I want is to convert the UTF-8 back to windows-1256. Here is the source:

<?php
# I'm using the cURL library to grab the content of an Arabic website,
# but when I echo the data I get all characters as "?????".
# The website uses the windows-1256 (Arabic) encoding, so the fetched
# data is displayed as unknown characters when I print it.
# If I change the browser encoding, I get the correct text.

$url2 = "http://forum.kooora.com/f.aspx?mode=f&f=169";

function get_content($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the result
    // CURLOPT_HTTPHEADER expects one header per array element, without "\r\n"
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept-Language: en-us,en;q=0.5',
        'Accept-Charset: windows-1256;q=0.7,*;q=0.7',
        'Keep-Alive: 300',
        'Connection: keep-alive',
    ));
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); // saved cookies
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    $string = curl_exec($ch);
    curl_close($ch);
    return $string;
}

$content = get_content($url2);

// This pattern should capture the text around "روابط".
// Here "روابط" is already in UTF-8, because the script file is saved as UTF-8.
$pattern = '/"ftnh",(.*?),(.*?)(روابط)(.*?),/';

// But when I use preg_match_all() to match this word against the page content,
// nothing matches, even though I can clearly read many occurrences of this word
// when I browse the site manually.
if (preg_match_all($pattern, $content, $out, PREG_PATTERN_ORDER)) {
    echo "matched";
    print_r($out);
} else {
    echo "no match";
}
?>
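One possible way to make the match work is to convert the fetched bytes to UTF-8 before running the regex, so that the content and the UTF-8 pattern are in the same encoding. This is a minimal sketch, assuming the page really is served as windows-1256 and that PHP's iconv extension is available (PHP 7 or later for the "\x" test string style used here is not required, but the script file itself must be saved as UTF-8). The helper name windows1256_to_utf8() and the sample string standing in for the cURL result are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): convert windows-1256 bytes to UTF-8.
function windows1256_to_utf8($bytes)
{
    $utf8 = iconv('WINDOWS-1256', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $utf8;
}

// Stand-in for what get_content() would return: the bytes
// D1 E6 C7 C8 D8 spell "روابط" in windows-1256.
$content = "\"ftnh\",1,2 \xD1\xE6\xC7\xC8\xD8 3,";

// Convert first, then match. The /u modifier tells PCRE to treat
// both the pattern and the subject as UTF-8.
$utf8    = windows1256_to_utf8($content);
$pattern = '/"ftnh",(.*?),(.*?)(روابط)(.*?),/u';
$matched = preg_match_all($pattern, $utf8, $out, PREG_PATTERN_ORDER);
```

In the real script, the conversion would be applied to the value returned by get_content() before the call to preg_match_all(). If the converted text is then echoed, the page should be declared as UTF-8 rather than windows-1256.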