Hello all,

I am trying to grab some content from a web site. The site's charset is iso-8859-1; this is the meta tag from a sample page on that site:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

This is my code:

$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Encoding: *";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Connection: Keep-Alive";

// Create the handle before setting options on it.
$c = curl_init();
curl_setopt($c, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_TIMEOUT, 30);
curl_setopt($c, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($c, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($c, CURLOPT_HTTPHEADER, $header);
curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.1)");
// CURL_GET is not a cURL constant; CURLOPT_HTTPGET is the option that forces a GET request.
curl_setopt($c, CURLOPT_HTTPGET, 1);
$w = curl_exec($c);
curl_close($c);

// Pull the content of the "div" block out of the "details" cell.
preg_match("/<td class=\"details\">(.*)<div class=\"div\">(.*)<\/div>/isUS", $w, $matches);
$post['fullpage'] = $matches[2];
$fullpage = mysql_real_escape_string($post['fullpage']);
$query = "INSERT INTO `file` VALUES ('', '".$title."', '".$fullpage."', '".$no."', '')";

My database setting is ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

In the output on my site, some characters come out wrong. I have tried everything and spent the last 5 hours trying to sort this out. Please help me. If I echo $fullpage before inserting it into the database, it looks OK; there is no problem.
Hi! It seems that your SQL database is set up as utf-8 and you are trying to insert data that is in iso-8859-1. You have two options as I see it:

Option 1: Change the charset and collation of your SQL database to iso-8859-1, and it should be better.

Option 2: Convert the text to utf-8 before you escape and insert it:

$fullpage = mysql_real_escape_string(utf8_encode($post['fullpage']));

You might need to call utf8_decode($output_from_db) when you output from your database, if it looks strange.
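A minimal sketch of Option 2, assuming the scraped bytes really are ISO-8859-1 and the table is UTF-8. The helper name `to_utf8` is hypothetical, and `mb_convert_encoding` needs the mbstring extension. It names the source charset explicitly instead of relying on `utf8_encode`, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert scraped ISO-8859-1 bytes to UTF-8.
// Assumes the input really is ISO-8859-1; text that is already UTF-8
// would be double-encoded by this call.
function to_utf8(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

// "café" as ISO-8859-1 bytes: the é is the single byte 0xE9.
$scraped = "caf\xE9";
$utf8 = to_utf8($scraped);

// In UTF-8 the é becomes the two bytes 0xC3 0xA9.
echo bin2hex($utf8), "\n"; // prints 636166c3a9
```

In the original script this conversion would run on `$post['fullpage']` first, with `mysql_real_escape_string` applied to the result afterwards. Calling `mysql_set_charset('utf8')` after connecting is probably also worth trying, so that the server does not reinterpret the bytes it receives.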
I have already tried changing the charsets on the site and on the database. I tried every single collation, but it is still the same.
I found the fault. It is the header of the HTML. Can somebody tell me the difference between these two? I never paid attention to this before.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The second one declares the page as XHTML and names the definition it is validated against, the xhtml1-transitional.dtd file. Try opening that URL in your browser and you will see what it looks like!
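The thread does not say why swapping the doctype helped. Independent of the doctype, one thing worth ruling out is a mismatch between the charset the page declares and the charset of the stored text. A minimal sketch, assuming the text read back from the database is UTF-8; `render_page` is a hypothetical name, not something from the original script.

```php
<?php
// Hypothetical helper: wrap UTF-8 text in a page that declares its charset.
// Assumes $textUtf8 is already valid UTF-8 (e.g. read back from a utf8 table).
function render_page(string $textUtf8): string
{
    // Escape for HTML, telling PHP which encoding the input uses.
    $safe = htmlspecialchars($textUtf8, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
        . "<html><head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
        . "<title>Charset test</title>\n"
        . "</head><body><p>" . $safe . "</p></body></html>\n";
}

// On a web server, also send the charset in the HTTP header before any output:
// header('Content-Type: text/html; charset=utf-8');
echo render_page("caf\xC3\xA9 & more");
```

If the page looks right with this declaration and wrong without it, the browser was guessing the encoding, and the doctype change only altered that guess.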