Curl Character problems. Please help me

Discussion in 'PHP' started by baris22, May 7, 2010.

  1. #1
    Hello all,

    I am trying to grab some content from a web site. The web site's charset is iso-8859-1. This is a sample of a page from that website:

    [IMG]

    This is my code:

    
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    
    
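        // Request headers sent with the fetch.
        $header = array();
        $header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Connection: Keep-Alive";

        // The handle must be created before any curl_setopt() call.
        $c = curl_init();
        curl_setopt($c, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($c, CURLOPT_HEADER, 0);
        curl_setopt($c, CURLOPT_URL, $url);
        curl_setopt($c, CURLOPT_TIMEOUT, 30);
        curl_setopt($c, CURLOPT_COOKIEJAR, 'cookie.txt');
        curl_setopt($c, CURLOPT_COOKIEFILE, 'cookie.txt');
        curl_setopt($c, CURLOPT_HTTPHEADER, $header);
        // An empty string accepts every encoding cURL supports and decodes the
        // response, unlike a hand-written "Accept-Encoding: *" header.
        curl_setopt($c, CURLOPT_ENCODING, '');
        curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.1)");
        // CURL_GET is not a cURL option; CURLOPT_HTTPGET is.
        curl_setopt($c, CURLOPT_HTTPGET, 1);
        $w = curl_exec($c);
        curl_close($c);

        // Grab the contents of <div class="div"> inside <td class="details">.
        preg_match("/<td class=\"details\">(.*)<div class=\"div\">(.*)<\/div>/isUS", $w, $matches);
        $post['fullpage'] = $matches[2];

        // The text is still in the source page's charset (iso-8859-1) at this point.
        $fullpage = mysql_real_escape_string($post['fullpage']);
        $query = "INSERT INTO `file` VALUES ('', '".$title."', '".$fullpage."', '".$no."', '')";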
    
    My database table settings are:

    ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

    The output on the site looks like this:

    [IMG]

    I have a problem with some characters. I have tried everything and spent the last 5 hours trying to sort this out. Please help me.

    If I echo $fullpage before inserting it into the database, it looks OK. There is no problem at that point.
     
    Last edited: May 7, 2010
    baris22, May 7, 2010
  2. lorkan

    #2
    Hi!

    It looks like your SQL database is set up as utf-8 while you are inserting data that is encoded as iso-8859-1. You have two options as I see it:
    Option 1: Change the charset and collation of your SQL table to latin1 (the MySQL charset normally used for iso-8859-1 data), and it should look better.
    OR
    Option 2: Convert the text to utf-8 first, then escape it for the query:
    $fullpage = mysql_real_escape_string(utf8_encode($post['fullpage']));
    You might need to call utf8_decode($output_from_db) when you output from your db, if it looks strange.
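
    A minimal sketch of the conversion step in option 2, assuming the fetched page really is iso-8859-1. The helper name latin1ToUtf8 is made up for illustration. It uses iconv() rather than utf8_encode(), since utf8_encode() is deprecated in recent PHP versions:

    ```php
    <?php
    // Hypothetical helper (not from the original post): converts text that is
    // assumed to be ISO-8859-1 into UTF-8 before it goes anywhere near the query.
    function latin1ToUtf8(string $text): string
    {
        $converted = iconv('ISO-8859-1', 'UTF-8', $text);
        if ($converted === false) {
            throw new RuntimeException('Could not convert text to UTF-8');
        }
        return $converted;
    }

    // "\xE9" is the single byte for "é" in ISO-8859-1.
    $fetched = "caf\xE9";
    $utf8 = latin1ToUtf8($fetched);

    // Show the bytes so the result does not depend on the terminal's charset.
    echo bin2hex($utf8), "\n"; // prints 636166c3a9
    ```

    The converted string is what you would then pass to the escaping function. Escaping after converting keeps the escaped value in the same charset the database expects.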
     
    lorkan, May 7, 2010
  3. baris22

    #3
    I have already tried changing the charset on the site and on the database. I tried every single collation, but the result is still the same.
     
    baris22, May 7, 2010
  4. baris22

    #4
    I found the fault. It is the doctype declaration at the top of the HTML. Can somebody tell me the difference between these two? I never paid attention to this before.

    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    

    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
     
    baris22, May 8, 2010
  5. lorkan

    #5
    The second one declares the page as XHTML 1.0 Transitional and points to the DTD that defines that markup, the xhtml1-transitional.dtd file. Try opening that URL in your browser and you will see what it looks like!
     
    lorkan, May 8, 2010
  6. gapz101

    #6
    SET NAMES 'utf8'
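
    That statement tells MySQL which charset the client sends and expects on the current connection, so it has to run right after connecting and before the INSERT. With the old mysql_* functions used in the first post, that would be mysql_query("SET NAMES 'utf8'"). A rough sketch with mysqli instead, assuming that extension is available; the function name and the connection details are placeholders, not something from this thread:

    ```php
    <?php
    // Hypothetical helper: opens a connection and declares it as UTF-8.
    // Host, user, password and database name are supplied by the caller.
    function connectWithUtf8(string $host, string $user, string $pass, string $db): mysqli
    {
        $link = new mysqli($host, $user, $pass, $db);
        if ($link->connect_error) {
            throw new RuntimeException('Connect failed: ' . $link->connect_error);
        }
        // Same effect on the connection as SET NAMES 'utf8'; it also lets
        // real_escape_string() know which charset is in use.
        if (!$link->set_charset('utf8')) {
            throw new RuntimeException('Could not set charset: ' . $link->error);
        }
        return $link;
    }
    ```

    Once the connection is declared as utf8, the strings you send must really be UTF-8, otherwise the text can still end up mangled or cut off.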
     
    gapz101, May 11, 2010