cURL character problems. Please help me

Discussion in 'PHP' started by baris22, May 7, 2010.

  1. #1
    Hello all,

    I am trying to grab some content from a website. The website's charset is ISO-8859-1 (charset=iso-8859-1). This is a sample of a page from that website:

    [Screenshot: a sample page from the website]

    This is my code:

    
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    
    
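    // Assumes $url, $title and $no are defined earlier and a MySQL connection is open.
    $c = curl_init();   // create the cURL handle (missing from the original snippet)

    $header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Connection: Keep-Alive";

    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($c, CURLOPT_HEADER, false);         // keep response headers out of the body
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_TIMEOUT, 30);
    curl_setopt($c, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($c, CURLOPT_COOKIEFILE, 'cookie.txt');
    curl_setopt($c, CURLOPT_HTTPHEADER, $header);
    curl_setopt($c, CURLOPT_ENCODING, '');          // replaces "Accept-Encoding: *"; cURL then decodes gzip/deflate itself
    curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.1)");
    curl_setopt($c, CURLOPT_HTTPGET, true);         // CURL_GET is not a cURL constant; CURLOPT_HTTPGET is
    $w = curl_exec($c);
    curl_close($c);

    // Grab the HTML inside <div class="div"> that follows <td class="details">
    preg_match("/<td class=\"details\">(.*)<div class=\"div\">(.*)<\/div>/isUS", $w, $matches);
    $post['fullpage'] = isset($matches[2]) ? $matches[2] : '';

    // mysql_* was removed in PHP 7; mysqli_real_escape_string() is the replacement
    $fullpage = mysql_real_escape_string($post['fullpage']);
    $query = "INSERT INTO `file` VALUES ('', '" . $title . "', '" . $fullpage . "', '" . $no . "', '')";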
    
    
    
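The post says the site's charset is iso-8859-1. Instead of hard-coding that, the charset can be read from the response's Content-Type header, which cURL exposes through curl_getinfo($c, CURLINFO_CONTENT_TYPE). A minimal sketch, assuming the server actually sends a charset parameter in that header; charset_from_content_type() is a hypothetical helper, not a PHP built-in, and the 'iso-8859-1' fallback is an assumption:

```php
<?php
// Hypothetical helper (not a PHP built-in): pull the charset parameter out of a
// Content-Type value such as "text/html; charset=iso-8859-1".
// Returns the charset in lowercase, or null when none is declared.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null || $contentType === '') {
        return null;
    }
    // charset=value, with optional quotes around the value, case-insensitive
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Call after curl_exec($c) but before curl_close($c); the fallback value is an assumption:
// $charset = charset_from_content_type(curl_getinfo($c, CURLINFO_CONTENT_TYPE)) ?? 'iso-8859-1';

echo charset_from_content_type('text/html; charset=iso-8859-1'), "\n"; // iso-8859-1
```

If the header carries no charset, the page's own meta tag is the next place to look.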
    My database settings are:

    ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

    The output on the site looks like this:

    [Screenshot: the output on my site]

    I have problems with some characters. I have tried everything and spent the last 5 hours trying to sort this out. Please, please help me.

    If I echo $fullpage before inserting it into the database, it looks OK. There is no problem.
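A likely reason the echo looks fine: the page in this post declares charset=iso-8859-1 in its meta tag, so the browser decodes the scraped ISO-8859-1 bytes correctly, while the utf8 table expects different bytes for the same character. The sketch below only makes that byte difference visible; describe_bytes() is a hypothetical helper, the sample word "café" is chosen for illustration, and mb_check_encoding() needs the mbstring extension:

```php
<?php
// The same character is stored as different bytes in the two encodings:
// "é" is 0xE9 in ISO-8859-1 but 0xC3 0xA9 in UTF-8.
$latin1 = "caf\xE9";       // "café" as sent by the ISO-8859-1 site
$utf8   = "caf\xC3\xA9";   // "café" as a utf8 column expects it

// Hypothetical helper: show a string's bytes in hex and whether they form valid UTF-8.
function describe_bytes(string $s): string
{
    $status = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return bin2hex($s) . ' (' . $status . ')';
}

echo describe_bytes($latin1), "\n"; // 636166e9 (NOT valid UTF-8)
echo describe_bytes($utf8), "\n";   // 636166c3a9 (valid UTF-8)
```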
     
    Last edited: May 7, 2010
    baris22, May 7, 2010
  2. lorkan

    #2
    Hi!

    It seems that your SQL database is set up as UTF-8 and you are trying to insert ISO-8859-1 data. As I see it, you have two options:
    Option 1: Change the charset and collation of your SQL database to ISO-8859-1, and it should be better.
    OR
    Option 2:
    Do this:
    $fullpage = utf8_encode(mysql_real_escape_string($post['fullpage']));
    You might need to call utf8_decode($output_from_db) when you output data from your database, if it looks strange.
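Since PHP 8.2, utf8_encode() and utf8_decode() are deprecated; mb_convert_encoding() does the same ISO-8859-1 to UTF-8 conversion. A minimal sketch, assuming the mbstring extension is available; to_utf8() is a hypothetical helper name. Converting before escaping means the escaping step sees the final UTF-8 bytes:

```php
<?php
// Hypothetical helper: convert scraped text to UTF-8 before escaping and inserting it.
// $from defaults to ISO-8859-1, the charset the scraped site declares.
function to_utf8(string $s, string $from = 'ISO-8859-1'): string
{
    return mb_convert_encoding($s, 'UTF-8', $from);
}

$scraped = "caf\xE9";            // "café" as bytes from an ISO-8859-1 page
$clean   = to_utf8($scraped);
echo bin2hex($clean), "\n";      // 636166c3a9

// Then escape the converted text, e.g. with mysqli ($link is an open connection):
// $fullpage = mysqli_real_escape_string($link, $clean);
```

Browsers treat pages labelled iso-8859-1 as Windows-1252, so if curly quotes or dashes still come out wrong, passing 'Windows-1252' as $from may be the better guess.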
     
    lorkan, May 7, 2010
  3. baris22

    #3
    I have already tried changing the charsets on the site and in the database. I tried every single collation, but it is still the same.
     
    baris22, May 7, 2010
  4. baris22

    #4
    I found the fault. It is the HTML header (the DOCTYPE). Can somebody tell me the difference between these two? I never paid attention to this before.

    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    

    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
     
    baris22, May 8, 2010
  5. lorkan

    #5
    In the second one, your XHTML is validated against the document type definition (DTD), the xhtml1-transitional.dtd file. Try opening it in your browser and you will see what it looks like!
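Whichever DOCTYPE is used, the browser decodes the page's bytes according to the charset declared in the HTTP Content-Type header or the meta http-equiv="Content-Type" tag, so it helps to keep that declaration in step with how the text is stored. A minimal sketch, assuming the output should be UTF-8 to match the utf8 table; page_head() is a hypothetical helper that reuses the XHTML DOCTYPE quoted in the thread:

```php
<?php
// Hypothetical helper: emit the top of an XHTML page whose declared charset
// matches the encoding of the text that will be printed.
function page_head(string $title, string $charset = 'utf-8'): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, $charset);
    return '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . "\n"
        . '<html xmlns="http://www.w3.org/1999/xhtml">' . "\n<head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=' . $charset . '" />' . "\n"
        . '<title>' . $safeTitle . "</title>\n</head>\n";
}

// In a web request, send the same charset in the HTTP header before any output:
// header('Content-Type: text/html; charset=utf-8');
echo page_head('Test');
```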
     
    lorkan, May 8, 2010
  6. gapz101

    #6
    Run SET NAMES 'utf8' as a query right after connecting to the database.
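SET NAMES 'utf8' tells MySQL which character set the client sends and expects back. A minimal sketch of the same idea when connecting with PDO, whose MySQL DSN accepts a charset parameter (honoured since PHP 5.3.6); with mysqli, $link->set_charset('utf8') is the counterpart. build_mysql_dsn() and connect_utf8() are hypothetical helpers, and the host and database names are placeholders:

```php
<?php
// Hypothetical helper: build a PDO DSN that sets the connection character set.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return "mysql:host={$host};dbname={$dbname};charset={$charset}";
}

// Hypothetical helper: needs the pdo_mysql extension and a reachable server.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    $pdo = new PDO(build_mysql_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    // Explicit alternative, as in the reply above:
    // $pdo->exec("SET NAMES 'utf8'");
    return $pdo;
}

echo build_mysql_dsn('localhost', 'mydb'), "\n"; // mysql:host=localhost;dbname=mydb;charset=utf8
```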
     
    gapz101, May 11, 2010