PHP and non-Latin languages

Discussion in 'PHP' started by Izonedig, Dec 22, 2009.

  1. #1
    I was trying to use the PHP function substr() on Arabic and Chinese strings, but it never returns the right characters (it shows something like "??? ??? ?????").

    The script opens the Google Translate tool, translates a word, and hands the result back to me automatically. But when I use substr() to extract the exact translated word, I get "????????...".

    Can you please help me?
     
    Izonedig, Dec 22, 2009
  2. xenon2010

    xenon2010 Peon

    Messages:
    237
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You need the text to be in UTF-8 before you cut it with substr().
    Try setting your document's charset to UTF-8 by putting this between the head tags:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
     
    xenon2010, Dec 23, 2009
  3. Izonedig

    Izonedig Member

    Messages:
    150
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    28
    #3
    I already did that, with no luck.
    The problem is not in the browser. Even when I view the HTML source of the output, I see ���
    I also tried using mb_substr() instead of substr(), but that doesn't work either...
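For reference, a minimal sketch of why substr() can garble UTF-8 text and how mb_substr() behaves when it is given the encoding explicitly. The sample string is an assumption chosen for illustration (it does not come from the thread), and the mbstring extension is assumed to be loaded. If the input is not really UTF-8 in the first place, neither function will help.

```php
<?php
// Sample UTF-8 text (an assumption for illustration): "marhaban bikum".
$text = "مرحبا بكم";

// substr() and strlen() count bytes. Each Arabic letter takes 2 bytes in UTF-8.
echo strlen($text), "\n";             // 17 bytes
echo mb_strlen($text, 'UTF-8'), "\n"; // 9 characters

// Cutting after 3 bytes splits the second letter in half,
// so the result is no longer valid UTF-8.
$broken = substr($text, 0, 3);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// mb_substr() counts characters when the encoding is passed explicitly,
// so the first word (5 letters) comes back intact.
$first_word = mb_substr($text, 0, 5, 'UTF-8');
echo $first_word, "\n";
```

Passing 'UTF-8' as the fourth argument avoids depending on whatever internal encoding mbstring happens to be configured with.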
     
    Izonedig, Dec 23, 2009
  4. xenon2010

    #4
    This happens because the text you are trying to cut is not in UTF-8, so you need to convert the string to UTF-8 first.
    To do that, use iconv().
    For example, many Arabic sites use the windows-1256 charset, so you would use iconv() to convert their Arabic characters to UTF-8.
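A minimal sketch of that conversion, assuming the iconv extension is available and that the system's iconv library knows the name WINDOWS-1256 (glibc does). The sample bytes are hand-built for illustration and are not taken from any real page.

```php
<?php
// Five bytes that spell an Arabic word in Windows-1256 (one byte per letter).
// Sample data for illustration only.
$cp1256 = "\xE3\xD1\xCD\xC8\xC7";

// iconv(from, to, string) returns the converted string, or false on failure.
$utf8 = iconv('WINDOWS-1256', 'UTF-8', $cp1256);

if ($utf8 === false) {
    echo "conversion failed\n";
} else {
    // Each letter now takes 2 bytes, so 5 bytes became 10.
    echo strlen($cp1256), " -> ", strlen($utf8), " bytes\n";
    // With the text in UTF-8, mb_substr() can cut it by characters.
    echo mb_substr($utf8, 0, 3, 'UTF-8'), "\n";
}
```

The conversion only gives the right result when the source charset is named correctly, so it is worth confirming what the page actually declares before hard-coding windows-1256.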
     
    xenon2010, Dec 23, 2009
  5. Izonedig

    #5
    Thank you xenon2010. The page that contains the Arabic characters is this:
    http://translate.google.com/translate_t?hl=&ie=utf-8&text=welcome&sl=en&tl=ar#
    (fetched with file_get_contents())
    But I still haven't found a solution :(

    I also thought about using:
    mb_convert_encoding($html_page, 'HTML-ENTITIES', 'utf-8');
    It works with any other string and converts Arabic chars to numeric entities like &#1256; and so on, BUT it does NOT work with that Google page!
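One possible explanation (an assumption, not something confirmed in this thread) is that the fetched bytes were not actually UTF-8, in which case telling mb_convert_encoding() that the source is 'utf-8' produces garbage. The sketch below checks the bytes first and only converts when needed. The helper name to_utf8() and the windows-1256 fallback are hypothetical, and the charset sniffing is deliberately naive.

```php
<?php
// Hypothetical helper: return $html as UTF-8.
// If the bytes are already valid UTF-8 they are returned unchanged; otherwise
// the charset is guessed from a "charset=..." declaration in the markup,
// falling back to $fallback (an assumed default, adjust as needed).
function to_utf8(string $html, string $fallback = 'WINDOWS-1256'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }

    $charset = $fallback;
    if (preg_match('/charset=["\']?([\w-]+)/i', $html, $m)) {
        $charset = $m[1];
    }

    $converted = iconv($charset, 'UTF-8', $html);

    // iconv() returns false on failure; hand back the input untouched then.
    return $converted === false ? $html : $converted;
}

// Usage with a hand-built sample page (not real Google output):
$page = "<meta charset=windows-1256>\xE3";
echo bin2hex(substr(to_utf8($page), -2)), "\n"; // d985
```

A real page may also announce its charset in the HTTP Content-Type header rather than in the markup, which this sketch does not look at.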
     
    Last edited: Dec 23, 2009
    Izonedig, Dec 23, 2009
  6. xenon2010

    #6
    OK, you need to use cURL instead. Here is your solution.
    I just wrote this function for you, and it's easy to use.
    function get_content($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        // 1 = include the response headers in the output along with the body
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // Send a browser-like User-Agent with the request
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');
        // By default curl_exec() prints the response, so capture it with output buffering
        ob_start();
        curl_exec($ch);
        curl_close($ch);
        $string = ob_get_contents();
        ob_end_clean();
        return $string;
    }

    echo get_content('http://translate.google.com/translate_t?hl=&ie=utf-8&text=welcome&sl=en&tl=ar#');
    PHP:
    Now it should work fine :D
    Rep me up :D
     
    xenon2010, Dec 23, 2009
  7. Izonedig

    #7
    Working!!! :)
    Thanks a lot, friend. Rep up :)
     
    Izonedig, Dec 23, 2009
  8. xenon2010

    #8
    no problemo :D
     
    xenon2010, Dec 23, 2009
  9. szalinski

    szalinski Peon

    Messages:
    341
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Any special reason you decided to use output buffering here? Or is there something else I'm missing?
     
    szalinski, Jan 2, 2010
  10. xenon2010

    #10
    Nah, no special reason. It does the same job as CURLOPT_RETURNTRANSFER.
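For comparison, a sketch of the same fetch using CURLOPT_RETURNTRANSFER instead of output buffering. The function name get_content_rt() is hypothetical, and the cURL extension is assumed to be loaded. Unlike the function in post #6, this version leaves CURLOPT_HEADER at its default, so only the body is returned.

```php
<?php
// Hypothetical variant of get_content() from post #6.
// Returns the response body as a string, or false if the transfer fails.
function get_content_rt(string $url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Make curl_exec() return the response instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Same browser-like User-Agent as in the original function.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)');

    // The handle is released when $ch goes out of scope.
    return curl_exec($ch);
}

// Usage: check for failure before working with the result.
// $html = get_content_rt('http://translate.google.com/translate_t?hl=&ie=utf-8&text=welcome&sl=en&tl=ar#');
// if ($html === false) { /* handle the error */ }
```

Because the failure case comes back as false rather than an empty buffer, it is easier to tell a failed request apart from an empty page.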
     
    xenon2010, Jan 3, 2010