Wrong encoding when using XPath

Discussion in 'PHP' started by QueenZ, Feb 23, 2012.

  1. #1
    Hello, I am trying to parse the title from a Chinese website, but the result comes out garbled. It looks like an encoding problem. What can I do about it?

    I need to get the title, the text on the gray background: 我和哥哥的秘密花园

    But instead it's outputting this: 脦脪潞脥赂莽赂莽碌脛脙脴脙脺禄篓脭掳


    What's wrong?

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html>
    	<head>
    		<title>TEST</title>
    		<meta charset="gbk" />
    	</head>
    	
    	<body>
    		<?php
    			$dom = new DomDocument;
    			libxml_use_internal_errors(true);
    			$am_link = "http://tieba.baidu.com/p/21993922";
    			$dom->loadHTMLFile($am_link); 
    			libxml_clear_errors();
    
    
    			$xpath = new DomXpath($dom);
    			$nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');
    			foreach ($nodes as $node)
    			{
    			  echo $node->nodeValue, "\n";
    			  echo "<br />";
    			}
    		?>
    	</body>
    </html>
    
    
    
     
    QueenZ, Feb 23, 2012
  2. #2
    Any ideas how to get this working? It handles other languages correctly, but not Chinese.
     
    QueenZ, Mar 3, 2012
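
One likely cause, offered as a sketch rather than a confirmed fix: DOMDocument always hands back node values as UTF-8, while the test page above declares `charset="gbk"`, so the browser decodes UTF-8 bytes as GBK. On top of that, libxml may not detect the source page's encoding and can fall back to a single-byte guess. The hypothetical helper below (`extractTitle` is not from the thread, and `'GBK'` as the source encoding is an assumption about the target page) converts the raw bytes to UTF-8 first, then turns every non-ASCII character into a numeric entity so the parser no longer has to guess.

```php
<?php
// Sketch only: extractTitle() is a hypothetical helper, and 'GBK' is an
// assumed source encoding; adjust it to whatever the target page really uses.
function extractTitle(string $rawHtml, string $sourceEncoding = 'GBK'): ?string
{
    // 1. Convert the raw bytes to UTF-8 ourselves instead of relying on libxml.
    $utf8 = mb_convert_encoding($rawHtml, 'UTF-8', $sourceEncoding);

    // 2. Replace every non-ASCII character with a numeric entity (&#NNNN;).
    //    The markup is then pure ASCII, so any charset declaration left in
    //    the page cannot cause it to be decoded incorrectly.
    $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // silence warnings from sloppy markup
    $dom->loadHTML($ascii);
    libxml_clear_errors();

    // Same XPath expression as in the original post.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');

    // nodeValue is returned as UTF-8.
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

// Possible usage with the URL from the original post (needs network access):
// $raw = file_get_contents("http://tieba.baidu.com/p/21993922");
// echo extractTitle($raw);
```

Because the returned string is UTF-8, the page that prints it would also need to declare UTF-8 (for example `<meta charset="utf-8" />` instead of `gbk`); otherwise the browser will still show garbled text even when the parsing step is correct.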