Wrong encoding when using XPath

Discussion in 'PHP' started by QueenZ, Feb 23, 2012.

QueenZ Peon

Hello, I am trying to parse the title of a thread on a Chinese website, but I'm getting the wrong result. It looks like an encoding problem. What can I do about it?

I need to get the title, the text on the gray background: 我和哥哥的秘密花园 (roughly "The Secret Garden of My Brother and Me")

But instead it outputs this: 脦脪潞脥赂莽赂莽碌脛脙脴脙脺禄篓脭掳

What's wrong?
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html>
	<head>
		<title>TEST</title>
		<meta charset="gbk" />
	</head>
	
	<body>
		<?php
			// Fetch and parse the remote thread page, hiding warnings about malformed HTML
			$dom = new DOMDocument;
			libxml_use_internal_errors(true);
			$am_link = "http://tieba.baidu.com/p/21993922";
			$dom->loadHTMLFile($am_link);
			libxml_clear_errors();

			// Select the first <h1> inside the thread-title <div> and print its text
			$xpath = new DOMXPath($dom);
			$nodes = $xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]');
			foreach ($nodes as $node)
			{
			  echo $node->nodeValue, "\n";
			  echo "<br />";
			}
		?>
	</body>
</html>
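
The garbled output appears to be the correct title misread twice: libxml apparently did not pick up the page's charset and parsed the GBK bytes as ISO-8859-1, and the UTF-8 text that PHP then echoed was displayed by a browser that had been told (by the meta charset="gbk" tag) that the page is GBK. The sketch below reproduces that chain with iconv, assuming an iconv build that supports GBK (glibc's does); simulateMojibake() is a hypothetical helper, not a library function.

```php
<?php
// Sketch: reproduce the garbled title from the correct one.
// Assumes PHP with the iconv extension and an iconv that knows GBK (glibc does).

/**
 * Hypothetical helper: push a UTF-8 title through the suspected chain of
 * misreadings and return what the browser would end up showing (as UTF-8).
 */
function simulateMojibake(string $title): string
{
    // 1. The site serves the title as GBK bytes.
    $gbkBytes = iconv('UTF-8', 'GBK', $title);

    // 2. The parser, finding no charset it recognises, treats each byte as
    //    ISO-8859-1 and stores the result internally as UTF-8 (nodeValue).
    $nodeValue = iconv('ISO-8859-1', 'UTF-8', $gbkBytes);

    // 3. The output page says charset=gbk, so the browser decodes those
    //    UTF-8 bytes as GBK. Converting GBK -> UTF-8 shows what it displays.
    return iconv('GBK', 'UTF-8', $nodeValue);
}

echo simulateMojibake('我和哥哥的秘密花园'), "\n";
// prints: 脦脪潞脥赂莽赂莽碌脛脙脴脙脺禄篓脭掳
```

Because the result matches the output in the first post exactly, both ends need fixing: the input must be decoded as GBK, and the output page must match the encoding of the strings it prints.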

QueenZ, Feb 23, 2012

QueenZ Peon

#2

Any ideas how to make this work? It gets the title right for pages in other languages, but not for Chinese ones.
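
One possible fix, sketched under the assumption that the thread page really is GBK-encoded: decode the bytes yourself, hand libxml ASCII-only markup so its charset guess cannot go wrong, and print the UTF-8 result into a UTF-8 page. extractThreadTitles() and the sample markup are hypothetical; the page is built locally so the example runs without network access, and the XPath query is the one from the first post.

```php
<?php
// A minimal sketch, assuming the page bytes are GBK and that the iconv and
// DOM extensions are available. extractThreadTitles() is a hypothetical helper.

/**
 * Parse GBK-encoded HTML and return the text of the first <h1> under each
 * div.l_thread_title, as UTF-8 strings.
 */
function extractThreadTitles(string $gbkHtml): array
{
    // 1. Decode the page from GBK into UTF-8 ourselves.
    $utf8Html = iconv('GBK', 'UTF-8', $gbkHtml);
    if ($utf8Html === false) {
        throw new RuntimeException('Input is not valid GBK');
    }

    // 2. Replace every non-ASCII character with a numeric character reference
    //    (&#NNNN;), so neither libxml's charset guess nor a <meta> charset in
    //    the page can change how the bytes are read.
    $asciiHtml = preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $codePoint = unpack('N', iconv('UTF-8', 'UTF-32BE', $m[0]))[1];
        return '&#' . $codePoint . ';';
    }, $utf8Html);

    // 3. Parse and query as before; DOM returns text as UTF-8.
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($asciiHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $titles = [];
    foreach ($xpath->query('//div[@class="l_thread_title"]/descendant::h1[1]') as $node) {
        $titles[] = trim($node->nodeValue);
    }
    return $titles;
}

// Hypothetical GBK sample page, built here so the example runs offline.
// For the real thread you would read the bytes with
// file_get_contents('http://tieba.baidu.com/p/21993922') instead.
$sample = iconv('UTF-8', 'GBK',
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=gbk"></head>'
    . '<body><div class="l_thread_title"><h1>我和哥哥的秘密花园</h1></div></body></html>');

foreach (extractThreadTitles($sample) as $title) {
    echo htmlspecialchars($title), "<br />\n";
}
```

Since the titles come back as UTF-8, the page that prints them should declare UTF-8 as well (for example meta charset="utf-8" instead of meta charset="gbk"). Numeric references are not decoded inside script or style elements, so non-ASCII text there would stay encoded; that does not affect the title. If mbstring is installed, mb_convert_encoding() and mb_encode_numericentity() could replace the iconv calls.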

QueenZ, Mar 3, 2012
