Hi everyone,

I would appreciate it if someone could help me with the following problem. I am trying to use the following code to extract links from a page:

    $input = @file_get_contents($input_file) or die("Could not access file: $input_file");
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    if (preg_match_all("/$regexp/siU", $input, $matches)) {
        # $matches[2] = array of link addresses
        # $matches[3] = array of link text - including HTML code
    }

Everything works fine except that the Unicode character is not copied properly. For example, the following code

    <a href="get.php?d=07/10/24/w/p_tkytx">cÖ_g cvZv</a>

is identified as

    <a href="get.php?d=07/10/24/w/p_tkytx">c�_g cvZv</a>

although I see a ? instead of � here. I am sure this is a Unicode character encoding problem. This is a foreign-language page I am working on. Can someone help me copy the exact code as shown in the second code box?
I figured out the problem: the charset was the cause. I had to use

    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

instead of

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Thank you for your help.
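
For anyone who would rather keep the output page declared as UTF-8, a possible alternative (not the fix used above) is to convert the fetched page to UTF-8 before running the regex. The sketch below is only an illustration under a few assumptions: the source page really is Windows-1252, the mbstring extension is available, and `extract_links_utf8` is a made-up helper name, not something from the original code.

```php
<?php
// Hypothetical helper: convert the page to UTF-8 first, then extract links
// with the same pattern as in the question.
// Assumption: the input bytes are Windows-1252 and mbstring is installed.
function extract_links_utf8(string $html, string $sourceCharset = 'Windows-1252'): array
{
    // Re-encode the raw bytes so that non-ASCII characters survive on a UTF-8 page.
    $utf8 = mb_convert_encoding($html, 'UTF-8', $sourceCharset);

    // Same regex as in the question, written in single quotes to avoid double escaping.
    $regexp = '<a\s[^>]*href=("??)([^" >]*?)\1[^>]*>(.*)<\/a>';

    $links = [];
    if (preg_match_all("/$regexp/siU", $utf8, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[2] = link address, $m[3] = link text (including any HTML inside it)
            $links[] = ['href' => $m[2], 'text' => $m[3]];
        }
    }
    return $links;
}
```

Calling `extract_links_utf8(file_get_contents($input_file))` should then return the link text as UTF-8, which displays correctly when the page keeps `charset=utf-8` in its meta tag. If the source page uses a different encoding, the second argument would need to change accordingly.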