Parsing XML: Errors with Special Characters

Discussion in 'PHP' started by tin2mon, Sep 26, 2010.

  1. #1
    Whenever I extract the title of an article that contains special characters, such as & or ', only the text after the last special character is returned.

    single quotes are displaying as &apos; (in XML)
    double quotes are displaying as &quot; (in XML)
    ampersands are displaying as &amp; (in XML)
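
    For context, these are XML's own escape sequences, and an XML parser normally decodes them back into plain characters. A minimal sketch using SimpleXML, assuming a made-up feed layout (the element names documents and document are hypothetical; only title comes from this post):

```php
<?php
// Hypothetical feed: the real XML is not shown in this post.
$xml = <<<XML
<documents>
  <document>
    <title>This &amp; is &quot;the title&quot; now</title>
  </document>
</documents>
XML;

$feed = simplexml_load_string($xml);

$titles = [];
foreach ($feed->document as $doc) {
    // Casting to string yields the element text with entities already decoded.
    $titles[] = (string) $doc->title;
}

// Escape only when writing the value into HTML.
echo htmlspecialchars($titles[0], ENT_QUOTES), "\n<br/>";
```

    With this approach the decoded title is held as one string, so no decoding step should be needed afterwards.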

    These three special characters cause the title to be output incorrectly: the output starts after the last instance of any of them in the title line.
    So if I have: This & is "the title" now, only "now" is output as the title.
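
    One possible cause of this exact symptom, assuming the titles are collected with PHP's event-based xml_parser functions (the parsing code is not shown in this post, so this is a guess): the character data handler can be called several times for one element, often splitting at entities. Assigning each chunk with = keeps only the last piece, while appending with .= keeps them all. A sketch with hypothetical variable names:

```php
<?php
// Hypothetical one-element document reusing the example title above.
$xml = '<document><title>This &amp; is &quot;the title&quot; now</title></document>';

$current   = ''; // name of the element currently being read
$title     = ''; // accumulated text of <title>
$lastChunk = ''; // what an overwriting handler would be left with

$parser = xml_parser_create();
// Keep element names in their original case instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$current) { $current = $name; },
    function ($p, $name) use (&$current) { $current = ''; }
);

xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$current, &$title, &$lastChunk) {
        if ($current === 'title') {
            $title    .= $data; // append: every chunk is kept
            $lastChunk = $data; // overwrite: only the final chunk survives
        }
    }
);

xml_parse($parser, $xml, true);

echo $title, "\n";
```

    How the text is chunked is up to the parser, so $lastChunk is only there to illustrate the overwrite case; the appended $title should hold the whole string either way.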

    I had a suggestion to use htmlspecialchars, but maybe I'm using it incorrectly.

    Code:

    <?php
    for ($x = 0; $x < count($document_array); $x++) {
        $newtitle = htmlspecialchars_decode($document_array[$x]->title, ENT_QUOTES);
        echo "\t" . $newtitle . "\n<br/>";
        echo "<b>\t" . $document_array[$x]->title . "</b>\n<br/>";
        $newdate = date('m/j/Y', strtotime($document_array[$x]->date));
        echo "\t" . $newdate . " | " . $document_array[$x]->source . "\n<br/>";
        echo "\t" . $document_array[$x]->ingress . "\n<br/><br/>";
    }
    ?>



    The output of lines 4 and 5 is identical. I've tried htmlspecialchars and htmlspecialchars_decode. Both produce the same result, which is exactly what the title shows in the XML element:
    amp;T June 7th (after htmlspecialchars)
    amp;T June 7th (before htmlspecialchars)
    <title>Metered Data Plans, Tethering Coming To AT&amp;T June 7th</title> (the XML element)
    Metered Data Plans, Tethering Coming To AT&T June 7th (what I'm seeking to output)
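
    For reference, a small sketch of what the two functions actually do. It suggests why neither changes the result here: they only convert between characters and entities, so they cannot bring back text that was already lost before this loop runs. The sample strings are illustrative.

```php
<?php
// Illustrative strings: a title with its entity intact, and the truncated value reported above.
$encoded   = 'Coming To AT&amp;T June 7th';
$truncated = 'amp;T June 7th';

// htmlspecialchars_decode turns entities back into plain characters.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

// htmlspecialchars goes the other way, escaping characters for HTML output.
$reencoded = htmlspecialchars($decoded, ENT_QUOTES);

// The truncated value contains no complete entity, so decoding leaves it unchanged.
$stillTruncated = htmlspecialchars_decode($truncated, ENT_QUOTES);

echo $decoded, "\n", $reencoded, "\n", $stillTruncated, "\n";
```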
     
    tin2mon, Sep 26, 2010