Non Standard Characters Breaking a Script

Discussion in 'PHP' started by SEO-Expert, Oct 27, 2010.

  1. #1
    Editing a script that imports CSV files (affiliate datafeeds) into a WordPress database.

    It's running into problems when the CSV files contain characters like

    Â and É

    Basically the script truncates the database entry after these characters, so if I have an entry

    The script cuts Ât the  and nothing else is added

    All I get is

    The script cuts

    Added to the database.

    I don't understand the script enough to change it to accept these characters, so looking to remove/replace them.

    I can add individual str_replace code to replace these along the lines of:

    $content = str_replace('Â', 'A', $content);
    $content = str_replace('É', 'E', $content);
    And it works, but I don't have a list of all of these characters. Some are weird, like º (which is meant to have a line under it) and Æ.

    So I'm hoping there's some nifty bit of PHP to deal with this sort of stuff :)

    David
     
    SEO-Expert, Oct 27, 2010
  2. S1M

    S1M Peon

    #2
    It's probably not your code; it's the character encoding in the database. Check that first. You can set it in your my.cnf file (assuming MySQL). I don't remember exactly how, but it's easy enough to google.
     
    S1M, Oct 27, 2010
  3. SEO-Expert

    SEO-Expert Well-Known Member

    #3
    I've kind of solved it using:

    $content = preg_replace("/[^\x9\xA\xD\x20-\x7F]/", "", $content);

    This removes anything that isn't a tab, line feed, carriage return, or standard ASCII character (I think).

    It's not ideal, as it deletes the characters that are causing problems rather than replacing them.

    Almost found an ideal solution:

    $transwpimc = get_html_translation_table(HTML_ENTITIES);
    $encodedwpimc = strtr($content, $transwpimc);
    $content = $encodedwpimc;

    This converts the characters to their equivalent HTML entities, but it also converts the HTML tags, so it works a bit too well :).
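
One possible way to keep the HTML tags intact is to drop the markup characters (&, <, >, and the quotes) from the translation table before calling strtr(). This is only a sketch: the function name encode_non_ascii_entities is made up for illustration, and it assumes $content is UTF-8 (pass 'ISO-8859-1' as the second argument if the feed is Latin-1).

```php
<?php
// Hypothetical helper, not part of the original script.
// Assumption: $content is encoded as $encoding (UTF-8 by default).
function encode_non_ascii_entities(string $content, string $encoding = 'UTF-8'): string
{
    // Every character that has a named HTML entity.
    $all = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, $encoding);
    // Only the characters that make up markup: & < > " '
    $markup = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES, $encoding);
    // Keep the accented characters and symbols, leave the markup alone.
    $table = array_diff_key($all, $markup);

    return strtr($content, $table);
}

$content = encode_non_ascii_entities('<b>Café É</b>');
```

Characters that have no named entity are left untouched by this approach, so it may still need to be combined with another cleanup step.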

    Looks like I'll have to create a bunch of replaces, one for each character. I found a list of them all, so it shouldn't be as hard as I thought.
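
The "bunch of replaces" could be collapsed into a single strtr() call with a lookup array, followed by the earlier regex as a fallback for anything not in the list. A minimal sketch: the function name and the four-entry map are illustrative only (the real list would be much longer), and it assumes both the source file and $content are UTF-8.

```php
<?php
// Hypothetical helper; the map below is deliberately incomplete.
// Assumption: this file and $content are both UTF-8.
function replace_known_then_strip(string $content): string
{
    $map = [
        'Â' => 'A',
        'É' => 'E',
        'Æ' => 'AE',
        'º' => 'o',
    ];
    // Replace every listed character in one pass.
    $content = strtr($content, $map);

    // Fallback: delete any byte that is not tab, LF, CR, or printable ASCII.
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $content);
}

$content = replace_known_then_strip('The script cuts Ât the É');
```

Characters missing from the map are still deleted rather than replaced, so the map decides how much text survives.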

    David
     
    SEO-Expert, Oct 28, 2010
  4. MyVodaFone

    MyVodaFone Well-Known Member

    #4
    Have a look at utf8_encode(). Something like

    $content = utf8_encode($content);

    Might work out just fine, or you can run a function on your content, maybe this example from that page, consider $str as your $content variable.
     
    MyVodaFone, Oct 29, 2010
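
Note that utf8_encode() only converts ISO-8859-1 to UTF-8, and it is deprecated as of PHP 8.2. A rough equivalent using the mbstring extension is sketched below. The function name to_utf8 is made up for illustration, and the ISO-8859-1 default is an assumption about the datafeeds; the check at the top is there so text that is already valid UTF-8 is not encoded a second time.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
// Assumption: feeds that are not valid UTF-8 are ISO-8859-1.
function to_utf8(string $content, string $sourceEncoding = 'ISO-8859-1'): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave it alone.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }

    return mb_convert_encoding($content, 'UTF-8', $sourceEncoding);
}

$content = to_utf8("Caf\xE9"); // "Café" stored as ISO-8859-1 bytes
```

If the database connection and tables also use UTF-8, the converted text should no longer be cut off at the first accented character.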