How to avoid all non-English characters?

Discussion in 'PHP' started by baris22, Dec 27, 2008.

  1. #1
    Hello all,

    I am using the cURL functions to fetch some pages from the web.

    I want to ignore any characters that are not English. I want to keep only English characters and all the punctuation marks.

    How can I do this with str_replace?

    Thanks
     
    baris22, Dec 27, 2008
  2. Danltn

    Danltn Well-Known Member

    Messages:
    679
    Likes Received:
    36
    Best Answers:
    0
    Trophy Points:
    120
    #2
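    // Keep only the listed ASCII characters; everything else is deleted.
    function remove_unwanted( $var )
    {
        // Allowed characters: ASCII punctuation, digits and letters. The list is placed inside a regex character class.
        static $allowed_chars = '!"#$%&\'()*+,-.\\/0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~';
        // [^...] matches any character NOT in the list; each match is replaced with an empty string.
        return preg_replace( '/[^' . $allowed_chars . ']/', '', $var );
    }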
    PHP:
     
    Danltn, Dec 27, 2008
  3. baris22

    baris22 Active Member

    Messages:
    543
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    60
    #3
    Hello,

    This kind of worked, but there are no spaces between the words. All the words run together, and there are no line breaks either.

    Thanks
     
    baris22, Dec 27, 2008
  4. Danltn

    #4
    Then add spaces and line breaks to the allowed chars list...
     
    Danltn, Dec 27, 2008
  5. baris22

    #5
    I do not know how to add <br> to this code.
     
    baris22, Dec 27, 2008
  6. Danltn

    #6
    *sighs*
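    <?php
    
    function remove_unwanted( $var )
    {
        // The list now ends with a space and "\n", so spaces and line breaks are kept.
        // Double-quoted string: the six backslashes before / reach the regex as \\\/ (a literal backslash, then an escaped slash).
        static $allowed_chars = "!\"#$%&'()*+,-.\\\\\\/0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\\[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ \n";
        // Delete every character that is not in the allowed list.
        return preg_replace( '/[^' . $allowed_chars . ']/', '', $var );
    }
    
    // Example: the accented characters are removed; the <br> tags and the line break are kept.
    echo remove_unwanted('New<br />[]éépost' . "\n" . '<br>asdaéá');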
    PHP:
    Returns:
    New<br />[]post
    <br>asda
    Code (markup):
     
    Danltn, Dec 27, 2008
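
A possible alternative to spelling out every allowed character, offered only as a sketch and not as something posted in the thread: use a byte range in the character class. The function name keep_printable_ascii is made up for this example, and the type hints assume PHP 7 or later. Keeping tab and carriage return in addition to \n is an extra assumption, made because fetched pages often use \r\n line endings.

```php
<?php

/**
 * Hypothetical variant of remove_unwanted(), for illustration only.
 * Keeps printable ASCII (0x20-0x7E) plus tab, CR and LF; removes every other byte.
 */
function keep_printable_ascii(string $text): string
{
    // \x20-\x7E covers space through tilde; \t, \r and \n keep tabs and line breaks.
    // The pattern is single-quoted, so PCRE (not PHP) interprets the escapes.
    $clean = preg_replace('/[^\x20-\x7E\t\r\n]/', '', $text);

    // preg_replace() returns null on error; fall back to an empty string in that case.
    return $clean ?? '';
}

// Same sample input as post #6.
echo keep_printable_ascii('New<br />[]éépost' . "\n" . '<br>asdaéá');
```

Because the pattern has no /u modifier, it works byte by byte, so every byte of a multi-byte UTF-8 character such as é is removed. For the sample input this should give the same result as the function in post #6.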