Sanitize Wysiwyg Data Input

Discussion in 'PHP' started by adamjblakey, Apr 17, 2009.

  1. #1
    Hi,

    I have built a CMS for my websites, and I have FCKeditor running to take in text from users that they want to display on the site.

    What is the best function to use to sanitize the user's input?

    Also, am I better off running the function before the data goes into the database or when printing it out?

    Cheers,
    Adam
     
    adamjblakey, Apr 17, 2009 IP
  2. Colbyt

    #2
    Always sanitize before DB entry to prevent injection attacks.

    I am far from an expert here, but I think mysql_real_escape_string, strip_tags, and htmlentities will give you clean data. Of course, those will also remove the WYSIWYG formatting.
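
    To make the "before the database or when printing" question concrete, here is a minimal sketch, assuming PDO and a hypothetical `pages` table with a `body` column (neither is from the original post). A prepared statement handles SQL injection at insert time, while HTML escaping is applied on output to any value that should be shown as plain text rather than markup.

```php
<?php
// Hypothetical helper: store the submitted text with a prepared statement.
// The table `pages` and column `body` are assumptions for illustration only.
function save_body(PDO $db, string $body): void
{
    $stmt = $db->prepare('INSERT INTO pages (body) VALUES (:body)');
    $stmt->execute([':body' => $body]);
}

// Escape a value on output when it must be displayed as text, not as markup.
function escape_for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

    Note that escape_for_html() would also neutralize the editor's formatting, so it suits fields such as titles or names rather than the WYSIWYG body itself.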
     
    Colbyt, Apr 17, 2009 IP
  3. jorgy

    #3
    Here's an example of how I do this:

    	
    $content = preg_replace('~<a [^>]+>(.*?)</a>~si', ' ', $content);  
    $content = preg_replace('~<script [^>]+>(.*?)</script>~si', ' ', $content);
    $content = preg_replace('~<script>(.*?)</script>~si', ' ', $content);
    $content = preg_replace('~<style>(.*?)</style>~si', ' ', $content);
    $content = preg_replace('~<style [^>]+>(.*?)</style>~si', ' ', $content);
    
    That way I only remove links, scripts, and style blocks, and can leave the other markup that the WYSIWYG editor puts in there.
     
    jorgy, Apr 18, 2009 IP
  4. PoPSiCLe

    #4
    Looks good. I'd put it into a function, though, to make it easier to reuse, like this:

    
    function sanitize_input($content)
    {
        $content = preg_replace('~<a [^>]+>(.*?)</a>~si', ' ', $content);
        $content = preg_replace('~<script [^>]+>(.*?)</script>~si', ' ', $content);
        $content = preg_replace('~<script>(.*?)</script>~si', ' ', $content);
        $content = preg_replace('~<style>(.*?)</style>~si', ' ', $content);
        $content = preg_replace('~<style [^>]+>(.*?)</style>~si', ' ', $content);
        return $content;
    }
    
    and then, for anything I'd like to check/sanitize, I'd just use
    
    $text = sanitize_input($_POST['text_from_form']);
    
    or something similar
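
    For a feel of what this function does and does not catch, here is a small sketch with two made-up inputs. The function is repeated so the example runs on its own. The second input shows that markup the patterns do not name, such as an inline event handler, passes through unchanged.

```php
<?php
// Same function as above, repeated so this example is self-contained.
function sanitize_input($content)
{
    $content = preg_replace('~<a [^>]+>(.*?)</a>~si', ' ', $content);
    $content = preg_replace('~<script [^>]+>(.*?)</script>~si', ' ', $content);
    $content = preg_replace('~<script>(.*?)</script>~si', ' ', $content);
    $content = preg_replace('~<style>(.*?)</style>~si', ' ', $content);
    $content = preg_replace('~<style [^>]+>(.*?)</style>~si', ' ', $content);
    return $content;
}

// A plain <script> block is replaced by a single space.
$clean = sanitize_input('<p>Hi</p><script>alert(1)</script>');

// None of the patterns match an onclick attribute, so it is left in place.
$risky = sanitize_input('<p onclick="alert(1)">Hi</p>');
```

    So the function is a reasonable first filter for the tags it lists, but it should not be treated as complete protection against script injection.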
     
    PoPSiCLe, Apr 18, 2009 IP
  5. adamjblakey

    #5
    That looks good. Would it be useful to add anything to convert e.g. £ into &pound;, and also anything to remove the annoying \n newlines?
     
    adamjblakey, Apr 18, 2009 IP
  6. jorgy

    #6
    You could do either of those things simply by changing the code around a little... For example:

    $content = str_replace('£', '&pound;', $content);

    That would change every £ into '&pound;'.

    Hope that helps!
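
    The newline half of the question can be handled with plain string replacement too. A minimal sketch, where tidy_text() is a hypothetical helper name and replacing each line break with a space is an assumption about the desired result:

```php
<?php
// Hypothetical helper covering both requests: the pound sign and line breaks.
function tidy_text(string $content): string
{
    // "\u{00A3}" is the pound sign; this escape syntax needs PHP 7 or later.
    $content = str_replace("\u{00A3}", '&pound;', $content);

    // Replace Windows, Unix, and old Mac line endings with a single space.
    // "\r\n" is listed first so a Windows line ending yields one space, not two.
    return str_replace(["\r\n", "\n", "\r"], ' ', $content);
}
```

    Usage would be along the lines of $text = tidy_text($_POST['text_from_form']);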
     
    jorgy, Apr 18, 2009 IP
  7. nitsanbn

    #7
    preg_* is slower than str_replace when you only need a plain string replacement.

    Those 5 replaces can become:
    $content = preg_replace('#<(a|script|style)\b[^>]*>.*?</\1>#si', ' ', $content);


    Have fun with that :)
    Although I don't think this replace will really help you. Look into DOMDocument or something that parses XML.
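
    As a rough idea of what the DOMDocument route could look like, here is a minimal sketch. strip_elements() is a hypothetical helper name, and the tag list simply mirrors the regexes in this thread. It removes whole elements only: attributes such as onclick are not handled, and loadHTML() needs an encoding hint for non-ASCII text, which this sketch leaves out.

```php
<?php
// Hypothetical helper: drop whole elements by tag name using DOMDocument.
function strip_elements(string $html, array $tags = ['a', 'script', 'style']): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it has one known root element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tags as $tag) {
        // Copy the live node list to an array before removing nodes from it.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // Serialize only the children of the wrapper <div>.
    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

    Because the parser normalizes tag names, variants like <SCRIPT type="text/javascript"> are caught without extra patterns.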
     
    nitsanbn, Apr 18, 2009 IP