Hi, I have built a CMS for my websites and I have FCKeditor running to take in the text that users want to display on the site. What is the best function to use to sanitize the user's input? Also, is it best to run the function before the data goes into the database or when printing it out? Cheers, Adam
Always sanitize before DB entry to prevent injection attacks. I am far from the expert here, but I think mysql_real_escape_string, strip_tags, and htmlentities will give you clean data. Of course, strip_tags and htmlentities will also remove the WYSIWYG formatting.
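For what it's worth, here is a minimal sketch of how the formatting could survive, assuming PHP 7 or later. The helper names keep_basic_formatting and escape_for_html and the particular tag allowlist are made up for illustration. strip_tags() accepts a second argument listing tags to keep, and htmlspecialchars() is the usual choice when the text should be shown as plain text rather than as markup.

```php
<?php
// Hypothetical helper: keep a small allowlist of formatting tags.
function keep_basic_formatting(string $html): string
{
    // strip_tags() removes every tag not in the allowlist but keeps the text
    // between the tags. It does NOT remove attributes on the allowed tags,
    // so this alone is not a complete XSS defence.
    return strip_tags($html, '<p><br><strong><em><ul><ol><li>');
}

// Hypothetical helper: escape text that must be displayed literally.
function escape_for_html(string $text): string
{
    // Converts &, <, >, " and ' to entities; assumes the text is UTF-8.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

Note that neither helper has anything to do with SQL injection; escaping for the database and escaping for HTML are separate steps.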
Here's an example of how I do this:
    $content = preg_replace('~<a [^>]+>(.*?)</a>~si', ' ', $content);
    $content = preg_replace('~<script [^>]+>(.*?)</script>~si', ' ', $content);
    $content = preg_replace('~<script>(.*?)</script>~si', ' ', $content);
    $content = preg_replace('~<style>(.*?)</style>~si', ' ', $content);
    $content = preg_replace('~<style [^>]+>(.*?)</style>~si', ' ', $content);
That way I only remove links, scripts, and style adjustments, and can leave the other markup that the WYSIWYG editor puts in there.
Looks good - I'd put it into a function, though, to make it easier to reuse, like this:
    function sanitize_input($content) {
        $content = preg_replace('~<a [^>]+>(.*?)</a>~si', ' ', $content);
        $content = preg_replace('~<script [^>]+>(.*?)</script>~si', ' ', $content);
        $content = preg_replace('~<script>(.*?)</script>~si', ' ', $content);
        $content = preg_replace('~<style>(.*?)</style>~si', ' ', $content);
        $content = preg_replace('~<style [^>]+>(.*?)</style>~si', ' ', $content);
        return $content;
    }
and then, for anything I'd like to check/sanitize, I'd just use
    $text = sanitize_input($_POST['text_from_form']);
or something similar.
That looks good. Would it be useful to add anything to convert e.g. £ into &pound;, and also anything to remove annoying \n line breaks?
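You could do either of those things simply by changing the code around a little. For a fixed string like this you don't need a regex (preg_replace would also need delimiters around the pattern), so for example:
    $content = str_replace('£', '&pound;', $content);
That would change every £ into '&pound;'. Hope that helps!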
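To cover both requests in one place, something along these lines might work. It is only a sketch: tidy_text is a made-up name, and it assumes the content and the source file are both UTF-8.

```php
<?php
// Hypothetical helper; assumes $content is UTF-8.
function tidy_text(string $content): string
{
    // Replace the literal pound sign with its HTML entity.
    $content = str_replace('£', '&pound;', $content);

    // Turn Windows (\r\n), old Mac (\r) and Unix (\n) line breaks into spaces.
    // "\r\n" is listed first so it becomes one space rather than two.
    return str_replace(["\r\n", "\r", "\n"], ' ', $content);
}
```

Usage would be $text = tidy_text($text); before or after the tag stripping. htmlentities() would also turn £ into &pound;, but it escapes < and > as well, which would break the editor's markup.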
preg_* is way slower than string replace, and those five replaces can become one:
    $content = preg_replace('#<(a|script|style)\b[^>]*>.*?</\1\s*>#si', ' ', $content);
Have fun with that, although I don't think this replace will really help you. Look into DOMDocument or something else that actually parses the markup.
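A rough sketch of the DOMDocument route, in case it helps. The function name remove_elements, the wrapper div, and the default tag list are all assumptions for illustration, and it needs PHP's DOM extension. Like the regex versions, it drops the listed elements together with their contents.

```php
<?php
// Hypothetical helper: parse the fragment and delete unwanted elements.
function remove_elements(string $html, array $tags = ['a', 'script', 'style']): string
{
    $doc = new DOMDocument();

    // Editor output is rarely perfect HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common workaround to make libxml read UTF-8;
    // the wrapper div lets us find the fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first div in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return '';
    }

    foreach ($tags as $tag) {
        // Copy the live node list to an array before removing nodes from it.
        foreach (iterator_to_array($wrapper->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialise only the wrapper's children, not the html/body scaffolding.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

This still leaves attributes such as onclick on the remaining tags, so a dedicated HTML filtering library is probably the safer choice for untrusted users.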