Strip HTML, PHP, JavaScript, and anything else not rendered as text in a browser

Discussion in 'PHP' started by batman4444, Nov 13, 2008.

  1. #1
    Does anyone know an easy way to strip text of all HTML, JavaScript, PHP, and anything else that is not rendered as plain text in a browser window?

    I am aware of the strip_tags() function, but it still leaves in all sorts of content that is not rendered as text in a browser (for example, the contents of script and style elements).

    Thanks
     
    batman4444, Nov 13, 2008
  2. logondotinfo

    #2
    
    function strip_html_tags( $text )
    {
        $stripped = preg_replace(
            array(
              // Remove invisible content, including everything between the tags
                '@<head[^>]*?>.*?</head>@siu',
                '@<style[^>]*?>.*?</style>@siu',
                '@<script[^>]*?>.*?</script>@siu',
                '@<object[^>]*?>.*?</object>@siu',
                '@<embed[^>]*?>.*?</embed>@siu',
                '@<applet[^>]*?>.*?</applet>@siu',
                '@<noframes[^>]*?>.*?</noframes>@siu',
                '@<noscript[^>]*?>.*?</noscript>@siu',
                '@<noembed[^>]*?>.*?</noembed>@siu',
              // Put a line break before each block-level opening or closing tag
                '@</?((address)|(blockquote)|(center)|(del))@iu',
                '@</?((div)|(h[1-6])|(ins)|(isindex)|(p)|(pre))@iu',
                '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
                '@</?((table)|(th)|(td)|(caption))@iu',
                '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
                '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
                '@</?((frameset)|(frame)|(iframe))@iu',
            ),
            array(
              // One replacement per pattern above: 9 spaces, then 7 newline prefixes
                ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
                "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
            ),
            $text );
        // preg_replace() returns null on error, e.g. invalid UTF-8 with the /u modifier
        if ( $stripped === null ) {
            $stripped = $text;
        }
        // Remove whatever tags are left
        return strip_tags( $stripped );
    }
    That should take out most HTML tags. You shouldn't need to take out any PHP tags: PHP runs on the server, so the output the browser receives shouldn't have any PHP in it anymore, just static content.
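
To illustrate why the invisible elements are removed before strip_tags() is called, here is a minimal sketch. The sample input is hypothetical, and only two patterns in the style of the function above are used, so treat it as a demonstration rather than a complete solution.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<p>Hello</p><script>var x = 1;</script>'
      . '<style>p { color: red; }</style><p>World</p>';

// strip_tags() alone removes the tags but keeps the text between them,
// so the script and style bodies leak into the result.
$plain = strip_tags($html);

// Removing each whole element first (tags plus contents), then stripping
// the remaining tags, leaves only the text a browser would display.
$cleaned = preg_replace(
    array(
        '@<script[^>]*?>.*?</script>@siu',
        '@<style[^>]*?>.*?</style>@siu',
    ),
    ' ',
    $html
);
$cleaned = strip_tags($cleaned);

echo $plain, PHP_EOL;   // Hellovar x = 1;p { color: red; }World
echo $cleaned, PHP_EOL; // Hello  World
```

Each removed element is replaced by a single space, which is why two spaces appear between the words; collapsing whitespace afterwards is left out of this sketch.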
     
    logondotinfo, Nov 13, 2008
  3. firemarsh

    #3
    Why not just use this regex? "<[^>]*>"
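
One possible answer, sketched with a hypothetical input string: this pattern removes only the tags themselves, so script bodies remain, and it stops at the first ">" even when that character sits inside a quoted attribute value.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<p title="a > b">Hi</p><script>alert("x");</script>';

// The regex suggested above, wrapped in PCRE delimiters.
$result = preg_replace('/<[^>]*>/', '', $html);

// The first match ends at the ">" inside the title attribute, so the rest
// of the attribute is left behind, and the script body is kept as text.
echo $result, PHP_EOL; //  b">Hialert("x");
```

So the single regex behaves much like strip_tags() on its own, with the added weakness around ">" characters inside attributes.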
     
    firemarsh, Nov 13, 2008
  4. javaongsan

    #4
    htmlentities + strip_tags should do it
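
The order of the two calls matters here. The sketch below uses a hypothetical input and also shows html_entity_decode(), which is an assumption about what may be wanted (entities turned back into plain characters) rather than something suggested in this post.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<b>Fish &amp; chips</b>';

// Escaping first turns the tags into literal text, so strip_tags()
// finds nothing to remove and the markup stays visible.
$escapedFirst = strip_tags(htmlentities($html, ENT_QUOTES, 'UTF-8'));

// Stripping first and then decoding entities yields the text
// a browser would display.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

echo $escapedFirst, PHP_EOL; // &lt;b&gt;Fish &amp;amp; chips&lt;/b&gt;
echo $text, PHP_EOL;         // Fish & chips
```

Neither ordering removes the contents of script or style elements, so this combination does not by itself address the original question.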
     
    javaongsan, Nov 13, 2008