I have been looking for the answer to this for about 2 days and it's becoming a nightmare. I need to let people enter an embed code from one specific website, SoundCloud. These look like this:

<object height="81" width="100%"> <param name="movie" value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false"></param> <param name="allowscriptaccess" value="always"></param> <embed allowscriptaccess="always" height="81" src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false" type="application/x-shockwave-flash" width="100%"></embed> </object>

All I want to do is validate them, so that when I let one go into the database I know it is not going to cause any damage and is in fact an embed code. I have this so far:

'/<object height=\"([0-9]*)\" width=\"[0-9]*%)\"> <param name=\"movie\" value=\"(.*)\"><\/param> <param name=\"allowscriptaccess\" value=\"always\"><\/param> <embed allowscriptaccess=\"always\" height=\"[0-9]*\" src=\"(.*)\" type=\"application/x-shockwave-flash\" width=\"[0-9]*%)"><\/embed> <\/object>/'

But it is not validating. The first things I suspect are problems:

width=\"[0-9]*%)\" I do not believe will match width="100%"

type=\"application/x-shockwave-flash\" will not match type="application/x-shockwave-flash"

value=\"(.*)\" will not match value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false" (similarly with src=\"(.*)\")

But I just can't figure out how they should be altered. Regex seems to be an ocean of nonsensical characters and rules, and Google throws up the most pointless answers. If anyone could help me out here I would be eternally grateful.
I've messed around with it and got it validating. This is what I have:

$test ='<object height="81" width="100%"> <param name="movie" value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&secret_url=false"></param> <param name="allowscriptaccess" value="always"></param> <embed allowscriptaccess="always" height="81" src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&secret_url=false" type="application/x-shockwave-flash" width="100"></embed> </object>';
if (preg_match('/<object height=\"([0-9]*)\" width=\"(.*)\"> <param name=\"movie\" value=\"(.*)\"><\/param> <param name=\"allowscriptaccess\" value=\"always\"><\/param> <embed allowscriptaccess=\"always\" height=\"[0-9]*\" src=\"(.*)\" type=\".*\" width=\".*"><\/embed> <\/object>/', $test,$preg_out)) {

This works fine. However, when I use this to send the form value to preg_match instead:

$test = $_POST['mixembedlink'];

and input EXACTLY THE SAME STRING, it doesn't work. This is driving me insane, can someone please help?!

I've echoed out the $_POST['mixembedlink'] field and it was adding a backslash before every double quote, so I used stripslashes() and it worked... thanks for the help... Anyone fancy venturing an explanation as to why the form was doing this?
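On that last question: the thread never confirms the server configuration, so this is an assumption, but the most likely cause is PHP's old magic_quotes_gpc setting. When it was enabled, PHP ran the equivalent of addslashes() over all GET, POST, and cookie data before the script saw it. The feature was removed in PHP 5.4. The sketch below simulates the effect with addslashes(); $embed and $pattern are shortened, hypothetical stand-ins for the real embed code and pattern.

```php
<?php
// Shortened stand-in for the embed code a user would paste into the form.
$embed = '<object height="81" width="100%">';

// Simulate what magic_quotes_gpc did to incoming form data:
// every double quote gets a backslash in front of it.
$posted = addslashes($embed);

// Shortened stand-in for the validation pattern.
$pattern = '~^<object height="\d{2,3}" width="\d{2,3}%">$~';

// The added backslashes stop the pattern from matching.
$matchesEscaped = preg_match($pattern, $posted);

// stripslashes() restores the text the user actually typed.
$matchesCleaned = preg_match($pattern, stripslashes($posted));

echo $posted, "\n";
echo 'escaped: ', $matchesEscaped, ', cleaned: ', $matchesCleaned, "\n";
```

Calling stripslashes() unconditionally is only safe while the server really is adding the slashes; on a server without magic quotes it would strip backslashes the user typed on purpose.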
Your pattern is insecure because it allows any character (.) any number of times (*). This seems to be working for me:

$pattern = '~^<object height="(\d{2,3})" width="(\d{2,3})%">\s*'
    // The hyphen sits last in [\w%-] so it is read as a literal hyphen, not a range.
    . '<param name="movie" value="(http://player\.soundcloud\.com/player\.swf\?url=http%3A%2F%2Fsoundcloud\.com%2F[\w%-]+&secret_url=(?:false|true))">\s*</param>\s*'
    . '<param name="allowscriptaccess" value="always">\s*</param>\s*'
    . '<embed allowscriptaccess="always" height="\1" src="\3" type="application/x-shockwave-flash" width="\2%">\s*</embed>\s*'
    . '</object>$~';

if (preg_match($pattern, trim($code))) {
    echo 'Valid';
} else {
    echo 'Invalid';
}

trim() the user input before using it.

EDIT: You're using .* in the source of the embedded file, which means users could insert their own Flash files.
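To make the point about .* concrete, here is a minimal sketch that compares a loose and a strict pattern for the src attribute alone. Both patterns are cut down from the ones in this thread, and the evil.example URL is a hypothetical hostile input, not something taken from a real submission.

```php
<?php
// Loose: any characters at all are accepted as the src value.
$loose = '~^<embed src="(.*)">$~';

// Strict: only the SoundCloud player URL with an encoded soundcloud.com track path.
$strict = '~^<embed src="http://player\.soundcloud\.com/player\.swf'
    . '\?url=http%3A%2F%2Fsoundcloud\.com%2F[\w%-]+&secret_url=(?:false|true)">$~';

$good = '<embed src="http://player.soundcloud.com/player.swf'
    . '?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false">';

// Hypothetical hostile input pointing at someone else's Flash file.
$bad = '<embed src="http://evil.example/payload.swf">';

$results = array(
    'loose/good'  => preg_match($loose, $good),
    'loose/bad'   => preg_match($loose, $bad),
    'strict/good' => preg_match($strict, $good),
    'strict/bad'  => preg_match($strict, $bad),
);

foreach ($results as $case => $matched) {
    echo $case, ': ', $matched ? 'accepted' : 'rejected', "\n";
}
```

The loose pattern accepts both inputs, while the strict one accepts only the SoundCloud URL.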
That looks far more secure, I will give it a try later on. For now though, how come the width, height and src values are different in the 2nd instance? For example, the first width is "(\d{2,3})%" and the 2nd is "\2%". And what does src="\3" do? Appreciate the help.
\1 is a backreference: it stands for whatever the first group (the first pair of parentheses) matched earlier in the same pattern, \2 for the second, and so on. This prevents inconsistent code from being submitted, because the width, height, and source have to be the same in both places.
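A minimal sketch of that behaviour, using a cut-down pattern that checks the height only (the two sample strings are made up for illustration):

```php
<?php
// (\d{2,3}) captures the object height; \1 then demands the very same
// digits again on the embed tag.
$pattern = '~^<object height="(\d{2,3})"><embed height="\1"></embed></object>$~';

$consistent   = '<object height="81"><embed height="81"></embed></object>';
$inconsistent = '<object height="81"><embed height="300"></embed></object>';

// $groups[1] receives the text captured by the first group.
$okConsistent   = preg_match($pattern, $consistent, $groups);
$okInconsistent = preg_match($pattern, $inconsistent);

echo 'consistent: ', $okConsistent, ' (captured height ', $groups[1], ")\n";
echo 'inconsistent: ', $okInconsistent, "\n";
```

The second string is rejected even though "300" would satisfy \d{2,3} on its own, because \1 only matches the exact text that was captured.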
That's very clever, I would have never discovered that. I will post back letting you know how it goes. Thanks a lot for your help
It won't validate. I have tried using the form field and assigning the string straight to $code, with trim(), with stripslashes(), and with neither, and it still returns invalid. Are the first two characters right? I thought that ^ meant to not allow something?
Are you sure there's nothing else apart from the embed code within $code, and that it's all lowercase? ^ means the beginning/start of the string.
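For anyone with the same confusion about ^: it has two meanings depending on where it appears, and the leading ~ is only the pattern delimiter. A small sketch with made-up test strings:

```php
<?php
// ~ is only the delimiter; the ^ right after it anchors the match
// to the start of the string.
$anchored = '~^<object~';

// Inside square brackets a leading ^ negates the class:
// this matches any single character that is NOT a digit.
$negated = '~[^0-9]~';

$atStart     = preg_match($anchored, '<object height="81">');
$notAtStart  = preg_match($anchored, 'hello <object height="81">');
$hasNonDigit = preg_match($negated, '81%');
$allDigits   = preg_match($negated, '81');

echo 'starts with <object: ', $atStart, "\n";
echo 'text before <object: ', $notAtStart, "\n";
echo '"81%" has a non-digit: ', $hasNonDigit, "\n";
echo '"81" has a non-digit: ', $allDigits, "\n";
```

This is also why any stray text before the embed code makes the anchored pattern fail.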
Ah, it was me having earlier removed a % symbol from the test example. It works great! Thank you enormously for your help!