Regex help for embed code

Discussion in 'PHP' started by Mykmallett, Aug 17, 2010.

  1. #1
    I have been looking for the answer to this for about 2 days and it's becoming a nightmare. I need to let people enter an embed code from a specific website, Soundcloud. These look like this:
    
    <object height="81" width="100%"> <param name="movie" value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false"></param> <param name="allowscriptaccess" value="always"></param> <embed allowscriptaccess="always" height="81" src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false" type="application/x-shockwave-flash" width="100%"></embed> </object>
    All I want to do is validate them, so that when I let them into a database I know they won't cause any damage and are in fact embed codes.

    I have this so far:

    
    '/<object height=\"([0-9]*)\" width=\"[0-9]*%)\"> <param name=\"movie\" value=\"(.*)\"><\/param> <param name=\"allowscriptaccess\" value=\"always\"><\/param> <embed allowscriptaccess=\"always\" height=\"[0-9]*\" src=\"(.*)\" type=\"application/x-shockwave-flash\" width=\"[0-9]*%)"><\/embed> <\/object>/'
    But it is not validating.

    The first things I immediately think are problems:

    width=\"[0-9]*%)\" will not, I believe, match width="100%"
    type=\"application/x-shockwave-flash\" will not match type="application/x-shockwave-flash"

    and also,
    value=\"(.*)\" will not match value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false" (and similarly with src=\"(.*)\")


    But I just can't figure out how they should be altered. Regex seems to be an ocean of nonsensical characters and rules, and Google throws up the most pointless answers.

    If anyone could help me out here I would be eternally grateful.
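    For reference, the pattern above most likely fails to compile rather than merely failing to match. The escaped quotes are not the problem: inside a single-quoted PHP string \" stays as backslash-quote, and PCRE reads \" as a plain ". The likely culprits are the unescaped / in application/x-shockwave-flash, which ends a /-delimited pattern early, and the unbalanced ) in [0-9]*%). A minimal sketch that tests each fragment on its own, with simplified subject strings:

```php
<?php
// Each preg_match() below tests one fragment of the original pattern.
// preg_match() returns false (and raises a warning, silenced here with @)
// when the pattern itself cannot be compiled.

$type = 'type="application/x-shockwave-flash"';

// Escaped quotes are fine: PCRE treats \" as a literal double quote.
var_dump(preg_match('/type=\"application\/x-shockwave-flash\"/', $type)); // int(1)

// An unescaped '/' ends the pattern at "application"; the rest is read as modifiers.
var_dump(@preg_match('/type="application/x-shockwave-flash"/', $type));   // bool(false)

// Fix: escape the slash (as above) or pick a delimiter the pattern does not use.
var_dump(preg_match('~type="application/x-shockwave-flash"~', $type));    // int(1)

// A ')' with no matching '(' is a compile error.
var_dump(@preg_match('/width="[0-9]*%)"/', 'width="100%"'));              // bool(false)
var_dump(preg_match('/width="([0-9]*)%"/', 'width="100%"'));              // int(1)
```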
     
    Mykmallett, Aug 17, 2010
  2. Mykmallett

    #2
    I've messed around with it and got it validating. This is what I have:

    
    		$test ='<object height="81" width="100%"> <param name="movie" value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&secret_url=false"></param> <param name="allowscriptaccess" value="always"></param> <embed allowscriptaccess="always" height="81" src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&secret_url=false" type="application/x-shockwave-flash" width="100"></embed> </object>'; 
    		if (preg_match('/<object height=\"([0-9]*)\" width=\"(.*)\"> <param name=\"movie\" value=\"(.*)\"><\/param> <param name=\"allowscriptaccess\" value=\"always\"><\/param> <embed allowscriptaccess=\"always\" height=\"[0-9]*\" src=\"(.*)\" type=\".*\" width=\".*"><\/embed> <\/object>/', $test,$preg_out)) {
    			// $preg_out[1] to $preg_out[4] hold the captured height, width, movie value and embed src
    		}
    This works fine.

    However, when I use this to pass the form value to preg_match:

    $test = $_POST['mixembedlink'];
    and input EXACTLY THE SAME STRING, it doesn't work. This is driving me insane; can someone please help?!



    I've echoed out the $_POST['mixembedlink'] field, and it was adding a backslash before every double quote, so I used stripslashes() and it worked... thanks for the help.

    Anyone fancy venturing an explanation as to why the form was doing this?
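    The backslashes are most likely PHP's magic_quotes_gpc setting, which, when enabled (as it was on many hosts at the time), automatically escapes quotes in GET, POST and cookie data. It was deprecated in PHP 5.3 and removed in 5.4. Calling stripslashes() unconditionally would also strip legitimate backslashes on a server where the setting is off, so a safer sketch only undoes the escaping when the setting is active. The helper name unescape_request_value() is hypothetical:

```php
<?php
// Hypothetical helper: undo magic_quotes_gpc escaping only when it is active.
// get_magic_quotes_gpc() always returns false from PHP 5.4, is deprecated in
// PHP 7.4 and was removed in PHP 8.0, hence the function_exists() guard.
function unescape_request_value($value)
{
    if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
        return stripslashes($value);
    }
    return $value;
}

// Form field name taken from the post above; '' when the form was not submitted.
$code = isset($_POST['mixembedlink']) ? unescape_request_value($_POST['mixembedlink']) : '';
```

The design choice is to leave input untouched unless the server actually added escaping, so the same code behaves correctly whether or not magic quotes is switched on.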
     
    Last edited: Aug 17, 2010
    Mykmallett, Aug 17, 2010
  3. nico_swd

    #3
    Your pattern is insecure, as you're allowing any character (.) any number of times (*).

    This seems to be working for me.
    
    
    // $code is assumed to hold the submitted embed code, e.g. from $_POST['mixembedlink'].
    // The hyphen in [\w%-] comes last so the class also compiles under PCRE2 (PHP 7.3+).
    $pattern =
    	'~^<object height="(\d{2,3})" width="(\d{2,3})%">\s*' .
    	'<param name="movie" value="(http://player\.soundcloud\.com/player\.swf\?url=http%3A%2F%2Fsoundcloud\.com%2F[\w%-]+&secret_url=(?:false|true))">\s*</param>\s*' .
    	'<param name="allowscriptaccess" value="always">\s*</param>\s*' .
    	'<embed allowscriptaccess="always" height="\1" src="\3" type="application/x-shockwave-flash" width="\2%">\s*</embed>\s*' .
    	'</object>$~';
    
    if (preg_match($pattern, trim($code)))
    {
    	echo 'Valid';
    }
    else
    {
    	echo 'Invalid';
    }
    
    trim() the user input before using it.

    EDIT:
    You're using .* in the source of the embedded file, which means users could insert their own flash files.
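    To illustrate the point about .*, a sketch comparing a loose src check with the SoundCloud-specific URL from the pattern above. Only the src attribute is tested, so neither pattern is anchored here, and the evil.example.com URL is a made-up stand-in for a third-party Flash file:

```php
<?php
// Loose check: any src at all is accepted.
$loose  = '~src="(.*)"~';
// Strict check: only the SoundCloud player URL shape from nico_swd's pattern.
$strict = '~src="(http://player\.soundcloud\.com/player\.swf\?url=http%3A%2F%2Fsoundcloud\.com%2F[\w%-]+&secret_url=(?:false|true))"~';

$legit = 'src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ffull-melt%2Fstrawberry-flava-mixtape&secret_url=false"';
$evil  = 'src="http://evil.example.com/payload.swf"'; // hypothetical foreign file

var_dump(preg_match($loose, $legit));  // int(1)
var_dump(preg_match($loose, $evil));   // int(1): any Flash file gets through
var_dump(preg_match($strict, $legit)); // int(1)
var_dump(preg_match($strict, $evil));  // int(0)
```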
     
    Last edited: Aug 17, 2010
    nico_swd, Aug 17, 2010
  4. Mykmallett

    #4
    That looks far more secure; I will give it a try later on.

    For now, though: how come the width, height and src values are different in the second instance?

    e.g. the first width is "(\d{2,3})%" and the second is "\2%"

    What does this do? src="\3"


    Appreciate the help
     
    Mykmallett, Aug 17, 2010
  5. nico_swd

    #5
    \1 refers back to whatever the first capturing group (the first pair of parentheses) matched earlier in the same pattern, \2 to the second, and so on.

    This prevents incorrect code from being submitted. The width, height, and source should be the same in these locations.
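    A stripped-down sketch of the same idea, using simplified tags rather than the full SoundCloud markup: \1 only matches the exact text that group 1 captured earlier in the same subject, so mismatched heights are rejected.

```php
<?php
// Group 1 captures the <object> height; \1 must repeat exactly that text later.
$pattern = '~^<object height="(\d{2,3})">.*<embed height="\1">~';

var_dump(preg_match($pattern, '<object height="81"> ... <embed height="81">', $m)); // int(1)
echo $m[1], "\n";                                                                    // 81
var_dump(preg_match($pattern, '<object height="81"> ... <embed height="99">'));     // int(0)
```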
     
    nico_swd, Aug 17, 2010
  6. Mykmallett

    #6
    That's very clever; I would never have discovered that. I will post back letting you know how it goes. Thanks a lot for your help
     
    Mykmallett, Aug 17, 2010
  7. Mykmallett

    #7
    It won't validate. I have tried both the form field and assigning the string straight to $code, with trim(), with stripslashes() and with neither, and it still returns Invalid.


    Are the first two characters right? I thought ^ meant "not"?
     
    Mykmallett, Aug 17, 2010
  8. danx10

    #8
    Are you sure there's nothing else apart from the embed code in $code, and that it's all lowercase?

    ^ means the beginning of the string. It only means "not" when it is the first character inside a character class, as in [^0-9].
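    A small sketch of the difference, with a toy pattern, plus the i modifier in case the submitted code uses uppercase tags (the pattern in this thread is case-sensitive):

```php
<?php
// Outside [...], ^ anchors the match to the start of the subject and $ to the end.
$anchored = '~^<object>$~';

var_dump(preg_match($anchored, '<object>'));         // int(1)
var_dump(preg_match($anchored, 'junk <object>'));    // int(0): text before the tag
var_dump(preg_match($anchored, ' <object> '));       // int(0): stray whitespace
var_dump(preg_match($anchored, trim(' <object> '))); // int(1)

// As the first character inside [...], ^ negates the class: "any non-digit".
var_dump(preg_match('~[^0-9]~', '123'));             // int(0)

// Patterns are case-sensitive unless the i modifier is added.
var_dump(preg_match($anchored, '<OBJECT>'));         // int(0)
var_dump(preg_match('~^<object>$~i', '<OBJECT>'));   // int(1)
```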
     
    danx10, Aug 17, 2010
  9. Mykmallett

    #9
    Ah, it was my mistake: I had earlier removed a % symbol from the test example. It works great! Thank you enormously for your help!
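    One way to catch this kind of slip is a sketch that wraps nico_swd's pattern in a function and checks it against the embed code from post #2, once as-is and once with the % dropped from the <embed> width, as in that post's test string. The function name is_soundcloud_embed() is hypothetical, and the character class is written [\w%-] so it also compiles under PCRE2 (PHP 7.3+):

```php
<?php
// Hypothetical wrapper around nico_swd's pattern; returns true or false.
function is_soundcloud_embed($code)
{
    $pattern =
        '~^<object height="(\d{2,3})" width="(\d{2,3})%">\s*' .
        '<param name="movie" value="(http://player\.soundcloud\.com/player\.swf\?url=http%3A%2F%2Fsoundcloud\.com%2F[\w%-]+&secret_url=(?:false|true))">\s*</param>\s*' .
        '<param name="allowscriptaccess" value="always">\s*</param>\s*' .
        '<embed allowscriptaccess="always" height="\1" src="\3" type="application/x-shockwave-flash" width="\2%">\s*</embed>\s*' .
        '</object>$~';

    return preg_match($pattern, trim($code)) === 1;
}

// Embed code from post #2, with width="100%" in both tags.
$src = 'http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&secret_url=false';
$embed = '<object height="81" width="100%"> <param name="movie" value="' . $src . '"></param> '
       . '<param name="allowscriptaccess" value="always"></param> '
       . '<embed allowscriptaccess="always" height="81" src="' . $src . '" type="application/x-shockwave-flash" width="100%"></embed> </object>';

var_dump(is_soundcloud_embed($embed)); // bool(true)

// Same code with the % dropped from the <embed> width: \2% no longer matches.
$noPercent = str_replace('width="100%"></embed>', 'width="100"></embed>', $embed);
var_dump(is_soundcloud_embed($noPercent)); // bool(false)
```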
     
    Mykmallett, Aug 17, 2010
  10. nico_swd

    #10
    You're welcome! :)
     
    nico_swd, Aug 18, 2010