Guys, I'm trying to validate URLs from a form. For example, if someone enters http://yahoo.com, that would not be a valid URL; I want to make sure they enter http://www.yahoo.com. I have found regular expressions for that, but the problem is that http://subdomain.yahoo.com does not work properly. So I need them to be able to enter either http://www.yahoo.com or http://subdomain.yahoo.com. The reason I'm doing this is that I run a check against the database using parse_url to make sure there are no duplicate entries, so the same URL can't be entered twice. Would you guys have any clue how I can do this?
Which regexp do you use? Your database can have a UNIQUE constraint on the URL field, which guarantees uniqueness.
Do you parse the URL for www? I mean, if the URL has www it is true (1), and if it doesn't have www it is false (0), to prevent entering a URL without www? And because of that your PHP program treats a subdomain URL as incorrect? Am I right?
It is simple to solve. When parsing the URL, the "." is the key. For example, http://yahoo.com has only one ".", but a subdomain like http://messenger.yahoo.com has two. So if the URL has only one ".", it is a bare domain like http://yahoo.com; if it has two or more, like http://messenger.yahoo.com, it is a subdomain. There are exceptions, such as http://yourdomain.com.au, which is a domain, not a subdomain, but has two ".". In many cases, though, you can use this assumption.
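The dot-counting idea can be sketched in PHP roughly like this. The function name has_subdomain() is made up for illustration, and it counts dots in the host part returned by parse_url() rather than in the whole URL, so dots in a path or file name don't interfere. It has the same weakness with hosts like yourdomain.com.au.

```php
<?php
// Hypothetical helper (not from the thread): treats a host with two or
// more dots as "has a subdomain". Hosts under suffixes such as .com.au
// are misclassified, as noted in the post above.
function has_subdomain(string $url): bool
{
    // parse_url() returns null when there is no host, false on a badly malformed URL
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    return substr_count($host, '.') >= 2;
}

var_dump(has_subdomain('http://yahoo.com'));           // bool(false)
var_dump(has_subdomain('http://messenger.yahoo.com')); // bool(true)
var_dump(has_subdomain('http://yourdomain.com.au'));   // bool(true), a false positive
```

Handling the .com.au style exceptions properly would need a list of known suffixes, which this sketch does not attempt.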
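Why not check for a subdomain and add www. if it is not there? $url = rtrim(preg_replace('~^(.+?://)([^./]+\.[^./]+)(?=[/?#]|$)~', '$1www.$2', $url), '/'); The ^ anchor and the lookahead matter: without them the pattern also matches the first two labels of http://subdomain.yahoo.com and turns it into http://www.subdomain.yahoo.com. rtrim() is there to ensure only one version of the URL is inserted (instead of two, with and without the trailing slash).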
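Here is a rough sketch of the same add-www idea wrapped in a function, in case it is easier to reuse before the duplicate check. The name normalize_url() is just an assumption for the example. The pattern is anchored at the start and uses a lookahead so that only hosts with exactly two labels get www. added; hosts that already have a subdomain are left alone.

```php
<?php
// Hypothetical helper (not from the thread): produce one stored form per
// site by adding "www." to bare two-label hosts and dropping trailing slashes.
function normalize_url(string $url): string
{
    $url = trim($url);
    // ^ anchors the match at the scheme. The lookahead requires the host to
    // end right after its second label (at /, ?, # or end of string), so
    // http://subdomain.yahoo.com does not match and stays unchanged.
    $result = preg_replace(
        '~^([a-z][a-z0-9+.-]*://)([^./]+\.[^./]+)(?=[/?#]|$)~i',
        '$1www.$2',
        $url
    );
    // preg_replace() returns null on error; fall back to the input then
    return rtrim($result ?? $url, '/');
}

echo normalize_url('http://yahoo.com'), "\n";           // http://www.yahoo.com
echo normalize_url('http://yahoo.com/'), "\n";          // http://www.yahoo.com
echo normalize_url('http://www.yahoo.com/'), "\n";      // http://www.yahoo.com
echo normalize_url('http://messenger.yahoo.com'), "\n"; // http://messenger.yahoo.com
```

Note that a host like yourdomain.com.au has three labels, so this sketch leaves it without www.; that case would need separate handling.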