Often people post links in our classifieds that start with just www. or with nothing at all (for instance www.blahblah.com or blahblah.com). I need a preg_replace function that will add http:// to any link that doesn't have it. So if a link is www.blahblah.com it should become http://www.blahblah.com, and if a link is blahblah.com it should become http://blahblah.com. Thank you for any suggestions.
This is fun, I had the same problem. First check whether the first 5 chars are http: If not, add http://. If they are, str_replace http:// to http:/ and then again http:/ to http://. That's all. (Why replace? If users wrote http:/ it would give problems, as I have experienced many times.) So easy...
If it does not start with http://, I personally would spit out an error if it's a form:
if (!preg_match('/^http:\/\/.+/', $_POST['YOUR_INPUT_NAME'])) { echo 'must start with http'; }
But you can also simply generate it by replacing the echo with the following:
$_POST['YOUR_INPUT_NAME'] = 'http://'.$_POST['YOUR_INPUT_NAME'];
Your issue indicates a deeper issue though: it looks like your script has no URL validation function at all. You should also check whether other input fields have similar issues.
@EricBruggema can you be more specific using the line(s) below: $text = preg_replace( '//', "", $text); or $text = str_replace( '//', "", $text); Thanks!
Don't ever trust // on its own. And there's no need for preg_replace if it's only ONE line, use str_replace:
$text = (substr($text, 0, 5) != "http:") ? 'http://' . $text : str_replace("http:/", "http://", str_replace("http://", "http:/", $text));
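For anyone who wants to try that one-liner, here is a small sketch of it (parentheses balanced) run against a few made-up inputs. The wrapper name normalizeHttp is hypothetical, added only so the line can be called repeatedly.

```php
<?php
// Hypothetical wrapper around the substr/str_replace one-liner above.
function normalizeHttp(string $text): string
{
    return (substr($text, 0, 5) != "http:")
        ? 'http://' . $text
        : str_replace("http:/", "http://", str_replace("http://", "http:/", $text));
}

echo normalizeHttp('www.blahblah.com'), "\n";      // http://www.blahblah.com
echo normalizeHttp('http:/blahblah.com'), "\n";    // http://blahblah.com (single slash repaired)
echo normalizeHttp('http://blahblah.com'), "\n";   // http://blahblah.com (unchanged)
echo normalizeHttp('https://blahblah.com'), "\n";  // http://https://blahblah.com
```

Note the last case: the first five characters of an https link are "https", not "http:", so it gets a second prefix. If https links can show up, the check would need to allow for them.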
Wow, you make it overly complicated: a substr and then 2 str_replace calls. One preg_match check does exactly the same, and then you simply prepend http:// to your var.
Looking at the code, I am wondering how it will detect that it is a URL? Nothing there says that it's .com, .org, etc. Will it be adding http:// to everything? I guess I need something that will go like this: if it's a URL and it's missing http://, add http:// to it. I realize it sounds simple but it can be complicated to solve. I searched up and down online trying to find a solution.
qwi, the thing is, your backend queries your database for a specific column that was filled in during signup, posting an ad or whatever. That gives you the variable you have to use in the code. Or are you implying you want to replace links in a block of text? That's almost impossible. The script wouldn't know whether a dot is part of a domain or the end of a sentence. The next sentence normally starts after a whitespace, but you can't rely on that. Both our methods will work, but they require that the URL, wherever it comes from, is standalone and not mixed with other text.
True, but I'm expecting the worst-case scenario and my line will fix all of it (I'm using it for a web crawler). Oh, and I'm expecting that the input comes from a field meant specifically for adding a URL, or something of that sort.
Oh I see, yeah, I wasn't expecting this to be for already submitted URLs, which of course may include only one slash. A simple preg_match wouldn't work for that, sorry. Further URL errors should be prevented beforehand on the form though, so just don't let them submit without http.
Guys, I don't think we're working in the same direction. Someone suggested this solution. What do you think?
$text = preg_replace( '/^(?:http:\/\/)?(.*)/', "http://$1", $text);
Can you confirm that this is a viable solution?
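One way to answer that is simply to run the suggested line on a few sample values. The sketch below does that. The helper name addHttp is made up for the test, and the inputs are assumed to be standalone URLs rather than free text.

```php
<?php
// Hypothetical helper wrapping the suggested preg_replace line.
function addHttp(string $text): string
{
    return preg_replace('/^(?:http:\/\/)?(.*)/', 'http://$1', $text);
}

echo addHttp('www.blahblah.com'), "\n";       // http://www.blahblah.com
echo addHttp('blahblah.com'), "\n";           // http://blahblah.com
echo addHttp('http://blahblah.com'), "\n";    // http://blahblah.com (not doubled)
echo addHttp('https://blahblah.com'), "\n";   // http://https://blahblah.com
echo addHttp('call me tonight'), "\n";        // http://call me tonight
```

So it does what was asked for standalone http links, but it prefixes anything at all, including https links and plain text, because nothing in the pattern checks that the value looks like a URL.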
This will always return the correct URL. If you need http in front, just add it on the last line, changing www. to http://www.
<?php
$url = 'test.com';
if (!preg_match("/http:/",$url)){ $url = 'http://'.$url; }
$domainHost = parse_url($url);
$domainHost['host'] = str_replace("http://","",$domainHost['host']);
$domainHost['host'] = str_replace("www.","",$domainHost['host']);
echo $domainHost['host'] = 'www.'.$domainHost['host'];
?>
I think Eric is on the right track, but is checking for the wrong thing. I'd test for the presence of the scheme delimiter instead of the full scheme; that way you can pass more than just http. I'd also worry about the possibility of false positives in the query, so I'd split the query off for processing. While at it, maybe throw a Windows slash fixer on it too.
function fixedURL($URL, $scheme = 'http') {
    // split off query if present
    $u = explode('?', $URL);
    // fix windows slashes if present, trim off leading slashes
    $u[0] = ltrim(str_replace('\\', '/', $u[0]), '/');
    // check for scheme, add if missing
    if (strpos($u[0], '://') === false) $u[0] = $scheme . '://' . $u[0];
    // recombine result.
    return implode('?', $u);
}
A bit more complex, but handles more situations. I might also consider running it through parse_url and recombining after that, but that seems unnecessary... though if you wanted to check the scheme to allow only specific ones, that's where I'd handle that. Unlike the regex versions, it will pass through any scheme written with the :// delimiter (FTP, HTTPS, WS, etc.), only adding http if none is present. Likewise, putting it in a function lets you pass a different default scheme if desired.
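A quick way to see how fixedURL behaves is to feed it a handful of values. The function body below is copied from the post so the snippet runs on its own; the sample URLs are made up.

```php
<?php
// Copied from the post above so this snippet is self-contained.
function fixedURL($URL, $scheme = 'http') {
    $u = explode('?', $URL);
    $u[0] = ltrim(str_replace('\\', '/', $u[0]), '/');
    if (strpos($u[0], '://') === false) $u[0] = $scheme . '://' . $u[0];
    return implode('?', $u);
}

echo fixedURL('www.blahblah.com'), "\n";                 // http://www.blahblah.com
echo fixedURL('https://blahblah.com'), "\n";             // https://blahblah.com (left alone)
echo fixedURL('http:\\\\blahblah.com'), "\n";            // http://blahblah.com (backslashes fixed)
echo fixedURL('//blahblah.com/ads?ref=http://x'), "\n";  // http://blahblah.com/ads?ref=http://x
echo fixedURL('ftp.blahblah.com', 'ftp'), "\n";          // ftp://ftp.blahblah.com
echo fixedURL('mailto:me@blahblah.com'), "\n";           // http://mailto:me@blahblah.com
```

The fourth call shows why the query is split off first: the :// inside ?ref= does not count as a scheme. The last call shows the limit of testing for ://, since schemes written without it (mailto: here) still get http:// prepended.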
Oh, and @webStumbler, yours is VERY broken: if you enter a subdomain like www, your code will return www TWICE, since the subdomain IS retained in ['host'] and/or PHP_URL_HOST. Which BTW, if you're not going to be using anything but ['host'], you should be passing PHP_URL_HOST instead, thus:
$domainHost = parse_url($url, PHP_URL_HOST);
That way you don't have to array index it.
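To illustrate the difference between the two parse_url call styles, here is a minimal sketch; the sample URL is made up.

```php
<?php
$url = 'http://www.blahblah.com/ads?id=1';

// Without a component argument you get an array and have to index it.
$parts = parse_url($url);
echo $parts['host'], "\n";              // www.blahblah.com

// With PHP_URL_HOST you get the host back directly as a string.
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n";                       // www.blahblah.com

// Without a scheme the whole value is parsed as a path, so there is no host.
var_dump(parse_url('www.blahblah.com', PHP_URL_HOST));  // NULL
```

The last call is the reason the scheme has to be added before parse_url is used to pull out the host.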
@deathshadow, I was thinking of an easy and quick hack, just to add http:// and not to check whether the rest is correct. Your solution is way better.
I initially created this script to only check for http and add it if it's missing, and to always remove www so that I can make sure my database has unique input. Then after that, add the www. if the URL cannot be found. As a matter of fact, it would not return www twice. But with what you say, it is indeed broken if you have a subdomain other than www., because then it would produce www.subdomain.domain.com. Thanks for highlighting this.
Thank you for all your suggestions. I finally found something that's simple yet works just fine. It adds http:// only to the URLs that start with www. In my case, about 85-90% of the URLs that posters submit without http:// look like that. Granted, some of them forget to even include www., but I am not worried about that as much since it's a very small number of people. This is the line I am using:
$text = preg_replace('/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i', "$1http://$2", $text);
Again, thank you everyone.
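For later readers, here is a small sketch of that line on sample text, plus one possible variant. The helper names are hypothetical, and the lookbehind version is a suggested tweak, not something from the thread.

```php
<?php
// The line from the post above, wrapped in a hypothetical helper.
function addHttpToWww(string $text): string
{
    return preg_replace('/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i', '$1http://$2', $text);
}

// Hypothetical variant: a negative lookbehind instead of a consumed character,
// so a link at the very start of the text is matched as well.
function addHttpToWwwAnywhere(string $text): string
{
    return preg_replace('/(?<![\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i', 'http://$1', $text);
}

echo addHttpToWww('Visit www.blahblah.com today'), "\n";  // Visit http://www.blahblah.com today
echo addHttpToWww('http://www.blahblah.com'), "\n";       // http://www.blahblah.com (unchanged)
echo addHttpToWww('www.blahblah.com'), "\n";              // www.blahblah.com (missed)
echo addHttpToWwwAnywhere('www.blahblah.com'), "\n";      // http://www.blahblah.com
```

The original pattern needs one character in front of www. that is not a word character or a slash, so a link that is the very first thing in the text is skipped. The lookbehind keeps the same "not after a letter or slash" rule without requiring that character to exist.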