PHP preg_replace adding http:// only when missing

Discussion in 'PHP' started by qwikad.com, Sep 12, 2013.

  1. #1
    Often people post links in our classifieds that start with just www. or with nothing at all (for instance www.blahblah.com or blahblah.com).

    I need a preg_replace function that will add http:// to any link that doesn't have it.

    So if a link is: www.blahblah.com it should become http://www.blahblah.com
    If a link is blahblah.com it should become http://blahblah.com

    Thank you for any suggestions.
    qwikad.com, Sep 12, 2013 IP
  2. EricBruggema

    EricBruggema Well-Known Member

    Messages: 1,650
    Likes Received: 22
    Best Answers: 12
    Trophy Points: 115
    #2
    This is fun, I had the same problem.

    First check whether the first 5 chars are http:
    If not, add http://
    If they are, str_replace http:// with http:/ and then http:/ back with http:// :) That's all. (Why replace? If users wrote http:/ with a single slash it would cause problems, as I have experienced many times.)

    So easy...
    EricBruggema, Sep 12, 2013 IP
  3. Basti

    Basti Active Member

    Messages: 627
    Likes Received: 6
    Best Answers: 3
    Trophy Points: 90
    #3
    If it doesn't start with http://, I personally would spit out an error if it's a form:
    PHP:
        if (!preg_match('/^http:\/\/.+/', $_POST['YOUR_INPUT_NAME'])) {
          echo 'must start with http';
        }
    But you can also simply fix the value up by replacing the echo with the following:
    PHP:
        $_POST['YOUR_INPUT_NAME'] = 'http://'.$_POST['YOUR_INPUT_NAME'];
    Your issue indicates a deeper problem though: it looks like your script has no URL validation function at all. You should also check whether other input fields have similar issues.
    Last edited: Sep 12, 2013
    Basti, Sep 12, 2013 IP
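For anyone following along, here is a minimal sketch of the "prepend, then validate" idea from post #3, assuming the URL arrives as a standalone string. The function name `normalizeSubmittedUrl` is hypothetical, accepting https:// as well as http:// is an added assumption, and `filter_var()` with `FILTER_VALIDATE_URL` is only one possible way to do the validation the post asks for.

```php
<?php
// Hypothetical helper, not from the thread: prepend http:// when the input
// has no http:// or https:// prefix, then validate the result.
function normalizeSubmittedUrl(string $input): ?string
{
    $url = trim($input);

    // Assumption: https:// links should be left alone as well.
    if (!preg_match('/^https?:\/\//i', $url)) {
        $url = 'http://' . $url;
    }

    // filter_var() returns false when the string is not a valid URL.
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

var_dump(normalizeSubmittedUrl('www.blahblah.com')); // the http:// form
var_dump(normalizeSubmittedUrl(''));                 // NULL, nothing to validate
```

Returning null instead of echoing an error leaves it to the calling form code to decide whether to reject the submission or ask the user again.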
  4. qwikad.com

    qwikad.com Well-Known Member Affiliate Manager

    Messages: 1,927
    Likes Received: 238
    Best Answers: 1
    Trophy Points: 140
    #4
    @EricBruggema can you be more specific using the line(s) below:

    $text = preg_replace( '//', "", $text);

    or

    $text = str_replace( '//', "", $text);

    Thanks!
    qwikad.com, Sep 12, 2013 IP
  5. EricBruggema

    #5
    Don't ever trust // only.

    No preg_replace if it's only ONE line; use str_replace.

    PHP:
        $text = (substr($text, 0, 5) != "http:") ? 'http://' . $text : str_replace("http:/", "http://", str_replace("http://", "http:/", $text));
    EricBruggema, Sep 12, 2013 IP
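A runnable sketch of the one-liner from post #5, wrapped in a function so its behavior is easy to try. The name `addHttp` is hypothetical. Note that the check only looks for `http:`, so an https:// link gets a second scheme prepended, which may or may not matter for a given site.

```php
<?php
// Hypothetical wrapper around the one-liner from post #5.
function addHttp(string $text): string
{
    return (substr($text, 0, 5) != 'http:')
        ? 'http://' . $text
        // Collapse http:// to http:/ first, then expand every http:/ again,
        // so a mistyped single-slash http:/ also ends up as http://.
        : str_replace('http:/', 'http://', str_replace('http://', 'http:/', $text));
}

echo addHttp('www.blahblah.com'), "\n";     // http://www.blahblah.com
echo addHttp('http:/blahblah.com'), "\n";   // http://blahblah.com
echo addHttp('http://blahblah.com'), "\n";  // http://blahblah.com
echo addHttp('https://blahblah.com'), "\n"; // http://https://blahblah.com
```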
  6. Basti

    #6
    Wow, you make it overly complicated: substr and then two str_replace calls. One preg_match check does exactly the same, and then you simply prepend http:// to your variable.
    Basti, Sep 12, 2013 IP
  7. qwikad.com

    #7
    Looking at the code, I am wondering how it will detect that it is a URL. Nothing there says that it's .com, .org, etc. Will it be adding http:// to everything?

    I guess I need something that will go like this:

    if it's a URL and it's missing http://, add http:// to it

    I realize it sounds simple but can be complicated to solve. I searched it up and down online trying to find a solution.
    Last edited: Sep 12, 2013
    qwikad.com, Sep 12, 2013 IP
  8. Basti

    #8
    qwi, the thing is your backend queries your database for a specific column that users filled in during signup, when posting an ad, or whatever. That gives you the variable you have to use in the code.
    Or are you implying you want to replace links in a block of text? That's almost impossible. The script wouldn't know whether it is looking at a domain or the end of a sentence. The next sentence normally starts after a white space, but you can't rely on that.

    Both our methods will work, but they require that the URL, wherever it comes from, is standalone and not mixed with other text.
    Basti, Sep 12, 2013 IP
  9. EricBruggema

    #9
    True, but I'm expecting the worst-case scenario and my line will fix all of it (I'm using it for a web crawler). Oh, and I'm expecting that the input comes from a field meant specifically for entering a URL or something of that sort.
    EricBruggema, Sep 12, 2013 IP
  10. Basti

    #10
    Oh I see, yeah, I wasn't expecting this to be for already submitted URLs, which can of course include only one slash. A simple preg_match wouldn't work for that, sorry.
    Further URL errors should be prevented beforehand on the form though, so just don't let users submit without http.
    Basti, Sep 12, 2013 IP
  11. qwikad.com

    #11
    Guys, I don't think we're working in the same direction. Someone suggested this solution. What do you think?

    Code (Text):
        $text = preg_replace( '/^(?:http:\/\/)?(.*)/', "http://$1", $text);
    Can you confirm that this is a viable solution?
    qwikad.com, Sep 12, 2013 IP
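A quick way to answer the "is this viable?" question is to run the pattern from post #11 against a few standalone strings. This is only a sketch; the sample inputs are made up. Because the pattern is anchored with `^` and captures everything with `(.*)`, it prepends http:// to any string that doesn't already start with it, URL or not, and only at the very start of the text.

```php
<?php
// The pattern from post #11, applied to a few standalone sample strings.
$pattern = '/^(?:http:\/\/)?(.*)/';

$samples = ['www.blahblah.com', 'blahblah.com', 'http://blahblah.com', 'https://blahblah.com'];

foreach ($samples as $text) {
    echo preg_replace($pattern, 'http://$1', $text), "\n";
}
// http://www.blahblah.com
// http://blahblah.com
// http://blahblah.com
// http://https://blahblah.com
```

So it behaves for a field that holds nothing but an http link, but it doubles the scheme on https:// input and will not find links inside a longer block of text.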
  12. EmmanuelFlossie

    EmmanuelFlossie Active Member

    Messages: 159
    Likes Received: 11
    Best Answers: 2
    Trophy Points: 65
    #12
    This will always return the correct URL. If you need http in front, just change the last line, where it says www., to http://www.
    Code (Text):
        <?php
        $url = 'test.com';
        // add the scheme if it is missing so parse_url() can find the host
        if (!preg_match("/http:/",$url)){
            $url = 'http://'.$url;
            }
        $domainHost = parse_url($url);
        // strip any scheme and www. from the host, then put www. back
        $domainHost['host'] = str_replace("http://","",$domainHost['host']);
        $domainHost['host'] = str_replace("www.","",$domainHost['host']);
        echo $domainHost['host'] = 'www.'.$domainHost['host'];
        ?>
    EmmanuelFlossie, Sep 12, 2013 IP
  13. deathshadow

    deathshadow Prominent Member

    Messages: 5,980
    Likes Received: 827
    Best Answers: 144
    Trophy Points: 395
    #13
    I think Eric is on the right track, but is checking for the wrong thing. I'd test for the presence of the scheme delimiter instead of the full scheme; that way you can pass more than just http. I'd also worry about the possibility of false positives in the query string, so I'd split the query off before processing. While at it, maybe throw a Windows slash fixer on it too.

    Code (Text):
        function fixedURL($URL,$scheme = 'http') {
            // split off query if present
            $u = explode('?',$URL);
            // fix windows slashes if present, trim off leading slashes
            $u[0] = ltrim(str_replace('\\', '/', $u[0]), '/');
            // check for scheme, add if missing
            if (strpos($u[0], '://') === false) $u[0] = $scheme . '://' . $u[0];
            // recombine result.
            return implode('?',$u);
        }
    A bit more complex, but it handles more situations. I might also consider running it through parse_url and recombining after that, but that seems unnecessary... though if you wanted to check the scheme to allow only specific ones, that's where I'd handle that.

    Will work with ALL schemes, passing FTP, HTTPS, MAIL, NN, WS, etc, etc... unlike the regex versions, only adding http if none is present. Likewise, putting it in a function lets you pass a different default scheme if desired.
    deathshadow, Sep 12, 2013 IP
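A short usage sketch for the function in post #13, with the function repeated so the snippet runs on its own. The wrapper `fixedUrlIfAllowed` and its default allowlist are hypothetical; they illustrate one way the "allow only specific schemes" check mentioned above could be done with `parse_url()`, not necessarily how the poster would write it.

```php
<?php
// fixedURL() as posted in #13, repeated here so the sketch is self-contained.
function fixedURL($URL, $scheme = 'http') {
    $u = explode('?', $URL);
    $u[0] = ltrim(str_replace('\\', '/', $u[0]), '/');
    if (strpos($u[0], '://') === false) $u[0] = $scheme . '://' . $u[0];
    return implode('?', $u);
}

// Hypothetical wrapper: return null unless the resulting scheme is allowed.
function fixedUrlIfAllowed($URL, array $allowed = ['http', 'https']) {
    $fixed = fixedURL($URL);
    $scheme = parse_url($fixed, PHP_URL_SCHEME);
    return in_array(strtolower((string) $scheme), $allowed, true) ? $fixed : null;
}

echo fixedURL('www.blahblah.com'), "\n";                   // http://www.blahblah.com
echo fixedURL('ftp://blahblah.com/file.zip'), "\n";        // unchanged
echo fixedURL('blahblah.com?next=http://other.com'), "\n"; // http://blahblah.com?next=http://other.com
var_dump(fixedUrlIfAllowed('ftp://blahblah.com/file.zip')); // NULL
```

The third call shows why the query string is split off first: the `://` inside the query does not stop the scheme from being added to the front.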
  14. deathshadow

    #14
    Oh, and @webStumbler, yours is VERY broken since if you enter a subdomain like www, your code will return www TWICE since HOST WILL retain the subdomain on ['host'] and/or PHP_URL_HOST.

    BTW, if you're not going to use anything but ['host'], you should be calling it like this instead:
    $domainHost = parse_url($url, PHP_URL_HOST);

    That way you don't have to array index it.
    deathshadow, Sep 12, 2013 IP
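To see what `parse_url()` hands back with `PHP_URL_HOST`, here is a small sketch with made-up hosts. The helper `stripLeadingWww` is hypothetical; it removes only a leading "www." so that prepending "www." afterwards shows what happens to other subdomains.

```php
<?php
// parse_url() with PHP_URL_HOST returns just the host, subdomain included.
$hosts = [
    parse_url('http://www.test.com/page', PHP_URL_HOST),  // www.test.com
    parse_url('http://shop.test.com/page', PHP_URL_HOST), // shop.test.com
];

// Hypothetical helper: strip only a leading "www." and keep other subdomains.
function stripLeadingWww(string $host): string
{
    return preg_replace('/^www\./i', '', $host);
}

foreach ($hosts as $host) {
    echo $host, ' -> www.', stripLeadingWww($host), "\n";
}
// www.test.com -> www.test.com
// shop.test.com -> www.shop.test.com
```

The second line of output is the subdomain case: blindly prepending "www." only makes sense when the host has no other subdomain.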
  15. EricBruggema

    #15
    @deathshadow: I was thinking of an easy and quick hack, just to add http:// without checking whether the rest is correct. Your solution is way better ;)
    EricBruggema, Sep 12, 2013 IP
  16. EmmanuelFlossie

    #16
    I initially created this script only to check for http and add it if it's missing, and to always remove www so that I can make sure my database has unique input. Then, after that, add the www. if the URL cannot be found.
    As a matter of fact, it would not return www twice.
    But with what you say, it is indeed broken if you have a subdomain other than www.,
    because then it would produce www.subdomain.domain.com

    Thanks for highlighting this :)
    EmmanuelFlossie, Sep 12, 2013 IP
  17. qwikad.com

    #17
    Thank you for all your suggestions. I finally found something that's simple yet working just fine. It adds http:// only to the URLs that start with www. In my case, that covers like 85-90% of the URLs that posters submit without http://. Granted, some of them forget to even include www., but I am not worried about that as much since it's a very small number of people.

    This is the line I am using:

    Code (Text):
        $text = preg_replace('/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i', "$1http://$2", $text);
    Again, thank you everyone.
    qwikad.com, Sep 13, 2013 IP
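A sketch of how the line from post #17 behaves on a block of text, using made-up sample strings. One thing worth knowing: the pattern needs a character in front of `www.` that is neither a word character nor a slash, so a link at the very start of the text is skipped. The `$variant` pattern is an assumption added here, not something from the thread; it also accepts the start of the string.

```php
<?php
// The pattern from post #17, tried on a few sample strings.
$pattern = '/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i';

echo preg_replace($pattern, '$1http://$2', 'Visit www.blahblah.com today'), "\n";
// Visit http://www.blahblah.com today
echo preg_replace($pattern, '$1http://$2', 'Visit http://www.blahblah.com today'), "\n";
// unchanged: www. is preceded by a slash
echo preg_replace($pattern, '$1http://$2', 'www.blahblah.com is at the very start'), "\n";
// unchanged: nothing precedes www.

// Assumed variant (not from the thread): also match at the start of the string.
$variant = '/(^|[^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i';
echo preg_replace($variant, '$1http://$2', 'www.blahblah.com is at the very start'), "\n";
// http://www.blahblah.com is at the very start
```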