Extracting from an url

Discussion in 'PHP' started by Luke Jones, Jul 22, 2007.

  1. #1
    Hello,
    If I want to extract football from this url:
    mysite.com/news/football.html
    how do I do it?
    Is there a simple GET command I can use?

    Thanks,

    Luke
     
    Luke Jones, Jul 22, 2007 IP
  2. Chemo

    #2
    pathinfo(), basename(), or a few others. Learn to use the PHP manual to find the answers.
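
    A minimal sketch of what those two functions return for the URL in the question. The variable names are only illustrative, and the URL is treated as a plain path string.

    ```php
    <?php
    // Example input copied from the question.
    $url = 'mysite.com/news/football.html';

    // basename() returns the last path component, extension included.
    $file = basename($url);                      // "football.html"

    // pathinfo() with PATHINFO_FILENAME also drops the extension.
    $name = pathinfo($url, PATHINFO_FILENAME);   // "football"

    echo $file, "\n", $name, "\n";
    ```

    Neither call knows about query strings or URL encoding; they simply split the string on slashes and dots.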
     
    Chemo, Jul 22, 2007 IP
  3. jestep

    #3
    I would probably use a regular expression, assuming that the directory is always in the same location in the URL. With a preg_replace expression you can strip everything except what is between the last / and the .html.
     
    jestep, Jul 22, 2007 IP
  4. ds316

    #4
    preg_replace("|/(\\w+)\\.\\w+$|",$url,$regs);

    $regs[1] would contain "football", assuming $url was "mysite.com/news/football.html".
     
    ds316, Jul 22, 2007 IP
  5. nico_swd

    #5
    
    PHP:
    $foo = basename($url, '.html');
     
    nico_swd, Jul 23, 2007 IP
  6. krt

    #6
    ds316, what if there is a hyphen, e.g. table-tennis? Your regex fails, because \w does not match a hyphen. Also, you should be using preg_match(), not preg_replace().

    nico's one should be fine, assuming all the pages end in .html. If they don't, you might want to use a regex... or what he mentions below :p
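
    As a rough sketch of both points, here is a preg_match() call whose character class also accepts hyphens. The pattern and the variable names are illustrative assumptions, not code from an earlier post.

    ```php
    <?php
    // Example input with a hyphen in the page name.
    $url = 'mysite.com/news/table-tennis.html';

    // [\w-]+ matches letters, digits, underscores and hyphens.
    // The pattern captures the last path segment before a trailing extension.
    if (preg_match('~/([\w-]+)\.\w+$~', $url, $matches)) {
        $name = $matches[1];   // "table-tennis"
    } else {
        $name = null;          // no "/name.ext" at the end of the string
    }

    echo $name, "\n";
    ```

    preg_match() fills $matches with the captured groups, which is what the earlier snippet expected from preg_replace().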
     
    krt, Jul 23, 2007 IP
  7. nico_swd

    #7
    
    PHP:
    $foo = basename($url, '.' . end(explode('.', $url)));
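
    One caveat with a one-liner like this: end() takes its argument by reference, so passing the result of explode() straight to it raises an "Only variables should be passed by reference" notice on many PHP versions. A sketch of the same idea with an intermediate variable (the names $parts and $extension are illustrative):

    ```php
    <?php
    // Example input with an extension other than .html.
    $url = 'mysite.com/news/football.php';

    // Store the pieces in a variable first, because end() needs a reference.
    $parts = explode('.', $url);
    $extension = end($parts);                    // "php"

    // Strip whatever extension was found from the last path component.
    $foo = basename($url, '.' . $extension);     // "football"

    echo $foo, "\n";
    ```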
     
    nico_swd, Jul 23, 2007 IP
  8. ds316

    #8
    I'm too tired, I shouldn't be coding. :p

    I did write it in a hurry, but yes, you are entirely correct. The better implementation would be:

    preg_match("|^[^\\?]+/([^\\./]+)\\.|",$url,$regs);

    Again, $regs[1] contains your name.

    That covers all bases, even if there was a query string with an unencoded url such as:

    http://mysite.com/news/football.php?refurl=http://google.com/search?q=football

    Still, I think basename covers it in that situation as well; it's up to you which method to use.
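
    It may be worth checking that case before relying on basename() alone: it splits on the last slash, which in this example sits inside the query string. A hedged alternative, sketched here with illustrative variable names, is to isolate the path with parse_url() first.

    ```php
    <?php
    // The example URL from this post, with an unencoded URL in the query string.
    $url = 'http://mysite.com/news/football.php?refurl=http://google.com/search?q=football';

    // basename() on the whole string returns the text after the final slash.
    $naive = basename($url);                      // "search?q=football"

    // parse_url() separates the path from the query string.
    $path = parse_url($url, PHP_URL_PATH);        // "/news/football.php"

    // pathinfo() then gives the file name without its extension.
    $name = pathinfo($path, PATHINFO_FILENAME);   // "football"

    echo $naive, "\n", $name, "\n";
    ```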
     
    ds316, Jul 23, 2007 IP
  9. Luke Jones

    #9
    Thanks very much for your answers.
    Something crazy is happening on my site. The following will not print:
    $url = urlencode($_GET['url']);
    $foo = basename($url, '.' . end(explode('.', $url)));
    print $foo;
    If you have any ideas why this is not working, please let me know.

    Thanks
     
    Luke Jones, Jul 23, 2007 IP
  10. Luke Jones

    #10
    Ok, I've resolved it using $url = ($_SERVER['REQUEST_URI']);
    It's now working.
    The only thing is that it prints the %20 if there are spaces in that part of the url. Is there any way to take this out?

    Thanks for your help anyway.

    Luke
     
    Luke Jones, Jul 23, 2007 IP
  11. ds316

    #11
    Use urldecode() on the string.
     
    ds316, Jul 23, 2007 IP
  12. Luke Jones

    #12
    Yeah, I used it, like this: $url = urlencode($_SERVER['REQUEST_URI']);
    It didn't work.
     
    Luke Jones, Jul 23, 2007 IP
  13. nico_swd

    #13
    urldecode, not urlencode.
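
    A small sketch of the difference, using a hypothetical $requestUri string in place of $_SERVER['REQUEST_URI']:

    ```php
    <?php
    // Hypothetical stand-in for $_SERVER['REQUEST_URI'].
    $requestUri = '/news/table%20tennis.html';

    // Take the file name without its extension; it is still percent-encoded.
    $encoded = pathinfo($requestUri, PATHINFO_FILENAME);   // "table%20tennis"

    // urldecode() turns %20 back into a space.
    $decoded = urldecode($encoded);                        // "table tennis"

    // urlencode() goes the other way and escapes the % sign itself.
    $wrong = urlencode($encoded);                          // "table%2520tennis"

    echo $decoded, "\n", $wrong, "\n";
    ```

    Note that urldecode() also converts a plus sign to a space; rawurldecode() leaves plus signs alone, if that matters for your URLs.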
     
    nico_swd, Jul 23, 2007 IP
  14. Luke Jones

    #14
    Ah!
    I'll go and try it now...
     
    Luke Jones, Jul 23, 2007 IP
  15. Luke Jones

    #15
    Can I do exactly the same thing using HTML, and not PHP?
     
    Luke Jones, Jul 24, 2007 IP
  16. lbalance

    #16

    No. HTML is a markup language, not a programming language, so there is no way it can parse a URL.
     
    lbalance, Jul 24, 2007 IP
  17. Luke Jones

    #17
    Nico,
    I'm using the code you gave me to do a redirect:
    $url = ($_GET['url']);
    $new_url = rawurldecode(rawurldecode(end(explode('*', $url))));
    header('Refresh: 10; url=http://linkanon.com/?r=$new_url');

    Do you know why it won't work?
    It redirects, but to http://linkanon.com/?r=$new_url. It's not replacing the $new_url.
     
    Luke Jones, Jul 27, 2007 IP
  18. jestep

    #18
    Since you are using single quotes (') instead of double quotes ("), PHP does not substitute the variable into the string. Try this:

    
    PHP:
    $url = $_GET['url'];
    $new_url = rawurldecode(rawurldecode(end(explode('*', $url))));
    // Concatenate the variable so its value ends up in the header.
    header('Location: http://linkanon.com/?r=' . $new_url);

    header("Refresh: ...") is an obsolete method of redirecting. It may work, but it is most likely going to stop working at some point.

    I suggest doing the redirect as I wrote above, or using a <meta http-equiv="refresh"> tag if you want the delay.

    That would look like:
    
    PHP:
    $url = $_GET['url'];
    $new_url = rawurldecode(rawurldecode(end(explode('*', $url))));

    // Escape the URL so a quote or ampersand in it cannot break the attribute.
    echo '<meta http-equiv="refresh" content="10; url=http://linkanon.com/?r=' . htmlspecialchars($new_url) . '" />';

    You would need to echo this somewhere after <head> and before </head>.
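
    To illustrate the quoting point on its own, here is a minimal sketch comparing the three ways of building the string. The value given to $new_url is a made-up placeholder; in the thread it comes from $_GET['url'].

    ```php
    <?php
    // Hypothetical target URL used only for this demonstration.
    $new_url = 'http://example.com/';

    // Single quotes: the text is taken literally, so "$new_url" is not replaced.
    $literal = 'Refresh: 10; url=http://linkanon.com/?r=$new_url';

    // Double quotes: PHP substitutes the variable's value.
    $interpolated = "Refresh: 10; url=http://linkanon.com/?r=$new_url";

    // Concatenation gives the same result with either quote style.
    $concatenated = 'Refresh: 10; url=http://linkanon.com/?r=' . $new_url;

    echo $literal, "\n", $interpolated, "\n", $concatenated, "\n";
    ```

    Only the first string still contains the literal text $new_url, which matches the redirect target described in the question.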
     
    jestep, Jul 27, 2007 IP
  19. Luke Jones

    #19
    Thanks for your answer.
    Am I right that search engine spiders follow both of these types of redirect, meta and header location?
    If so, JavaScript will be better for me in this case. I want the spider to index the contents of my page, but I want the visitor to be immediately redirected.
     
    Luke Jones, Jul 27, 2007 IP