preg replace for URLs

Discussion in 'PHP' started by Weirfire, Nov 18, 2006.

  1. #1
    I'm looking to use rewritten URLs that contain the title of the page. Unfortunately the titles can contain a lot of symbols and accented letters, because the site is in a foreign language.

    How would I use preg_replace to allow only alphanumeric characters and convert accented letters such as Á, á, É, é, Í, í, Ó, ó, Ú, ú, Ñ, ñ into their unaccented equivalents?

    I've always been a bit iffy with preg_replace, and normally I bow out and just write a few lines of str_replace, but I feel now is the time to stop being lazy and find out how to use the more complex preg_replace function.
     
    Weirfire, Nov 18, 2006 IP
  2. thedark

    thedark Well-Known Member

    #2
    thedark, Nov 18, 2006 IP
  3. Weirfire

    Weirfire Language Translation Company

    #3
    Let's say the string was

    $title = "¿El éxito?";

    How would I turn this into

    $title = "el-exito";

    with the fewest possible lines of code?
     
    Weirfire, Nov 18, 2006 IP
  4. legend2

    legend2 Well-Known Member

    #4
    Loop over each character in the title and check it with each of the following functions:
    ctype_alnum()
    ctype_punct()
    ctype_space()
    ctype_xdigit()
    ctype_cntrl()
    If none of these returns true, then delete that character.
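
The loop described above could be sketched roughly as follows. The function name keep_ctype_chars is made up for illustration, and the sketch assumes PHP 7 or later. Taken together, these checks accept every plain ASCII character, so in the default C locale the loop only drops bytes outside ASCII (for example the bytes that make up an accented letter in UTF-8). It does not convert accents to plain letters.

```php
<?php
// Hypothetical helper: keeps only the characters accepted by the ctype checks.
function keep_ctype_chars(string $title): string
{
    $out = '';
    $length = strlen($title);

    // Walk the string one byte at a time.
    for ($i = 0; $i < $length; $i++) {
        $ch = $title[$i];

        // ctype_xdigit() is already covered by ctype_alnum(); it is kept
        // here only to mirror the list of checks in the post.
        if (ctype_alnum($ch) || ctype_punct($ch) || ctype_space($ch)
            || ctype_xdigit($ch) || ctype_cntrl($ch)) {
            $out .= $ch;
        }
        // Otherwise the character is skipped, i.e. deleted.
    }

    return $out;
}

echo keep_ctype_chars("El exito?");
```

Because punctuation and spaces pass the checks, a string that is already plain ASCII comes back unchanged, so this on its own is not enough to build a URL-friendly title.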
     
    legend2, Nov 19, 2006 IP
  5. Weirfire

    Weirfire Language Translation Company

    #5
    Interesting method, legend. Isn't there an easier way to do it with preg_replace though? I'm almost certain there's a way of removing all symbols from a string using preg_replace.

    It uses a character class, something like

    [a-zA-Z0-9], in the same sort of format as you get with htaccess.
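
A minimal sketch of that idea, using a made-up sample string: putting ^ at the start of the character class negates it, so the pattern matches everything that is not a letter or digit, and preg_replace swaps each match for an empty string.

```php
<?php
// Sample input, chosen only for illustration.
$title = "Hello, World! 100%";

// [^a-zA-Z0-9] matches any single character that is NOT a letter or a digit.
// Replacing every match with '' removes spaces and symbols alike.
$clean = preg_replace('/[^a-zA-Z0-9]/', '', $title);

echo $clean; // HelloWorld100
```

Unlike htaccess rules, the pattern passed to preg_replace needs delimiters around it, here the two / characters.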
     
    Weirfire, Nov 20, 2006 IP
  6. nico_swd

    nico_swd Prominent Member

    #6
    nico_swd, Nov 20, 2006 IP
  7. Weirfire

    Weirfire Language Translation Company

    #7
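    $title = preg_replace('/[^ _0-9a-z-]/', '', $title);

    This removes every character that is not a lowercase letter, a digit, a space, a hyphen or an underscore. The hyphen goes last inside the brackets so that it is treated as a literal character. Written as " -_" it would be read as a range from space to underscore, which lets capitals and most punctuation through. preg_replace takes the same kind of pattern as ereg_replace, wrapped in / delimiters.

    Add A-Z after a-z if you want to allow upper case characters.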
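
Putting the pieces of the thread together, here is a rough sketch of how "¿El éxito?" could become "el-exito". The function name title_to_slug and the accent map are assumptions made for illustration. The map covers only the letters listed in the first post, and the sketch assumes both the script file and the input string are UTF-8 encoded and that PHP 7 or later is used.

```php
<?php
// Hypothetical helper: turns a page title into a URL-friendly string.
function title_to_slug(string $title): string
{
    // Map each accented letter from the first post to its plain equivalent.
    $map = [
        'Á' => 'A', 'á' => 'a', 'É' => 'E', 'é' => 'e',
        'Í' => 'I', 'í' => 'i', 'Ó' => 'O', 'ó' => 'o',
        'Ú' => 'U', 'ú' => 'u', 'Ñ' => 'N', 'ñ' => 'n',
    ];
    $title = strtr($title, $map);

    // Lowercase what is left (plain ASCII letters by this point).
    $title = strtolower($title);

    // Collapse every run of characters that are not a-z or 0-9 into one hyphen.
    $title = preg_replace('/[^a-z0-9]+/', '-', $title);

    // Drop any hyphens left at the start or end.
    return trim($title, '-');
}

echo title_to_slug("¿El éxito?"); // el-exito
```

Replacing runs with a hyphen rather than deleting them keeps the words separated. Any accented letter that is not in the map is treated like a symbol and turned into a hyphen, so the map would need extending for other languages.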
     
    Weirfire, Nov 20, 2006 IP
  8. vishwaa

    vishwaa Well-Known Member

    #8
    vishwaa, Nov 20, 2006 IP