Help with regular expressions for <img> alt= and src= attributes

Discussion in 'PHP' started by sv800, Jul 28, 2007.

  1. #1
    Hi folks, I need some advice on matching and extracting the alt= and src= attributes of <img> tags.

    The source text is this

    My first approach extracts the src= portion, and it is working fine:

    
         preg_match_all("/\< *[img][^\>]*src *= *[\"\']{0,1}([^\"\'\ >]*)/", $html, $matches);
    
    
    My question: I also need the matching alt= attribute.

    I tried the following, but it didn't do the job:

     
    sv800, Jul 28, 2007 IP
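    A side note on the pattern in post #1: `[img]` is a character class, so it matches a single character (i, m, or g) instead of the tag name. The sketch below compares the original pattern with a hypothetical variant that uses the literal name `img` followed by a word boundary. The sample markup and file names are made up for illustration.

```php
<?php
// Hypothetical sample: one <input> tag and one <img> tag, both with src=.
$html = '<input type="image" src="button.png"> <img src="photo.jpg" alt="A photo">';

// Pattern from post #1: [img] matches ONE character out of i, m, g.
$with_class = "/\< *[img][^\>]*src *= *[\"\']{0,1}([^\"\'\ >]*)/";

// Assumed variant: literal tag name plus \b so that <input> is skipped.
$with_literal = '/< *img\b[^>]*src *= *["\']?([^"\' >]*)/';

preg_match_all($with_class, $html, $class_matches);
preg_match_all($with_literal, $html, $literal_matches);

print_r($class_matches[1]);   // button.png and photo.jpg
print_r($literal_matches[1]); // photo.jpg only
```

    The character-class version also picks up the `<input>` tag because its name starts with `i`; the literal version does not.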
  2. sv800

    #2
    I made some progress with the previous code, but it still needs improvement.

    Here is the text again:


    <p align="center"><a title="Gato con carrito de compra de supermercado invisible" class="imagelink" rel="attachment" id="p203" href="http://www.ecnc.com/blog/2007/07/23/voy-al-supermercado-me-compras-unas-toallas-femeninas-mi-amor/gato-con-carrito-de-compra-de-supermercado-invisible/">
    <img alt="Gato con carrito de compra de supermercado invisible" id="image203" src="http://www.lecnc.com/blog/wp-content/uploads/2007/07/carrito_de_compra_invisible.jpg" /></a></p>
    <p>Fuente: <a target="_blank" title="Funny animals" href="http://www.flickr.com/photos/funny_animals/380169474/">Flickr Funny Animals </a></p>
    
    I'm trying to match alt= and src=.

    This is the regular expression I'm using:

    
    /\< *[img][^\>]*alt *= *[\"\']{0,1}([^>]*) *src *= *[\"\']{0,1}([^\"\'\ >]*)/
    
    It matches, but capture group 1 grabs more than I want. For alt= I'm getting:

    Gato con carrito de compra de supermercado invisible" id="image203"
    Now I need to exclude id="image203" from the capture.


    Any suggestions are much appreciated.
     
    sv800, Jul 29, 2007 IP
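    One possible way to stop the alt= capture at its closing quote is to exclude quote characters from the capture group and let a separate `[^>]*` skip whatever sits between alt= and src=. This is only a sketch: it assumes alt= comes before src=, as in the sample text, and the pattern itself is an illustration, not the one from the post.

```php
<?php
// The <img> tag from the sample text in post #2.
$html = '<img alt="Gato con carrito de compra de supermercado invisible" id="image203" '
      . 'src="http://www.lecnc.com/blog/wp-content/uploads/2007/07/carrito_de_compra_invisible.jpg" /></a></p>';

// Group 1 cannot contain quotes, so it ends at the closing quote of alt=.
// The [^>]* after it skips other attributes (such as id=) up to src=.
$pattern = '/<\s*img\b[^>]*\balt\s*=\s*["\']?([^"\'>]*)["\']?[^>]*\bsrc\s*=\s*["\']?([^"\'\s>]*)/i';

$alt = null;
$src = null;
if (preg_match($pattern, $html, $m)) {
    $alt = $m[1];
    $src = $m[2];
}

echo $alt, "\n", $src, "\n";
```

    With the sample tag, `$alt` holds only the alt text and `$src` holds the image URL. A tag with src= before alt= would not match this pattern.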
  3. themole

    #3
    First, I would try your last expression, but with a capital 'U' after the closing regex delimiter:

    /\< *[^\>]*alt *= *[\"\']{0,1}([^>]*) *src *= *[\"\']{0,1}([^\"\'\ >]*)/U

    This makes the quantifiers ungreedy, so the capture might not grab too much data.
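    As a small illustration of what the U modifier does, the sketch below runs the same pattern with and without it on a shortened, hypothetical attribute string. It also shows a side effect worth checking: U inverts every quantifier in the pattern, so a capture group at the very end of a pattern matches as little as possible, which can mean nothing at all.

```php
<?php
// Shortened, hypothetical attribute string.
$attributes = 'alt="Gato" id="image203"';

// Greedy: (.*) runs to the LAST double quote.
preg_match('/alt="(.*)"/', $attributes, $greedy);

// Ungreedy (U): (.*) stops at the FIRST double quote.
preg_match('/alt="(.*)"/U', $attributes, $ungreedy);

echo $greedy[1], "\n";   // Gato" id="image203
echo $ungreedy[1], "\n"; // Gato

// Side effect: a trailing group under U matches the minimum, here an empty string.
preg_match('/src="([^"]*)/U', 'src="a.jpg"', $trailing);
var_dump($trailing[1]);  // string(0) ""
```

    In the expression from post #2, the alt= group `([^>]*)` is allowed to contain quotes and must still stretch until src= is found, so the modifier alone may not remove id="image203" from the capture.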


    However, since your img tag attributes are not always in the same order, I would do this in a couple steps instead of all at once. It is a little slower, but it will be more reliable and easier to maintain in the future.

    First I would match all the image tags, then loop through the results and match the individual attributes:


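    // Step 1: capture the attribute string of every <img> tag.
    preg_match_all('#<img\s+([^>]*?)\s*/?>#i', $html, $matches_images);

    foreach ($matches_images[1] as $id => $attributes)
    {
        // Step 2: pull src and alt out of the attribute string, in any order.
        // (["\']) captures the opening quote; \1 requires the same closing quote.
        preg_match('#\bsrc\s*=\s*(["\'])(.*?)\1#i', $attributes, $match_srcs);
        preg_match('#\balt\s*=\s*(["\'])(.*?)\1#i', $attributes, $match_alts);

        // Either attribute may be missing, so fall back to null.
        $src = isset($match_srcs[2]) ? $match_srcs[2] : null;
        $alt = isset($match_alts[2]) ? $match_alts[2] : null;
    }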

    -the mole
     
    themole, Jul 29, 2007 IP
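    The two-step approach from post #3 could be wrapped in a small function along these lines. This is a minimal sketch: `extract_images` is a hypothetical name, the sample markup and URLs are made up, and the pattern assumes quoted attribute values with no `>` inside them.

```php
<?php
/**
 * Hypothetical helper: returns one array('src' => ..., 'alt' => ...)
 * per <img> tag, with null for a missing attribute.
 */
function extract_images($html)
{
    $images = array();

    // Step 1: grab the attribute string of each <img> tag.
    preg_match_all('#<img\b([^>]*)>#i', $html, $tags);

    foreach ($tags[1] as $attributes) {
        $image = array('src' => null, 'alt' => null);

        // Step 2: look up each attribute on its own, so order does not matter.
        foreach (array_keys($image) as $name) {
            // (["\']) remembers the opening quote; \1 demands the same closing quote.
            $pattern = '#\b' . $name . '\s*=\s*(["\'])(.*?)\1#is';
            if (preg_match($pattern, $attributes, $m)) {
                $image[$name] = $m[2];
            }
        }
        $images[] = $image;
    }

    return $images;
}

// Made-up sample: alt before src in the first tag, no alt in the second.
$html = '<p><img alt="Gato con carrito" id="image203" src="http://www.example.com/gato.jpg" /></p>'
      . '<img src=\'second.png\'>';

$images = extract_images($html);
print_r($images);
```

    Looking up each attribute separately costs an extra pass per tag, but it avoids encoding the attribute order into a single expression.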