Extracting user IDs and Names with Regular Expressions.

Discussion in 'PHP' started by Hade, Mar 20, 2009.

  1. #1
    I'm hoping to use regular expressions to extract each user's ID and name from an HTML table into an array.

    This is what I want to achieve:

    Array
    (
        [0] => Array
            (
                [id] => 1
                [name] => Bob Smith
            )
    
        [1] => Array
            (
                [id] => 2
                [name] => David Jones
            )
    
        [2] => Array
            (
                [id] => 3
                [name] => Chris Jones
            )
    
    )
    
    Code (markup):
    Here is the raw data I have...

                        <td class="labelBold">Forename</td>
                        <td class="labelBold">Surname</td>
                        <td class="labelBold">Username</td>
                        <td class="labelBold">Type</td>
                        <td class="labelBold">Active</td>
                        <td class="input"><a href="/Action/NewUser"><img src="../images/button_new.gif" name="button_new"/></a></td>
                    </tr>
                    
                        <tr>
                            <td>Bob</td>
                            <td> Smith</td>
                            <td>bsmith</td>
                            <td>Accounts</td>
                            <td>Y</td>
                            <td><a href="/Action/FindUser?userId=1274313737324-444711"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
                        </tr>
                    
                        <tr>
                            <td>David</td>
                            <td>.Jones</td>
                            <td>djones</td>
                            <td>CSA1</td>
                            <td>Y</td>
                            <td><a href="/Action/FindUser?userId=1574212146739-128494"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
                        </tr>
                    
                        <tr>
                            <td>Chris</td>
                            <td>Jones</td>
                            <td>cjones</td>
                            <td>CSA3</td>
                            <td>Y</td>
                            <td><a href="/Action/FindUser?userId=1366019532726-289995"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
                        </tr>
                    
                        <tr>
                            <td>GERARD</td>
                            <td>Wilson</td>
                            <td>wils</td>
                            <td>CSA1</td>
                            <td>N</td>
                            <td><a href="/Action/FindUser?userId=1454947126369-240233"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
                        </tr>
    
    Code (markup):
    Thanks in advance for any ideas at all!
     
    Hade, Mar 20, 2009 IP
  2. french-webbie

    french-webbie Peon

    Messages:
    194
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Try this out, where $test is your input string:

    function str_clean($str) {
        // drop whitespace between adjacent tags so the pattern below can match "<tr><td>"
        $str = preg_replace('~>\s+<~', '><', $str);
        // collapse any remaining runs of whitespace to a single space and trim
        $str = trim(preg_replace('~\s+~', ' ', $str));
        return $str;
    }
    
    $test = str_clean($test);
    
    preg_match_all('~<tr><td>(.+?)</td><td>\s*(.+?)</td><td>(.+?)</td><td>(.+?)</td><td>(.+?)</td><td><a href="/Action/FindUser\?userId=(.+?)">~i', $test, $matches, PREG_SET_ORDER);
    
    if (isset($matches[0])) {
    	foreach ($matches as $match) {
    		echo "firstname: $match[1] \nlastname: $match[2] \nid: $match[6]\n\n";
    	}
    }
    PHP:
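    The loop above only echoes the values. To end up with the id/name array asked for in the first post, the matches could be reshaped roughly like this. This is a minimal sketch, not a tested drop-in: the $test string is a hypothetical, already-cleaned sample of two rows, and $users is an illustrative variable name.

```php
<?php
// Hypothetical sample input: two rows, already stripped of whitespace between tags.
$test = '<tr><td>Bob</td><td> Smith</td><td>bsmith</td><td>Accounts</td><td>Y</td>'
      . '<td><a href="/Action/FindUser?userId=1274313737324-444711"><img src="../images/button_edit.gif" name="button_edit"/></a></td></tr>'
      . '<tr><td>Chris</td><td>Jones</td><td>cjones</td><td>CSA3</td><td>Y</td>'
      . '<td><a href="/Action/FindUser?userId=1366019532726-289995"><img src="../images/button_edit.gif" name="button_edit"/></a></td></tr>';

// Same pattern as in the post: groups 1-5 are the five cells, group 6 is the userId.
preg_match_all(
    '~<tr><td>(.+?)</td><td>\s*(.+?)</td><td>(.+?)</td><td>(.+?)</td><td>(.+?)</td><td><a href="/Action/FindUser\?userId=(.+?)">~i',
    $test,
    $matches,
    PREG_SET_ORDER
);

// Reshape each match into an id/name pair.
$users = array();
foreach ($matches as $match) {
    $users[] = array(
        'id'   => $match[6],
        'name' => trim($match[1]) . ' ' . trim($match[2]),
    );
}

print_r($users);
```

    With PREG_SET_ORDER, each element of $matches is one table row, so the group numbers line up with the parentheses in the pattern from left to right.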
     
    french-webbie, Mar 20, 2009 IP
  3. Hade

    Hade Active Member

    Messages:
    701
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    90
    #3
    Thanks for the help, I will look into this now and let you know if I get it working :)
     
    Hade, Mar 20, 2009 IP
  4. joebert

    joebert Well-Known Member

    Messages:
    2,150
    Likes Received:
    88
    Best Answers:
    0
    Trophy Points:
    145
    #4
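    // Modifiers: U = ungreedy quantifiers, s = dot also matches newlines, i = case-insensitive
    preg_match_all('#<tr>\s*<td>([^<]+</td>\s*<td>[^<]+)<.+userId=([\d-]+)[^\d-]#Usi', $str, $matches, PREG_SET_ORDER);
    foreach($matches as &$match)
    {
    	// group 1 holds "forename</td> <td>surname", group 2 the id; join the two cells with a single space
    	$match = array('id'=> $match[2], 'name'=> preg_replace('#\s*</td>\s*<td>\s*#', ' ', $match[1]));
    }
    unset($match); // break the reference left behind by the by-reference foreach
    print_r($matches);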
    Code (markup):
     
    joebert, Mar 20, 2009 IP
  5. Hade

    #5
    Thank you very much to both of you!
    I spent ages on this yesterday and got nowhere, and you have both solved it in minutes!
     
    Hade, Mar 20, 2009 IP
  6. Hade

    #6
    Another quick question guys,
    I'm having trouble understanding your expressions. What if I also wanted to get the third cell (the username), to end up with this:
    Array
    (
        [0] => Array
            (
                [id] => 1174311737324-444711
                [name] => Bob Smith
                [username] => bsmith
            )
    
        [1] => Array
            (
                [id] => 1174312146739-128494
                [name] => David Jones
                [username] => djones
            )
    
       
    
    )
    
    Code (markup):
    Instead of this:
    Array
    (
        [0] => Array
            (
                [id] => 1174311737324-444711
                [name] => Bob Smith
            )
    
        [1] => Array
            (
                [id] => 1174312146739-128494
                [name] => David Jones
            )
    
       
    
    )
    
    Code (markup):
    Thanks
     
    Hade, Mar 20, 2009 IP
  7. Hade

    #7
    Never mind, I've sorted it now, thanks.
     
    Hade, Mar 20, 2009 IP
  8. french-webbie

    #8
    Each pair of parentheses in the pattern is a capture group, i.e. one extracted block of data. Print out the $matches array to see its structure.
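
    To make the link between parentheses and extracted data easier to follow, here is a rough sketch that also pulls out the third cell, using named groups so the keys in $matches describe themselves. It is not the solution Hade ended up with (that was never posted). The $html sample, the group names (forename, surname, username, id) and the $users variable are assumptions made for illustration.

```php
<?php
// Hypothetical sample input: one row laid out like the raw data in the first post.
$html = "<tr>\n    <td>Bob</td>\n    <td> Smith</td>\n    <td>bsmith</td>\n    <td>Accounts</td>\n    <td>Y</td>\n"
      . "    <td><a href=\"/Action/FindUser?userId=1274313737324-444711\"><img src=\"../images/button_edit.gif\" name=\"button_edit\"/></a></td>\n</tr>";

// Modifiers: s = dot matches newlines, i = case-insensitive,
// x = whitespace and "#" comments inside the pattern are ignored.
$pattern = '~
    <tr>\s*
    <td>\s*(?P<forename>[^<]*?)\s*</td>\s*   # first cell
    <td>\s*(?P<surname>[^<]*?)\s*</td>\s*    # second cell
    <td>\s*(?P<username>[^<]*?)\s*</td>      # third cell
    .*?userId=(?P<id>[\d-]+)                 # id taken from the edit link
~six';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Each named group is available under its name as well as its number.
$users = array();
foreach ($matches as $m) {
    $users[] = array(
        'id'       => $m['id'],
        'name'     => $m['forename'] . ' ' . $m['surname'],
        'username' => $m['username'],
    );
}

print_r($users);
```

    Adding another cell then comes down to adding one more parenthesised group to the pattern and one more key to the array.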
     
    french-webbie, Mar 20, 2009 IP