I'm hoping to use regular expressions to extract the username and userid into an array as follows. This is what I want to achieve:

    Array
    (
        [0] => Array
            (
                [id] => 1
                [name] => Bob Smith
            )
        [1] => Array
            (
                [id] => 2
                [name] => David Jones
            )
        [2] => Array
            (
                [id] => 3
                [name] => Chris Jones
            )
    )

Here is the raw data I have:

    <td class="labelBold">Forename</td>
    <td class="labelBold">Surname</td>
    <td class="labelBold">Username</td>
    <td class="labelBold">Type</td>
    <td class="labelBold">Active</td>
    <td class="input"><a href="/Action/NewUser"><img src="../images/button_new.gif" name="button_new"/></a></td>
    </tr>
    <tr>
    <td>Bob</td>
    <td> Smith</td>
    <td>bsmith</td>
    <td>Accounts</td>
    <td>Y</td>
    <td><a href="/Action/FindUser?userId=1274313737324-444711"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
    </tr>
    <tr>
    <td>David</td>
    <td>.Jones</td>
    <td>djones</td>
    <td>CSA1</td>
    <td>Y</td>
    <td><a href="/Action/FindUser?userId=1574212146739-128494"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
    </tr>
    <tr>
    <td>Chris</td>
    <td>Jones</td>
    <td>cjones</td>
    <td>CSA3</td>
    <td>Y</td>
    <td><a href="/Action/FindUser?userId=1366019532726-289995"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
    </tr>
    <tr>
    <td>GERARD</td>
    <td>Wilson</td>
    <td>wils</td>
    <td>CSA1</td>
    <td>N</td>
    <td><a href="/Action/FindUser?userId=1454947126369-240233"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
    </tr>

Thanks in advance for any ideas at all!
Try this out, where $test is your input string:

    function str_clean($str) {
        // convert every whitespace character to our own special marker
        $str = str_replace(array(" ", "\n", "\r", "\t"), '@@##[[', $str);
        // remove doubled markers
        $str = str_replace('@@##[[@@##[[', "", $str);
        // convert the remaining single markers to spaces and trim
        $str = trim(str_replace("@@##[[", " ", $str));
        return $str;
    }

    $test = str_clean($test);
    // \s* between the tags tolerates a single space that str_clean() can leave behind
    preg_match_all('~<tr>\s*<td>(.+?)</td>\s*<td>\s*(.+?)</td>\s*<td>(.+?)</td>\s*<td>(.+?)</td>\s*<td>(.+?)</td>\s*<td><a href="/Action/FindUser\?userId=(.+?)">~i', $test, $matches, PREG_SET_ORDER);
    if (isset($matches[0])) {
        foreach ($matches as $match) {
            echo "firstname: $match[1] \nlastname: $match[2] \nid: $match[6]\n\n";
        }
    }
    preg_match_all('#<tr>\s*<td>([^<]+</td>\s*<td>[^<]+)<.+userId=([\d-]+)[^\d-]#Usi', $str, $matches, PREG_SET_ORDER);
    foreach ($matches as &$match) {
        // $match[1] holds "forename</td> <td>surname"; $match[2] holds the userId
        $match = array(
            'id'   => $match[2],
            'name' => preg_replace('#</td>\s*<td>#', ' ', $match[1]),
        );
    }
    unset($match); // break the reference left over from the foreach
    print_r($matches);
Thank you very much to both of you! I spent ages on this yesterday and got nowhere, and you have both solved it in minutes!
Another quick question guys, I'm having trouble understanding your expressions. What if I wanted to get the third cell also, to end up with this:

    Array
    (
        [0] => Array
            (
                [id] => 1174311737324-444711
                [name] => Bob Smith
                [username] => bsmith
            )
        [1] => Array
            (
                [id] => 1174312146739-128494
                [name] => David Jones
                [username] => djones
            )
    )

Instead of this:

    Array
    (
        [0] => Array
            (
                [id] => 1174311737324-444711
                [name] => Bob Smith
            )
        [1] => Array
            (
                [id] => 1174312146739-128494
                [name] => David Jones
            )
    )

Thanks
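Each pair of parentheses in the pattern is a capture group, i.e. one extracted block of data. To capture the third cell as well, wrap the part of the pattern that matches it in its own pair of parentheses. Print out the $matches array with print_r() to see its structure and which index each group ends up at.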
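To make that concrete, here is a rough sketch of one way to add the username as an extra capture group. It is not either poster's exact pattern: the pattern, the $html and $users variable names, and the two sample rows (copied from the question) are all just for illustration, and it assumes every data row has the same cell order as in the question.

```php
<?php
// Two sample rows copied from the question; $html is a hypothetical variable name.
$html = <<<HTML
<tr>
    <td>Bob</td>
    <td> Smith</td>
    <td>bsmith</td>
    <td>Accounts</td>
    <td>Y</td>
    <td><a href="/Action/FindUser?userId=1274313737324-444711"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
</tr>
<tr>
    <td>Chris</td>
    <td>Jones</td>
    <td>cjones</td>
    <td>CSA3</td>
    <td>Y</td>
    <td><a href="/Action/FindUser?userId=1366019532726-289995"><img src="../images/button_edit.gif" name="button_edit"/></a></td>
</tr>
HTML;

// One pair of parentheses per value we want back.
$pattern = '~<tr>\s*'
    . '<td>\s*([^<]+?)\s*</td>\s*'   // group 1: forename
    . '<td>\s*([^<]+?)\s*</td>\s*'   // group 2: surname
    . '<td>\s*([^<]+?)\s*</td>'      // group 3: username (the third cell)
    . '.+?userId=([\d-]+)~si';       // group 4: id taken from the edit link

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Reshape each match into the id / name / username layout asked for.
$users = array();
foreach ($matches as $m) {
    $users[] = array(
        'id'       => $m[4],
        'name'     => $m[1] . ' ' . $m[2],
        'username' => $m[3],
    );
}

print_r($users);
```

With PREG_SET_ORDER, each entry of $matches is one table row, index 0 is the whole matched text, and indexes 1 to 4 follow the order of the opening parentheses in the pattern. If the table layout ever changes, the group numbers shift with it, so it is worth running print_r($matches) again after editing the pattern.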