I hope you guys can help me out (you guys are pretty damn good)!

Anyway, I have some raw data and I need to extract a few pieces from it. Basically, I need to keep the first character, the word that comes right after it, and then everything that comes after the "-" sign.

Here's an example of the raw data:

v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs

The parts I want to extract are the leading "v", the word "breathe", and the text after the "-". This is what I want the output to look like:

v breathe draw air into and expel out of the lungs

Any thoughts on how I could best accomplish this? It would be really, really appreciated! Thanks in advance.
preg_match('~^([a-z] \w+)[^-]+- ([^$]+)~i', $string, $matches);
echo $matches[1], ' ', $matches[2];
I figured out how to isolate the first two variables:

<?php
$rawdata = $_GET['rawdata'];
$trimmed = ereg_replace("[^A-Za-z' ';-]", "", $rawdata);
$trimmed2 = eregi_replace(";.*", "", $trimmed);
$pieces = explode(" ", $trimmed2);
?>
<html>
<head></head>
<body>
Raw Data: <?php echo $trimmed2; ?>
<br>Type: <?php echo $pieces[2]; ?>
<br>Word: <?php echo $pieces[4]; ?>
<br>
</body>
</html>

This is the output:

Raw Data: v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs
Type: v
Word: breathe

Now I just need to figure out how to isolate the data after the "-" sign.
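A note for anyone running the snippet above on a current PHP version: ereg_replace() and eregi_replace() were deprecated in PHP 5.3 and removed in PHP 7.0. Below is a minimal sketch of the same two cleanup steps using preg_replace(). The function name trim_raw_data is hypothetical (it is not from this thread), and the /s modifier is an assumption so that everything after the first ";" is dropped even if the input spans several lines.

```php
<?php
// Hypothetical helper (name not from the thread): the same cleanup as the
// ereg_replace()/eregi_replace() calls above, written with preg_replace().
function trim_raw_data(string $rawdata): string
{
    // Keep only letters, spaces, apostrophes, semicolons and hyphens.
    $trimmed = preg_replace("/[^A-Za-z ';-]/", '', $rawdata);

    // Drop the first semicolon and everything after it.
    // The /s modifier lets "." match newlines as well (an assumption here).
    return preg_replace('/;.*/s', '', $trimmed);
}

// Shortened, made-up input in the same shape as the raw data.
var_dump(trim_raw_data('v 04 breathe 0 take_a_breath; "quote"'));
```

Note that the helper only deletes characters. It does not touch the spaces that surrounded the deleted numbers, so those spaces stay in the result.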
Thanks nico. I tried putting that in and replaced $string with $trimmed (the raw data, see the post right above this one). No luck though.
I tested it with the example you provided and it works for me.

<?php
$string = 'v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs';
preg_match('~^([a-z] \w+)[^-]+- ([^$]+)~i', $string, $matches);
echo $matches[1], ' ', $matches[2];
?>

Output:

v breathe draw air into and expel out of the lungs
Thanks again nico. You are a god among men.

I'm running into a bit of a problem though, and I think I know what's wrong (although I don't know how to solve it). The raw data starts off like this:

00001740 29 v 04 breathe 0 take_a_breath 0 respire 0 suspire 3 021 * 00005041 v 0000 * 00004227 v 0000 + 03189605 a 0301 + 00819106 n 0303 + 04034949 n 0301 + 04200499 n 0105 + 00819106 n 0101 ^ 00004227 v 0103 ^ 00005041 v 0103 $ 00002325 v 0000 $ 00002573 v 0000 ~ 00002573 v 0000 ~ 00002724 v 0000 ~ 00002942 v 0000 ~ 00003826 v 0000 ~ 00004032 v 0000 ~ 00004227 v 0000 ~ 00005041 v 0000 ~ 00006735 v 0000 ~ 00007366 v 0000 ~ 00017051 v 0000 02 + 02 00 + 08 00 - draw air into, and expel out of, the lungs; "I can breathe better when the air is clean"; "The patient is respiring"

Then I modify it to look like what we had before:

<?php
$rawdata = $_GET['rawdata']; //from a form
$trimmed = ereg_replace("[^A-Za-z' ';-]", "", $rawdata); //takes out all the numbers and characters we don't want
$trimmed2 = eregi_replace(";.*", "", $trimmed); //takes out everything after the ";"
$trimmed3 = trim($trimmed2); //takes out a space in the beginning
$trimmed4 = 'v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs'; //what it should look like

Now, when I do var_dump on $trimmed3 and $trimmed4, it comes out like this:

Trimmed3: string(198) "v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs"
Trimmed4: string(122) "v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs"

Why is $trimmed3 198 characters and $trimmed4 122 characters!?
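A likely explanation, worked out from the sample as posted (assuming the raw line is single-spaced): the cleanup deletes the numbers and symbols but leaves the spaces that separated them, so $trimmed3 contains runs of consecutive spaces. A browser collapses those runs when it renders the var_dump output, which is why the two strings look identical on screen. The counts agree with this: both strings hold 87 non-space characters, and the cleaned raw line keeps 111 spaces after trim(), while the hand-typed version has 35 (87 + 111 = 198, 87 + 35 = 122). The doubled space between "v" and "breathe" would also explain why the pattern from the earlier reply, which expects exactly one space there, does not match the cleaned string.

Here is a minimal sketch of one way around it: collapse each run of whitespace into a single space before matching. The function name extract_entry is hypothetical, preg_replace() stands in for the ereg functions, and the sample line is abbreviated from the raw data above.

```php
<?php
// Hypothetical helper, not from the thread: clean a raw line and return
// "type word definition", or null if the pattern does not match.
function extract_entry(string $rawdata): ?string
{
    // Keep only letters, spaces, apostrophes, semicolons and hyphens.
    $s = preg_replace("/[^A-Za-z ';-]/", '', $rawdata);

    // Drop the first semicolon and everything after it.
    $s = preg_replace('/;.*/s', '', $s);

    // Collapse each run of whitespace into one space, then trim both ends.
    $s = trim(preg_replace('/\s+/', ' ', $s));

    // Same pattern as in the earlier reply.
    if (preg_match('~^([a-z] \w+)[^-]+- ([^$]+)~i', $s, $matches)) {
        return $matches[1] . ' ' . $matches[2];
    }
    return null;
}

// Abbreviated version of the raw line from the post above.
$raw = '00001740 29 v 04 breathe 0 take_a_breath 0 respire 0 suspire 3 021 '
     . '* 00005041 v 0000 02 + 02 00 - draw air into, and expel out of, '
     . 'the lungs; "The patient is respiring"';

echo extract_entry($raw);
```

To see the extra spaces directly, wrap the var_dump output in <pre> tags or view the page source, since the browser hides them otherwise.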