Probably an easy question... help needed!

Discussion in 'PHP' started by Gugel, Aug 13, 2008.

  1. #1
    I hope you guys can help me out (you guys are pretty damn good)! I have some raw data and I need to extract parts of it.

    Basically, I need to keep the first character, the word that comes after that and then everything that comes after the "-" sign.

    Here is an example string, followed by the output I want from it.

    Here's the raw data:
    v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs

    This is what I want the output to look like:
    v breathe draw air into and expel out of the lungs


    Any thoughts on how I could best accomplish this? It would be really, really appreciated! Thanks in advance.
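For a line that is already in this cleaned-up form, the rule above (first token, second token, everything after the dash) can also be written with plain string functions instead of a regular expression. A minimal sketch, assuming the definition always follows the first " - " and that the type and the word are the first two whitespace-separated tokens; `extractEntry` is a hypothetical helper name, not a built-in PHP function:

```php
<?php
// Hypothetical helper: keeps the first token (the type), the second token
// (the word) and everything after the first " - " (the definition).
function extractEntry(string $line): ?string
{
    $dash = strpos($line, ' - ');
    if ($dash === false) {
        return null; // no " - " separator, so there is no definition to keep
    }

    // Split the part before the dash on runs of whitespace.
    $head = preg_split('/\s+/', trim(substr($line, 0, $dash)));
    if (count($head) < 2) {
        return null; // need at least a type and a word
    }

    $definition = substr($line, $dash + 3); // skip the " - " itself
    return $head[0] . ' ' . $head[1] . ' ' . $definition;
}

$raw = 'v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs';
echo extractEntry($raw), "\n";
// prints: v breathe draw air into and expel out of the lungs
```

Returning null when there is no " - " lets the caller skip malformed lines instead of printing half an entry.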
     
    Gugel, Aug 13, 2008
  2. nico_swd

    #2
    
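    // Group 1: the first letter, one space and the next word ("v breathe").
    // Group 2: everything after "- " (the definition).
    preg_match('~^([a-z] \w+)[^-]+- ([^$]+)~i', $string, $matches);
    
    echo $matches[1], ' ', $matches[2];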
    
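The one-line pattern can be hard to read, so here is the same match written with PCRE's x (extended) modifier, which ignores unescaped whitespace outside character classes and allows # comments. This is a sketch rather than nico's exact pattern: the last group is written as (.+), which for a single-line string behaves like ([^$]+), because that class only excludes a literal "$". The return value of preg_match is also checked before $matches is used:

```php
<?php
// Extended-mode version of the pattern: whitespace outside character
// classes is ignored, so a literal space is written as [ ].
$pattern = '~
    ^                   # start of the string
    ( [a-z] [ ] \w+ )   # group 1: one letter, one space, one word
    [^-]+               # everything up to the first dash
    -[ ]                # the dash and the space after it
    ( .+ )              # group 2: the rest of the line
~ix';

$string = 'v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs';

if (preg_match($pattern, $string, $matches)) {
    echo $matches[1], ' ', $matches[2], "\n";
} else {
    echo "no match\n";
}
```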
     
    nico_swd, Aug 14, 2008
  3. Gugel

    #3
    I figured out how to isolate the first two values, the type and the word:

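    <?php
    // Raw record submitted through the form.
    $rawdata = isset($_GET['rawdata']) ? $_GET['rawdata'] : '';
    // Keep only letters, apostrophes, spaces, ";" and "-".
    // (preg_replace stands in for ereg_replace, which was removed in PHP 7.)
    $trimmed = preg_replace("/[^A-Za-z' ;-]/", "", $rawdata);
    // Drop everything from the first ";" onward.
    $trimmed2 = preg_replace("/;.*/s", "", $trimmed);
    // Split on single spaces (empty pieces appear wherever two spaces were adjacent).
    $pieces = explode(" ", $trimmed2);
    ?>

    <html>
    <head></head>
    <body>
    Raw Data:
    <?php
    echo htmlspecialchars($trimmed2); // escape user input before printing it
    ?>
    <br>Type:
    <?php
    echo htmlspecialchars($pieces[2]);
    ?>
    <br>Word:
    <?php
    echo htmlspecialchars($pieces[4]);
    ?>
    <br>
    </body>
    </html>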

    This is the output:
    Raw Data: v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs
    Type: v
    Word: breathe


    Now, I just need to figure out how to isolate the data after the "-" sign.
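Following the explode-based approach above, the text after the "-" can be cut out with strpos and substr. A minimal sketch, assuming the first "-" in $trimmed2 is the one that starts the definition (true for the sample, where the word list contains no hyphens); the string below is a shortened, hypothetical stand-in for $trimmed2:

```php
<?php
// Shortened, hypothetical stand-in for $trimmed2 from the code above.
$trimmed2 = 'v breathe takeabreath respire suspire v v a n - draw air into and expel out of the lungs';

$dash = strpos($trimmed2, '-');
// Everything after the first "-", with surrounding spaces removed
// (an empty string if there is no "-" at all).
$definition = ($dash === false) ? '' : trim(substr($trimmed2, $dash + 1));

echo $definition, "\n"; // draw air into and expel out of the lungs
```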
     
    Gugel, Aug 14, 2008
  4. Gugel

    #4
    Thanks, nico. I tried putting that in, replacing $string with $trimmed (the raw data; see the post right above this one). It didn't work, though.
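A quick way to see why nothing is printed is to check preg_match's return value before reading $matches. A sketch, assuming $trimmed still holds the spaces left where the digits were deleted (the sample value is a shortened, hypothetical example): the pattern's ^([a-z] \w+) needs the string to start with one letter followed by exactly one space, so leading or doubled spaces make preg_match return 0 and leave $matches empty. trim() alone would not fix it if a doubled space sits between the type and the word.

```php
<?php
$pattern = '~^([a-z] \w+)[^-]+- ([^$]+)~i';

// Hypothetical, shortened value with the spaces that remain after the
// digits are stripped from the raw record.
$trimmed = '  v  breathe  takeabreath  respire - draw air into and expel out of the lungs';

if (preg_match($pattern, $trimmed, $matches)) {
    echo $matches[1], ' ', $matches[2], "\n";
} else {
    // preg_match returned 0, so $matches is empty and there is nothing to echo.
    // var_dump prints the exact length, which reveals the hidden spaces.
    echo "no match\n";
    var_dump($trimmed);
}
```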
     
    Gugel, Aug 14, 2008
  5. nico_swd

    #5
    I tested it with the example you provided and it works for me.

    Code:
    
    <?php
    
    $string = 'v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs';
    
    preg_match('~^([a-z] \w+)[^-]+- ([^$]+)~i', $string, $matches);
    
    echo $matches[1], ' ', $matches[2];
    
    ?>
    
    Output:
    
    v breathe draw air into and expel out of the lungs
    
     
    nico_swd, Aug 14, 2008
  6. Gugel

    #6
    Thanks again, nico. You are a god among men. I'm running into a problem, though, and I think I know what's wrong (I just don't know how to solve it).

    The raw data starts off like this:
    00001740 29 v 04 breathe 0 take_a_breath 0 respire 0 suspire 3 021 * 00005041 v 0000 * 00004227 v 0000 + 03189605 a 0301 + 00819106 n 0303 + 04034949 n 0301 + 04200499 n 0105 + 00819106 n 0101 ^ 00004227 v 0103 ^ 00005041 v 0103 $ 00002325 v 0000 $ 00002573 v 0000 ~ 00002573 v 0000 ~ 00002724 v 0000 ~ 00002942 v 0000 ~ 00003826 v 0000 ~ 00004032 v 0000 ~ 00004227 v 0000 ~ 00005041 v 0000 ~ 00006735 v 0000 ~ 00007366 v 0000 ~ 00017051 v 0000 02 + 02 00 + 08 00 - draw air into, and expel out of, the lungs; "I can breathe better when the air is clean"; "The patient is respiring"
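Since the raw record is a sequence of whitespace-separated fields, another option is to skip the character stripping and pick the pieces out by position. A sketch, assuming every record has the same layout as this one (type in the third field, first word in the fifth, definition after the first " - " and ending at the first ";"); `parseRecord` is a hypothetical helper name. Unlike the stripping approach, this keeps underscores and commas, so the definition comes out as "draw air into, and expel out of, the lungs".

```php
<?php
// Hypothetical helper: returns "type word definition" for one raw record,
// or null if the record does not have the expected shape.
function parseRecord(string $record): ?string
{
    $dash = strpos($record, ' - ');
    if ($dash === false) {
        return null;
    }

    // Fields before the definition; in the sample, index 2 is the type ("v")
    // and index 4 is the word ("breathe").
    $fields = preg_split('/\s+/', trim(substr($record, 0, $dash)));
    if (count($fields) < 5) {
        return null;
    }

    // The definition runs from after " - " up to the first ";" (if any).
    $gloss = substr($record, $dash + 3);
    $semi = strpos($gloss, ';');
    if ($semi !== false) {
        $gloss = substr($gloss, 0, $semi);
    }

    return $fields[2] . ' ' . $fields[4] . ' ' . trim($gloss);
}

$record = '00001740 29 v 04 breathe 0 take_a_breath 0 respire 0 suspire 3 021 * 00005041 v 0000 * 00004227 v 0000 + 03189605 a 0301 + 00819106 n 0303 + 04034949 n 0301 + 04200499 n 0105 + 00819106 n 0101 ^ 00004227 v 0103 ^ 00005041 v 0103 $ 00002325 v 0000 $ 00002573 v 0000 ~ 00002573 v 0000 ~ 00002724 v 0000 ~ 00002942 v 0000 ~ 00003826 v 0000 ~ 00004032 v 0000 ~ 00004227 v 0000 ~ 00005041 v 0000 ~ 00006735 v 0000 ~ 00007366 v 0000 ~ 00017051 v 0000 02 + 02 00 + 08 00 - draw air into, and expel out of, the lungs; "I can breathe better when the air is clean"; "The patient is respiring"';

echo parseRecord($record), "\n";
// prints: v breathe draw air into, and expel out of, the lungs
```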

    Then I clean it up so it looks like the string we had before:

    <?php
    $rawdata = isset($_GET['rawdata']) ? $_GET['rawdata'] : ''; //from a form
    $trimmed = preg_replace("/[^A-Za-z' ;-]/", "", $rawdata); //takes out all the numbers and characters we don't want (spaces are kept)
    $trimmed2 = preg_replace("/;.*/s", "", $trimmed); //takes out everything from the first ";" onward
    $trimmed3 = trim($trimmed2); //takes out whitespace at the beginning and end
    $trimmed4 = 'v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs'; //what it should look like
    ?>

    Now, when I do var_dump on $trimmed3 and $trimmed4, it comes out like this:
    Trimmed3: string(198) "v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs"
    Trimmed4: string(122) "v breathe takeabreath respire suspire v v a n n n n v v v v v v v v v v v v v v - draw air into and expel out of the lungs"

    Why is $trimmed3 198 characters and $trimmed4 122 characters!?
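A likely explanation, consistent with $pieces[2] and $pieces[4] holding the type and the word in the earlier post: deleting the digits removes the numbers but not the spaces between them, so $trimmed3 still contains runs of several spaces. The browser collapses those runs when the page is rendered, which is why both var_dump lines look identical on screen. Collapsing each run to a single space before matching should shorten the string and let nico's pattern match. A sketch, assuming the extra spaces are the only difference between the two strings (the short sample value is hypothetical):

```php
<?php
$pattern = '~^([a-z] \w+)[^-]+- ([^$]+)~i';

// Hypothetical, shortened example of what trim() leaves behind.
$trimmed3 = 'v  breathe  takeabreath  respire  suspire - draw air into and expel out of the lungs';

// Replace every run of whitespace with one space.
$clean = preg_replace('/\s+/', ' ', $trimmed3);

echo strlen($trimmed3), ' vs ', strlen($clean), "\n";

if (preg_match($pattern, $clean, $matches)) {
    echo $matches[1], ' ', $matches[2], "\n"; // v breathe draw air into and expel out of the lungs
}
```

Viewing the page source, or wrapping the var_dump output in a <pre> element, should show the extra spaces directly.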
     
    Gugel, Aug 14, 2008