Regex advice

Discussion in 'PHP' started by linkinpark2014, Jan 26, 2009.

  1. linkinpark2014
    Hello everyone,
    I have an issue with regex. Here is the text from which I want to extract the highlighted parts (the number 13972496 and the phrase "Important text"):

    "fTnh",13972496,"some text Important text bla bla bla",50423,"","04204733",416,7279,2075309,"","160345","",0


    Here is the regex pattern I tried, but I keep getting error messages. Here is the snippet:

    $text = '"fTnh",13972496,"some text Important text bla bla bla",50423,"","04204733",416,7279,2075309,"","160345","",0';

    $pattern = '"fTnh",' . ".*?" . ',' . ".*? Important text.*?" . ',';

    preg_match_all("$pattern", $text, $out);

    Any ideas how to fix this?
     
    linkinpark2014, Jan 26, 2009
  2. hassanahmad1

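    $text = '"fTnh",13972496,"some text Important text bla bla bla",50423,"","04204733",416,7279,2075309,"","160345","",0';

    // eregi() was deprecated in PHP 5.3 and removed in PHP 7; preg_match() needs
    // delimiters. [^"]* keeps each capture inside one quoted field, whereas a
    // greedy ".*" would run on to the last matching field on the line.
    $pattern = '/^"[^"]*",([0-9]+),"([^"]*)",/i';

    if (preg_match($pattern, $text, $matches)) {
        echo $matches[1] . "<br>" . $matches[2];
    }

    I haven't tried it, but I am sure it works.
     
    hassanahmad1, Jan 26, 2009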
  3. red_mamba

    And why not explode the text on ',' first and then do the regex magic on $array[2]? :D
     
    red_mamba, Jan 26, 2009
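    A minimal sketch of red_mamba's split-first idea, using the sample line from the first post. Since the third field is a quoted string, the built-in str_getcsv() may be a safer split than explode(','): explode() keeps the quotes and would also cut the text apart if it ever contained a comma. The comma-containing line is a made-up example.

```php
<?php
$line = '"fTnh",13972496,"some text Important text bla bla bla",50423,"","04204733",416,7279,2075309,"","160345","",0';

// explode() splits on every comma and keeps the surrounding quotes.
$parts = explode(',', $line);
echo $parts[2], "\n"; // "some text Important text bla bla bla"

// str_getcsv() respects the quoting, so the quotes are removed.
// Delimiter, enclosure and escape character are passed explicitly.
$fields = str_getcsv($line, ',', '"', '\\');
echo $fields[2], "\n"; // some text Important text bla bla bla

// Made-up line with a comma inside the quoted text.
$tricky = '"fTnh",1,"text, with a comma",2';
echo count(explode(',', $tricky)), "\n";               // 5
echo count(str_getcsv($tricky, ',', '"', '\\')), "\n"; // 4
```

    The regex can then run on $fields[2] alone, which keeps the pattern short.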
  4. linkinpark2014

    I'm getting this error:
    Warning: preg_match_all() [function.preg-match-all]: Unknown modifier ','
     
    linkinpark2014, Jan 27, 2009
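    That warning comes from the pattern in the first post having no delimiters. preg_match_all() treats the first character of the pattern, here ", as the delimiter, so the pattern ends right after "fTnh and the comma that follows is read as a modifier. Below is a minimal sketch of a delimited version, assuming the goal is the number after "fTnh" and the quoted text that contains the phrase; the ~ delimiter and the [^"]* field matching are my choices, not part of the original code.

```php
<?php
$text = '"fTnh",13972496,"some text Important text bla bla bla",50423,"","04204733",416,7279,2075309,"","160345","",0';

// preg_quote() escapes regex metacharacters in the literal phrase;
// passing the delimiter as the second argument escapes it too.
$phrase = preg_quote('Important text', '~');

// "~" is the delimiter, so quotes and commas inside are plain literals.
// [^"]* keeps each capture inside a single quoted field.
$pattern = '~"fTnh",([0-9]+),"([^"]*' . $phrase . '[^"]*)"~';

if (preg_match_all($pattern, $text, $out)) {
    echo $out[1][0], "\n"; // the number after "fTnh"
    echo $out[2][0], "\n"; // the quoted text field that contains the phrase
}
```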
  5. linkinpark2014

    Good idea, but what should I do if I have a 54 KB file full of this text?
     
    linkinpark2014, Jan 27, 2009
  6. xrvel

    Do you have more examples of the raw string and what you want to capture?
    Are you sure the string always has "fTnh", at the front?
     
    xrvel, Jan 27, 2009
  7. linkinpark2014

    Here is a sample of that file:
    "ftnh",14465707,"bla bla text IMPORTANT TEXT bla bla bla",799473,"","065612",28,40,1605573,"","191543","",0,   <=== check the differences: type 1
    "ftnh",14471405,"bla bla text IMPORTANT TEXT bla bla bla",1646558,"","155302",46,18,1605573,"","190905","",0,   <=== check the differences: type 1
    "ftnh",14443139,"bla bla text IMPORTANT TEXT bla bla bla",1232179,"","y040034",393,171,1148089,"","190851","",0,   <=== check the differences: type 1
    "fTnh",14445246,"bla bla text IMPORTANT TEXT bla bla bla",1225476,"","y121418",43,138,1445848,"","183820","",0,   <=== check the differences: type 1
    "ftnh",14458810,"bla bla text IMPORTANT TEXT bla bla bla",1417515,"","y220559",38,39,1445848,"","180919","",0,   <=== check the differences: type 1
    "ftnh",14382043,"bla bla text IMPORTANT TEXT bla bla bla",1225476,"","23124909",45,72,626153,"","180619","",0,   <=== check the differences: type 1
    "fnh",14456171,"bla bla text bla bla bla",1225476,"","y203019",49,107,2300064,"","172710","",0,   <=== check the differences: type 2
    "fTnh",13972496,"bla bla text IMPORTANT TEXT bla bla bla",50423,"","04204733",435,7345,1972877,"","172340","",0,
    "ftnh",14389035,"bla bla text IMPORTANT TEXT bla bla bla",1225476,"","23180218",42,166,1234430,"","155702","",0,
    "ftn",14441432,"bla bla text bla bla bla",2306701,"","y013833",11,15,1769461,"","090825","",0,   <=== check the differences: type 3

    I want the data in the lines that start with "ftnh": the number after it and the important text, that's all.
     
    linkinpark2014, Jan 27, 2009
  8. xrvel

    <?php
    $str = '
    "ftnh",14465707,"bla bla text [color="Red"]IMPORTANT TEXT[/color] bla bla bla",799473,"","065612",28,40,1605573,"","191543","",0,
    "ftnh",14471405,"bla bla text IMPORTANT TEXT bla bla bla",1646558,"","155302",46,18,1605573,"","190905","",0,
    "ftnh",14443139,"bla bla text IMPORTANT TEXT bla bla bla",1232179,"","y040034",393,171,1148089,"","190851","",0,
    "fTnh",14445246,"bla bla text IMPORTANT TEXT bla bla bla",1225476,"","y121418",43,138,1445848,"","183820","",0,
    "ftnh",14458810,"bla bla text IMPORTANT TEXT bla bla bla",1417515,"","y220559",38,39,1445848,"","180919","",0,
    "ftnh",14382043,"bla bla text AM I IMPORTANT bla bla bla",1225476,"","23124909",45,72,626153,"","180619","",0,
    "ftnh",14456171,"bla bla text bla bla bla",1225476,"","y203019",49,107,2300064,"","172710","",0,
    "fTnh",13972496,"bla bla text CATCH ME I AM IMPORTANT bla bla bla",50423,"","04204733",435,7345,1972877,"","172340","",0,
    "ftnh",14389035,"bla bla text IMPORTANT TEXT bla bla bla",1225476,"","23180218",42,166,1234430,"","155702","",0,
    "ftn",14441432,"bla bla text bla bla bla",2306701,"","y013833",11,15,1769461,"","090825","",0,
    ';
    
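    // The i modifier lets "ftnh" also match "fTnh"; group 1 is the number, group 2 the text.
    preg_match_all('/ftnh",([0-9]+),"bla bla text ([a-z0-9 ]+)?bla bla bla/i', $str, $match);
    // $match[0] holds the full matches; drop it and keep only the two capture groups.
    unset($match[0]);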
    
    echo '<pre>';
    print_r($match);
    echo '</pre>';
    ?>
    
    Output:
    
    Array
    (
        [1] => Array
            (
                [0] => 14471405
                [1] => 14443139
                [2] => 14445246
                [3] => 14458810
                [4] => 14382043
                [5] => 14456171
                [6] => 13972496
                [7] => 14389035
            )
    
        [2] => Array
            (
                [0] => IMPORTANT TEXT 
                [1] => IMPORTANT TEXT 
                [2] => IMPORTANT TEXT 
                [3] => IMPORTANT TEXT 
                [4] => AM I IMPORTANT 
                [5] => 
                [6] => CATCH ME I AM IMPORTANT 
                [7] => IMPORTANT TEXT 
            )
    
    )
    
     
    xrvel, Jan 27, 2009
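    Two details of that output are worth noting. 14465707 is missing because the first line of $str still contains the [color="Red"] tags copied from the forum highlighting, so the text between "bla bla text " and "bla bla bla" is not matched by [a-z0-9 ]. That class also covers only ASCII letters, digits and spaces, which matters once the text is Arabic. A quick check on made-up test lines:

```php
<?php
// xrvel's pattern from the post above.
$pattern = '/ftnh",([0-9]+),"bla bla text ([a-z0-9 ]+)?bla bla bla/i';

// Made-up test lines.
$tagged = '"ftnh",14465707,"bla bla text [color="Red"]IMPORTANT TEXT[/color] bla bla bla",799473';
$clean  = '"ftnh",14465707,"bla bla text IMPORTANT TEXT bla bla bla",799473';
$arabic = '"ftnh",14465707,"bla bla text مسابقة bla bla bla",799473';

var_dump(preg_match($pattern, $tagged)); // int(0): "[" is not in [a-z0-9 ]
var_dump(preg_match($pattern, $clean));  // int(1)
var_dump(preg_match($pattern, $arabic)); // int(0): Arabic letters are not in [a-z0-9 ]
```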
  9. linkinpark2014

    Impressive... :) Thank you so much,
    but I still have a problem.
    In the real document, instead of the "IMPORTANT TEXT" I have Arabic characters, and I really don't know what to use in place of the ([a-z0-9 ]+)? part.

    Do you have any idea how to use foreign characters (Arabic, Turkish, Russian) with regex?
     
    linkinpark2014, Jan 27, 2009
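    One common approach, sketched under the assumption that the data has already been converted to UTF-8: add the u modifier so PCRE reads the pattern and the subject as UTF-8, then either capture the whole quoted field with [^"]*, which works for any script, or use a Unicode property such as \p{Arabic} (Arabic script) or \p{L} (a letter in any script, which also covers Turkish and Russian). The sample line is made up.

```php
<?php
// Made-up sample line, UTF-8 encoded.
$line = '"ftnh",14465707,"نص مسابقة نص",799473,"","065612",28,40,1605573,"","191543","",0,';

// Option 1: take the whole quoted field, whatever script it is written in.
preg_match('/^"ftnh",([0-9]+),"([^"]*)"/iu', $line, $m);
echo $m[1], ' | ', $m[2], "\n";

// Option 2: accept only Arabic letters and spaces in the field.
// \p{Arabic} is a Unicode script property and needs the u modifier.
preg_match('/^"ftnh",([0-9]+),"([\p{Arabic} ]+)"/iu', $line, $m2);
echo $m2[2], "\n";

// \p{L} matches a letter in any script (Latin, Arabic, Cyrillic, Turkish, ...).
var_dump(preg_match('/^\p{L}+$/u', 'İstanbul')); // int(1)
var_dump(preg_match('/^\p{L}+$/u', 'Москва'));   // int(1)
```

    Without the u modifier PCRE works byte by byte, so a multibyte Arabic letter is seen as two unrelated bytes.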
  10. red_mamba

    I usually explode the text on "\r\n" (or just one of those characters) to get the lines,

    then go through the array:
    foreach ($lines as $line)
    {
        $fields = explode(',', $line);
    }

    That way each $line is exploded on ',' and the data is separated;
    then use a regex on the third field, $fields[2], if you need one.
     
    red_mamba, Jan 27, 2009
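    red_mamba's approach written out as a runnable sketch, which also covers the 54 KB file question: a file that size fits comfortably in memory, so reading it whole with file_get_contents() and splitting it into lines is fine. The function name extractRows(), the file name data.txt and the sample rows (copied from the earlier sample) are illustrative; str_getcsv() stands in for explode(',') so that a comma inside the quoted text would not split the field.

```php
<?php
// Hypothetical helper: splits the contents into lines, each line into
// fields, and keeps rows whose first field is "ftnh" in any letter case.
function extractRows(string $content): array
{
    $rows = [];
    // Handles \r\n, \n and \r line endings.
    foreach (preg_split('/\r\n|\n|\r/', $content) as $line) {
        if (trim($line) === '') {
            continue;
        }
        $fields = str_getcsv($line, ',', '"', '\\');
        // Skips "fnh", "ftn" and anything else that is not "ftnh"/"fTnh".
        if (count($fields) < 3 || strcasecmp($fields[0], 'ftnh') !== 0) {
            continue;
        }
        $rows[] = ['id' => $fields[1], 'text' => $fields[2]];
    }
    return $rows;
}

// For the real file: $content = file_get_contents('data.txt');
$content = <<<'DATA'
"ftnh",14465707,"bla bla text IMPORTANT TEXT bla bla bla",799473,"","065612",28,40,1605573,"","191543","",0,
"fnh",14456171,"bla bla text bla bla bla",1225476,"","y203019",49,107,2300064,"","172710","",0,
"fTnh",13972496,"bla bla text IMPORTANT TEXT bla bla bla",50423,"","04204733",435,7345,1972877,"","172340","",0,
"ftn",14441432,"bla bla text bla bla bla",2306701,"","y013833",11,15,1769461,"","090825","",0,
DATA;

foreach (extractRows($content) as $row) {
    echo $row['id'], ' => ', $row['text'], "\n";
}
```

    Once the text field is isolated like this, any regex for the important text only has to deal with that one field.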
  11. xrvel

    What about the "bla bla text" part? Is it always in English?
     
    xrvel, Jan 28, 2009
  12. linkinpark2014

    Nope, it's all in non-Latin characters.
     
    linkinpark2014, Jan 28, 2009
  13. xrvel

    Try this (modified regular expression):

    preg_match_all('/ftnh",([0-9]+),"bla bla text (.*)?bla bla bla/iU', $str, $match);

    And try replacing the "bla bla" parts of the pattern with the actual non-Latin text (not tested yet).
     
    xrvel, Jan 28, 2009
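    The U modifier inverts greediness, so (.*) in that pattern behaves like the lazy (.*?) and stops at the first "bla bla bla" after the important text instead of the last one on the line. A small comparison on a made-up two-line string:

```php
<?php
// Made-up input: each line has a second "bla bla bla" in a later field.
$str = '"ftnh",1,"bla bla text FIRST bla bla bla",2,"bla bla bla"' . "\n"
     . '"ftnh",3,"bla bla text SECOND bla bla bla",4,"bla bla bla"';

// Greedy: (.*) runs to the LAST "bla bla bla" on each line.
preg_match_all('/ftnh",([0-9]+),"bla bla text (.*)?bla bla bla/i', $str, $greedy);
// xrvel's version with U: (.*) turns lazy and stops at the FIRST one.
preg_match_all('/ftnh",([0-9]+),"bla bla text (.*)?bla bla bla/iU', $str, $ungreedy);
// Same effect without U, using the explicit lazy quantifier .*?
preg_match_all('/ftnh",([0-9]+),"bla bla text (.*?)bla bla bla/i', $str, $lazy);

print_r($greedy[2]);   // FIRST bla bla bla",2,"  and  SECOND bla bla bla",4,"
print_r($ungreedy[2]); // FIRST   and  SECOND  (with trailing spaces)
print_r($lazy[2]);     // same as $ungreedy[2]
```

    Because "." does not match a newline by default, each match stays on its own line either way.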
  14. linkinpark2014

    Thanks all for your help. After a while of hitting my head against the keyboard, I finally found the solution for the f*cked-up Arabic characters.
    Here is what I did:

    $content = get_content("$url2");  // grab the site's content (get_content() is a user-defined helper, not a PHP built-in)
    $tmpContent = iconv("windows-1256", "utf-8", "$content"); // convert the grabbed content from windows-1256 to UTF-8 :D
    $encoded = utf8_encode($tmpContent); // utf8_encode() assumes ISO-8859-1 input, so this encodes the UTF-8 text a second time

    $encoded_phrase = utf8_encode('مسابقة'); // encode the phrase the same way so it matches the encoded content
    $pattern = '/"ftnh",(.*?),(.*?)' . "($encoded_phrase)" . '(.*?),/'; // this pattern gets the words around the encoded phrase

    if (preg_match_all($pattern, $encoded, $out, PREG_PATTERN_ORDER))
    {
        echo "topics found:<br>";
    }

    PS: I think this is the best and easiest solution for using regex with Arabic sites.
     
    linkinpark2014, Jan 29, 2009
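    A note on that solution: utf8_encode() converts ISO-8859-1 to UTF-8, so calling it on text that iconv() has already turned into UTF-8 encodes it a second time. The match still succeeds because the phrase is double-encoded in the same way, but the captured text stays double-encoded, and utf8_encode() is deprecated as of PHP 8.2. Below is a sketch that converts once and matches with the u modifier instead. It assumes the page really is windows-1256 and that the PHP file is saved as UTF-8; convertToUtf8() and findTopics() are hypothetical names, and a made-up UTF-8 string stands in for the fetched page. The i modifier also lets it pick up the "fTnh" rows.

```php
<?php
// Hypothetical helper: convert ONCE from the page's encoding to UTF-8,
// with no utf8_encode() afterwards. Assumes the page is windows-1256.
function convertToUtf8(string $raw): string
{
    $utf8 = iconv('windows-1256', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('iconv() could not convert the page');
    }
    return $utf8;
}

// Hypothetical helper: returns the number and quoted text of every "ftnh"
// row whose text field contains $phrase. Both arguments must be UTF-8.
function findTopics(string $utf8, string $phrase): array
{
    // preg_quote() escapes any regex metacharacters in the phrase;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/"ftnh",([0-9]+),"([^"]*' . preg_quote($phrase, '/') . '[^"]*)"/iu';
    $topics = [];
    if (preg_match_all($pattern, $utf8, $out, PREG_SET_ORDER)) {
        foreach ($out as $m) {
            $topics[] = ['id' => $m[1], 'text' => $m[2]];
        }
    }
    return $topics;
}

// Made-up UTF-8 sample standing in for convertToUtf8(get_content($url2)).
$content = '"ftnh",14465707,"نص مسابقة نص",799473,"","065612",28,40,1605573,"","191543","",0,' . "\n"
    . '"ftnh",14471405,"نص آخر",1646558,"","155302",46,18,1605573,"","190905","",0,';

$topics = findTopics($content, 'مسابقة');
foreach ($topics as $t) {
    echo $t['id'], ': ', $t['text'], "<br>\n";
}
```

    Keeping everything in UTF-8 from the single iconv() call onward means the captured text can be echoed directly on a UTF-8 page.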