Get data from between HTML tags

Discussion in 'PHP' started by RxDx, Jul 27, 2010.

  1. #1
    Hello, I am using:

    $pattern = "#<caption([^>]*[^/])>#i";
    
    preg_match_all($pattern ,$reply,$match); // Tested pattern
    //echo $match[0];
    
    print_r($match);
    PHP:
    I use this to get everything between the caption tags. Altogether there are 2 caption tags and therefore 2 strings. However, I get an empty array as a result:

    Array
    (
        [0] => Array
            (
            )
    
        [1] => Array
            (
            )
    
    )
    
    PHP:
    What am I doing wrong?
     
    RxDx, Jul 27, 2010 IP
  2. Nick66

    #2
    Try this pattern; it should work.

    $pattern = "#<caption\b[^>]*>(.*?)</caption>#i";
    Code (markup):
     
    Nick66, Jul 27, 2010 IP
  3. RxDx

    #3
    I still get an empty array.
     
    RxDx, Jul 27, 2010 IP
  4. ze0xify

    #4
    In PHP, you define a regex pattern by putting a forward slash at the beginning and end.

    $pattern = '/<caption\b[^>]*>([\s\S]*)</caption>/i';
    preg_match_all($pattern, $reply, $matches);
    print_r($matches);
    Code (markup):
     
    ze0xify, Jul 27, 2010 IP
  5. RxDx

    #5
    Hm, I get this:

    <b>Warning</b>:  preg_match_all() [<a href='function.preg-match-all'>function.preg-match-all</a>]: Unknown modifier 'c' in <b>C:\wamp2\www\AutoUpdater\test2.php</b> on line <b>33</b><br />
    
    Code (markup):
    Sorry for pasting the page source; I can't copy the rendered text from the browser because of some effects there.

    Line 33 is:
    preg_match_all($pattern, $reply, $matches);
    
    PHP:
     
    RxDx, Jul 27, 2010 IP
  6. Nick66

    #6
    $pattern = "#<caption\b[^>]*>(.*?)</caption>#i";
    I tested this pattern. It certainly works, but keep this in mind:
    It will not properly match tags nested inside themselves, as in <caption>text1<caption>text2</caption>text3</caption>.
    It would be helpful if you could post $reply here (or at least part of it).
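
A further caveat about this pattern, offered as a possible (unconfirmed) cause of empty results on messy multi-line HTML: without the "s" modifier, "." does not match newlines, so a caption whose content spans several lines is skipped. A minimal sketch, assuming made-up sample data in $reply:

```php
<?php
// Made-up sample: the second caption spans two lines.
$reply = "<caption>one line</caption>\n<caption id=\"c2\">two\nlines</caption>";

// Without "s", "." stops at newlines, so the second caption is missed.
preg_match_all('#<caption\b[^>]*>(.*?)</caption>#i', $reply, $noDotAll);

// With "s", "." also matches newlines and both captions are captured.
preg_match_all('#<caption\b[^>]*>(.*?)</caption>#is', $reply, $dotAll);

print_r($noDotAll[1]); // one entry
print_r($dotAll[1]);   // two entries
```

The variable names $noDotAll and $dotAll are only illustrative.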
     
    Nick66, Jul 28, 2010 IP
  7. danx10

    #7
    Wrong. Forward slashes are probably the most common delimiters, but you don't necessarily need to use them; I prefer tildes (~). You can use any character that is non-alphanumeric, non-backslash, and non-whitespace. Refer to the documentation:

    http://www.php.net/manual/en/regexp.reference.delimiters.php
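
The delimiter rule also accounts for the "Unknown modifier 'c'" warning in post #5: in the pattern from post #4, the unescaped "/" inside "</caption>" ends the pattern early, and the following "caption>/i" is then read as a list of modifiers. A minimal sketch of two ways around it, assuming made-up sample data in $reply:

```php
<?php
// Made-up sample data.
$reply = '<caption class="x">Latest Version 1.2.3</caption>';

// Option 1: pick a delimiter that does not occur in the pattern.
$withHash = '#<caption\b[^>]*>(.*?)</caption>#is';

// Option 2: keep "/" as the delimiter and escape it inside the pattern.
$withSlash = '/<caption\b[^>]*>(.*?)<\/caption>/is';

preg_match_all($withHash, $reply, $a);
preg_match_all($withSlash, $reply, $b);

print_r($a[1]);
print_r($b[1]);
```

Both calls capture the same caption text; which delimiter to use is a matter of taste once the delimiter character is either absent from the pattern or escaped.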
     
    danx10, Jul 28, 2010 IP
  8. RxDx

    #8
    Nick,

    your pattern works, but I get:

    Array
    (
        [0] => Array
            (
                [0] => <caption\b[^>]*>(.*?)</caption>
    
            )
    
        [1] => Array
            (
                [0] => ]*>(.*?)
            )
    
    )
    
    PHP:
    and not the actual content between the two tags...

    For certain reasons I cannot post the actual source code of the back-end site here...
     
    RxDx, Jul 28, 2010 IP
  9. ThePHPMaster

    #9
    The best approach to regular expressions is to think of the simplest way to do it:

    
    preg_match_all('/<caption.*>(.*)<\/caption>/Uism',$reply,$match);
    print_r($match[1]);
    
    PHP:
     
    ThePHPMaster, Jul 28, 2010 IP
  10. RxDx

    #10
    Array
    (
    )
    
    PHP:
    This is what I get, an empty array...
     
    RxDx, Jul 28, 2010 IP
  11. ThePHPMaster

    #11
    Do you have sample data?

    I tried it with this data and it works:

    
    $reply = '<caption name="test">Hi Caption 1</caption> ljhsldjkf ljkhl
    o3i;j;
    <caption>Test Caption 2</caption>';
    
    preg_match_all('/<caption.*>(.*)<\/caption>/Uism',$reply,$match);
    print_r($match[1]);
    
    PHP:
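
For readers unfamiliar with the "U" modifier used above: it makes every quantifier lazy by default, which is why each caption is matched separately. A minimal sketch comparing it with the greedy default and with explicit lazy quantifiers, assuming made-up sample data similar to the data in this post:

```php
<?php
// Made-up sample data in the spirit of the example above.
$reply = "<caption name=\"test\">Hi Caption 1</caption> filler\n<caption>Test Caption 2</caption>";

// With "U", quantifiers are lazy by default: each caption is matched on its own.
preg_match_all('/<caption.*>(.*)<\/caption>/Uis', $reply, $ungreedy);

// Without "U", ".*" is greedy: a single match runs from the first
// "<caption" to the last "</caption>", so only the last caption is captured.
preg_match_all('/<caption.*>(.*)<\/caption>/is', $reply, $greedy);

// Explicit lazy quantifiers behave like the "U" version.
preg_match_all('/<caption.*?>(.*?)<\/caption>/is', $reply, $lazy);

print_r($ungreedy[1]);
print_r($greedy[1]);
print_r($lazy[1]);
```

The names $ungreedy, $greedy, and $lazy are illustrative only.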
     
    ThePHPMaster, Jul 28, 2010 IP
  12. RxDx

    #12
    Well, I tried it on simple data and it works fine as well. But my $reply is really messy, with lots of HTML tags and so on. There are 2 caption tags with the content: Latest Version X.X.X. It returns the right number of tags but not the content inside them...
     
    RxDx, Jul 28, 2010 IP
  13. ThePHPMaster

    #13
    Can you provide some sample data?

    If it is confidential, you can send it via PM.
     
    ThePHPMaster, Jul 28, 2010 IP
  14. RxDx

    #14
    I have sent it to you, many thanks for the help.
     
    RxDx, Jul 28, 2010 IP
  15. ThePHPMaster

    #15
    This should work:

    
    preg_match_all('/caption&gt;(Latest Version.*)&lt;<span/Uism',$reply,$match);
    
    // If you just want the version number
    preg_match_all('/caption&gt;Latest Version (.*)&lt;<span/Uism',$reply,$match);
    
    PHP:
     
    ThePHPMaster, Jul 28, 2010 IP
  16. RxDx

    #16
    Hello, this gives me:
    Array
    (
        [0] => Array
            (
                [0] => caption&gt;Latest Version (.*)&lt;<span
            )
    
        [1] => Array
            (
                [0] => (.*)
            )
    
    )
    
    PHP:
    Could the problem be that I fetch the page with cURL and then run preg_match? Here is the code of the cURL page:

    $reply = curl_exec($ch);
    curl_close($ch);
    // Get page content
    echo $reply;
    
    // Search for captions to receive versions of products
    
    
    
    $html = file_get_contents('test2.php'); // test2.php is the current file
    $pattern = "/<body[^>]*>(.*?)<\/body>/";
    preg_match_all('/caption&gt;Latest Version (.*)&lt;<span/Uism',$html,$match);
    print_r($match);
    PHP:
     
    RxDx, Jul 28, 2010 IP
  17. ThePHPMaster

    #17
    You are not using cURL there but file_get_contents.

    I tested it with the file you provided and it works.

    Just make sure that $html actually contains the HTML page you want.
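
One way to read the output in posts #8 and #16: the text in [0] is the pattern itself, which suggests the searched string holds the script's own PHP source (including the line that defines the pattern) rather than the fetched page. A minimal sketch of that effect, where $source is a hypothetical stand-in for such a string:

```php
<?php
// Hypothetical stand-in for what reading the script's own file returns:
// raw PHP source, including the line that defines the pattern.
$source = '$pattern = "#<caption\b[^>]*>(.*?)</caption>#i";';

// The same pattern, applied to its own source text.
preg_match_all('#<caption\b[^>]*>(.*?)</caption>#i', $source, $match);

print_r($match);
// $match[0][0] is '<caption\b[^>]*>(.*?)</caption>' and
// $match[1][0] is ']*>(.*?)', the same shape as the output in post #8.
```

If that is what is happening, printing the first part of the searched string before matching would show PHP code instead of the expected HTML.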
     
    ThePHPMaster, Jul 28, 2010 IP
  18. RxDx

    #18
    I am using cURL at the beginning. I use it to log in to a private member area, send some JSON to the server, and get HTML content in return (this emulates a user logging in and opening the download page).
    Then I want to search the returned HTML for data between tags. So I call file_get_contents on the current file (which holds all the PHP, the cURL calls, the JSON to be sent, etc.). It also contains the HTML (I sent you the source code).
    So $html actually contains what I want, but the match returns empty arrays.
    I tried replacing $html with $reply (which is basically the same, because $reply is the server's answer with the HTML inside), but I still get an empty array.
     
    RxDx, Jul 28, 2010 IP
  19. Thorlax402

    #19
    If you want to do it without preg_match then you can try this:

    
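    function get_tag_info($string, $tag) {
        // Builds literal tags, so only attribute-free tags such as <caption> match.
        $start = '<'.$tag.'>';
        $end = '</'.$tag.'>';
        $ini = strpos($string, $start);
        if ($ini === false)
            return "";
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        // Without this check a missing closing tag would yield a negative length.
        if ($stop === false)
            return "";
        return substr($string, $ini, $stop - $ini);
    }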
    
    PHP:
    Just run get_tag_info($html, 'html');
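
Since the original question mentions two caption tags, the same strpos approach can be extended to collect every occurrence. A minimal sketch, assuming attribute-free tags; get_all_tag_info is a hypothetical helper name and the version numbers are made-up sample data:

```php
<?php
// Hypothetical helper: returns the text of every <tag>...</tag> pair, in order.
function get_all_tag_info($string, $tag)
{
    $start = '<' . $tag . '>';
    $end = '</' . $tag . '>';
    $results = [];
    $offset = 0;

    // Search for the next opening tag, starting after the previous closing tag.
    while (($ini = strpos($string, $start, $offset)) !== false) {
        $ini += strlen($start);
        $stop = strpos($string, $end, $ini);
        if ($stop === false) {
            break; // unclosed tag: stop instead of returning a partial value
        }
        $results[] = substr($string, $ini, $stop - $ini);
        $offset = $stop + strlen($end);
    }

    return $results;
}

// Made-up sample data.
$html = '<table><caption>Latest Version 1.2.3</caption></table>'
      . '<table><caption>Latest Version 4.5.6</caption></table>';

print_r(get_all_tag_info($html, 'caption'));
```

Like the single-match version, this only finds tags written exactly as <caption>, without attributes.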
     
    Thorlax402, Jul 29, 2010 IP
  20. RxDx

    #20
    Thanks, that works great!
     
    RxDx, Jul 29, 2010 IP