need help with using / as a character in simple php scraper script

Discussion in 'PHP' started by randomIntellections, Oct 16, 2011.

  1. #1
    Hi,

    I am using a PHP scraper script to update links in an HTML file, and I need help using / in the search string.

    Code:

    $data = file_get_contents('url here');
    $regex = '/<div class="afisare3">(.+?)</div>/';
    preg_match($regex,$data,$match);
    $channelcode = $match[0];
    cex_file_put_contents("./file.html",$upcode, $channelcode,$downcode);

    What I am trying to do is get the code inside the div with class afisare3. Right now I use this line:
    $regex = '/<div class="afisare3">(.+?)div>/';
    This doesn't work when there is another div tag inside that code, because the match stops at the first "div>" it finds.
    I don't know how to add / to the string without generating PHP errors, as / is also the pattern's delimiter. How do I achieve this?
     
    randomIntellections, Oct 16, 2011 IP
  2. #2
    Either change the delimiter: instead of "/", use "~", since it rarely appears in HTML.

    Or else just escape the forward slash by placing a backslash in front of it, e.g. <\/div>
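
    A minimal sketch of both options, assuming a made-up sample string in place of the fetched page (the original script gets $data from file_get_contents()). The "s" modifier is an addition here so that "." also matches newlines inside the div:

```php
<?php
// Hypothetical sample input; the real script fetches $data with file_get_contents().
$data = "<p>intro</p><div class=\"afisare3\">channel\nlink</div><p>outro</p>";

// Option 1: keep "/" as the delimiter and escape the slash inside the pattern.
$escaped = '/<div class="afisare3">(.+?)<\/div>/s';

// Option 2: use "~" as the delimiter so "/" needs no escaping.
$tilde = '~<div class="afisare3">(.+?)</div>~s';

preg_match($escaped, $data, $m1);
preg_match($tilde, $data, $m2);

// $m[0] is the whole match including the tags;
// $m[1] is only the text captured by the parentheses.
echo $m1[0], "\n";
echo $m1[1], "\n";
echo $m2[1], "\n";
```

    Both patterns match the same text. Note that the original script reads $match[0], which includes the surrounding div tags; $match[1] holds only the content between them.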
     
    blueparukia, Oct 16, 2011 IP
    randomIntellections likes this.
  3. Rukbat

    #3
    Two things:

    1) As blueparukia said, use \/ (back slash-forward slash) to escape the forward slash. (You can't use a different character - <~div> won't end a div).

    2) You're going to have to create what's called a state machine. Set a flag to 1 when you pass the beginning of the div. If you pass the beginning of another div, increment the flag. If you pass a </div>, decrement the flag. You're not out of the original div until the flag is 0. (There are more elegant ways to make a state machine, but that's simple and all you need.)
     
    Rukbat, Oct 16, 2011 IP
  4. randomIntellections

    #4
    Thanks for the replies, I got it working now.

    Can you provide more info on how I can implement the state machine from your point 2? I barely know PHP and can only modify code based on my C knowledge.
     
    randomIntellections, Oct 16, 2011 IP
  5. Rukbat

    #5
    Just as I described it. Set an int to 0. When you find your div, inc (int++) the int. Every time you find a "<div" inc it again. Every time you find a "</div", dec (int--) it. When you reach 0, you've hit the "</div>" for your div's id. (You can replace the </div> with "", or you can back up your string pointer - it depends on what you're doing, whether you care that the last </div> is in your result, etc.)

    The program is the same in any language, only the code is different.
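
    The counter described above could be sketched in PHP roughly as follows. The function name extractDiv and its signature are hypothetical, chosen only for illustration, and the sketch assumes the page has no "<div" text inside comments, scripts, or attribute values:

```php
<?php
/**
 * Return the inner HTML of the first element that starts with $openTag,
 * counting nested <div> ... </div> pairs to find the matching close tag.
 * Returns null if the tag is not found or the divs are unbalanced.
 * Hypothetical helper; not part of the original script.
 */
function extractDiv(string $html, string $openTag): ?string
{
    $start = strpos($html, $openTag);
    if ($start === false) {
        return null;
    }
    $contentStart = $start + strlen($openTag);
    $pos = $contentStart;
    $depth = 1; // we are now inside the target div

    // Step through every "<div" (opening) or "</div" (closing) after the start tag.
    while (preg_match('~<(/?)div\b~i', $html, $m, PREG_OFFSET_CAPTURE, $pos)) {
        $tagPos = $m[0][1];
        if ($m[1][0] === '/') {
            $depth--; // passed a closing tag
            if ($depth === 0) {
                // This is the close tag that matches the target div.
                return substr($html, $contentStart, $tagPos - $contentStart);
            }
        } else {
            $depth++; // passed another opening div
        }
        $pos = $tagPos + strlen($m[0][0]);
    }
    return null; // ran out of input before the counter reached 0
}

// Made-up sample input with a nested div.
$html = '<div class="afisare3">a<div>b</div>c</div><div>d</div>';
echo extractDiv($html, '<div class="afisare3">'), "\n"; // a<div>b</div>c
```

    On the same input, the non-greedy regex (.+?)</div> would stop at the first closing tag and capture only "a<div>b". For pages that are messier than this, PHP's built-in DOMDocument class is likely a sturdier choice than counting tags by hand.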
     
    Rukbat, Oct 16, 2011 IP
  6. blueparukia

    #6
    I meant as a delimiter, e.g. $regex = '~<div>Stuff</div>~';
    That lets you write the pattern without needing to escape the slash.
     
    blueparukia, Oct 16, 2011 IP