PHP Regex speed vs. C/C++

Discussion in 'PHP' started by ssanders82, Aug 31, 2007.

  1. #1
    Okay, I need a wizard's opinion here. My site at http://www.SmarterReviews.com takes product reviews from around the web and attempts to discover patterns in what people are saying about various attributes of each product, like "screen" or "battery life" for an mp3 player.

    I have a huge array of regular expressions to match literally hundreds of thousands of different text patterns. A simplistic example would be "/(good|great|awesome){1} battery life/". For each sentence in the reviews, I loop through dozens of regexes like this looking for matches. It takes ~1 minute to process a few hundred reviews on my dual-core Athlon XP.

    All this is currently written in PHP (w/ PCRE). My question is, if I wrote a C/C++ module specifically for regex matching, which I would call from PHP, would that speed things up significantly? Or does PHP basically do this anyway when it invokes the regex engine? I know some languages have a "compile" feature for regular expressions, which I think validates the syntax for quicker loading later. Does PHP have this?

    I don't know how things work under the hood in PHP's regex engine. I haven't been able to find a lot of info on optimizing speed for large regexes.

    One more thing, I'm using the "i" modifier for all patterns (case-insensitive); would it be faster to convert the whole string to lowercase first and leave out the "i" modifier?
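
    The two options in that last question can be put side by side in a small sketch. It is written in C++ with std::regex as a stand-in, so it only shows that both approaches find the same matches on plain ASCII text; it says nothing about which is faster under PHP/PCRE, which would have to be timed. The function names are hypothetical, and the pattern is the example from above without the redundant {1}.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Lowercase a string byte by byte; assumes the review text is plain ASCII.
std::string to_lower_ascii(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Approach A: case-insensitive pattern applied to the original text
// (the equivalent of PHP's "i" modifier).
bool matches_with_icase(const std::string& text) {
    static const std::regex pattern("(good|great|awesome) battery life",
                                    std::regex::icase);
    return std::regex_search(text, pattern);
}

// Approach B: lowercase the text once, then use a case-sensitive pattern.
bool matches_after_lowercasing(const std::string& text) {
    static const std::regex pattern("(good|great|awesome) battery life");
    const std::string lowered = to_lower_ascii(text);
    return std::regex_search(lowered, pattern);
}
```

    If the lowercase route were used, the text would only need to be lowercased once per sentence, not once per pattern.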

    TIA
     
    ssanders82, Aug 31, 2007
  2. chuckd1356

    #2
    C/C++ has been around a lot longer than PHP.
    If I were you, I'd write it in C++. Much easier IMO.

    C/C++ should do it much quicker, as you can devote as much RAM to it as you want. PHP is a bit more complicated.

    I don't know; if I were writing a huge regex program, I'd use Java or C++, unless of course it was for web use, in which case PHP would be my choice.
     
    chuckd1356, Aug 31, 2007
  3. ssanders82

    #3
    It's already written and working in PHP. I'm asking if there would be a speed increase if I rewrote it in C/C++ and called it from PHP.
     
    ssanders82, Aug 31, 2007
  4. m0nkeymafia

    #4
    Yes, it would be faster.
    PHP is uncompiled, even if you're using Zend and keeping it in a semi-compiled state [I forget the term, sorry], whereas C++ will be compiled and fast. Plus you have control over how the regex is implemented [if you do it yourself]; otherwise use Boost, which will be fast as fook anyway.

    It depends on whether it's slow at the moment and whether you're able to write it in C++.

    But long story short the answer is yes - if you write it properly :)
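
    For anyone weighing the C++ route, here is a minimal sketch of the "compile once, match many times" shape such a module might take. It uses std::regex from the standard library purely to stay self-contained; that is a swap-in for Boost or PCRE, not a recommendation, and not a speed claim. The names compile_patterns and matching_patterns are made up for illustration. Keep in mind that PHP's preg_* functions are themselves implemented in C on top of the PCRE library and keep a cache of compiled patterns, so any gain would need to be measured, not assumed.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compile every pattern once, up front.
// std::regex stands in for Boost.Regex or PCRE here.
std::vector<std::regex> compile_patterns(const std::vector<std::string>& sources) {
    std::vector<std::regex> compiled;
    compiled.reserve(sources.size());
    for (const std::string& src : sources) {
        // icase mirrors the "i" modifier; optimize asks the engine to favour
        // matching speed over construction speed.
        compiled.emplace_back(src, std::regex::ECMAScript |
                                   std::regex::icase |
                                   std::regex::optimize);
    }
    return compiled;
}

// Returns the indices of the patterns that match somewhere in the sentence.
std::vector<std::size_t> matching_patterns(const std::vector<std::regex>& compiled,
                                           const std::string& sentence) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < compiled.size(); ++i) {
        if (std::regex_search(sentence, compiled[i])) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

    The compiled vector would be built once at startup and reused for every sentence, so pattern compilation is paid only once per pattern.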
     
    m0nkeymafia, Sep 1, 2007