Trickiest RegEx Challenge I've encountered...

Discussion in 'PHP' started by electroze, Jun 6, 2011.

  1. #1
    Hello,

    The following regular expression (regex) code is making my brain hurt. I'm trying to make a user-friendly Bible search that goes straight to whatever reference people type in. People type their searches in a variety of ways (see the top part of the code for examples).

    Here's the code I have, but I can't figure out how to deal with spaces and ranges. The goal is to let a user type in any of the search formats shown in the code below and have it parse out exactly the book, chapter, and verse, plus the end of the range they may have typed in for chapter and/or verse. If you are smart enough to figure part of this out, I will be amazed at your intellect and you'll deserve a regex trophy.

    Specifically, I'm trying to get this search to parse into book, chapter, verse: "1 corinthians 2 : 16 - 3 : 1"



    <?php

    header('Cache-Control: no-store, no-cache, must-revalidate, max-age=0');

    // these currently work
    //$word = "Romans 3";
    //$word = "Job 3:16";
    //$word = "psalms 113:16";
    //$word = "psalm 113:116";
    //$word = "song of solomon 123:16";
    //$word = "song of songs 13:16";
    //$word = "2 Thessalonians 3:1";
    //$word = " 1 Bel and the Dragon 15:121";

    //this fails
    $word = "1 john 1 : 2";

    /*
    $word = "song of solomon 123:16 - 124:17";
    $word = "song of solomon 123:16-124:17";
    $word = "song of solomon 123:16 -124:17";
    $word = "song of solomon 3:16-3:18";


    //these below should fail gracefully
    $word = "song of solomon 1:16-0:17";
    $word = "song of solomon 123:16-1214:17";
    $word = "song of solomon 12:16-12:15";
    $word = "song of solomon 11:16-10:17";
    $word = "song of solomon 11:16-10:1";
    $word = "song of solomon 11:16-010:01";
    $word = "song of solomon 11:16 - 011:1";
    $word = "song of solomon 11:13 - 12";
    $word = "1 cor 11:16 - 12:1";
    $word = "1 cor 11 : 16 - 12 : 1";
    */

    $word = strip_tags(trim($word));


    // THIS WORKS VERY WELL! Will find in the string where a space is before a number, the + means any number (can be repeated)
    // both parenthesis in the regex and the PREG_SPLIT_DELIM_CAPTURE is needed, otherwise, it discards the regex values it finds. I want to keep the chapter
    $splitword = preg_split("/[\s]([0-9]+)/", $word, -1, PREG_SPLIT_DELIM_CAPTURE);

    // next, go on to find if colon exists, for chapter and verse
    $coloncount = substr_count($splitword[1],":");
    if ($coloncount == "1") {
    $colonpieces = explode(":", $splitword[1]);
    // $colonpieces[0] is left side of colon (chapter)
    // $colonpieces[1] is right side of colon (verse)
    if (is_numeric($colonpieces[0])) {$chapter = $colonpieces[0]; }
    if (is_numeric($colonpieces[1])) {$verse = $colonpieces[1]; }
    }




    //new part to search one chap/verse through - to another
    // this part doesn't do anything yet- still trying to figure it out
    $newsplit = preg_split("/[a-zA-Z][\s]([0-9]+)/", $word, -1, PREG_SPLIT_DELIM_CAPTURE);

    // search only if there's 1 colon, not more
    if ($coloncount == "1") {
    $hyphen = explode("-", $word);
    // notice I use $word and not $splitword, because the $splitword might have matched the space and number, like 1 Peter 4:5 - 4:10
    // $hyphen[1] is right side of hyphen

    if (is_numeric($hyphen[1])) {$verse = $colonpieces[1]; }
    }

    // if 2 colons exist, then treat


    //end of new part





    //if no number, don't split white space, just search books for match if less than 20 chars in string

    $booktosearch = trim($splitword[0]);
    $chapter = trim($splitword[1]);
    $verse = trim($splitword[2]);
    //echo "verse: ".$verse;
    $verse = trim(str_replace(":", "", $verse));

    echo "Book is: ".$booktosearch."<br />";
    echo "Chapter is: ".$chapter."<br />";
    echo "Verse is: ".$verse."<br />";
    echo "Search is: ".$word."<br /><br />";

    $chapverse = $splitword[1].$splitword[2];
    echo "combined is: ".$chapverse;


    //now, see that its 5 characters or more or is book of 1 John, of so, then search book
    //this might save mysql bandwidth, in case someone types 'the 666' or something.
    $length = strlen($booktosearch);
    if(($length > 4) || ($booktosearch == "John"))
    {echo"";
    //now, its ready to search the book
    }

    ?>
     
    electroze, Jun 6, 2011 IP
  2. electroze

    #2
    This is the string I'm trying to find a matching regex for, but when I echo the last part, it truncates the 7:

    $word = "song of solomon 123:16 - 124:17";

    code so far:

    $parts= preg_split("/([0-9]{1,3})[\s]?[-][\s]?([0-9]{1,3})[\s]?[:]?([0-9]?)([0-9]?)/", $word, -1, PREG_SPLIT_DELIM_CAPTURE);

    echo $parts[3];

    Any ideas on why it cuts off the last digit?
     
    electroze, Jun 7, 2011 IP
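
    A possible explanation for the missing digit in #2, offered as a sketch rather than a confirmed answer from the thread: the pattern ends with two separate one-digit groups, `([0-9]?)([0-9]?)`, so the verse "17" is captured as "1" and "7" in two different array slots. The `$fixed` pattern and the variable names below are illustrative assumptions, not code posted by anyone in the discussion.

```php
<?php
$word = "song of solomon 123:16 - 124:17";

// Original pattern from post #2: the verse after the second colon is
// captured by two one-digit groups, so its digits end up in two slots.
$original = '/([0-9]{1,3})[\s]?[-][\s]?([0-9]{1,3})[\s]?[:]?([0-9]?)([0-9]?)/';
$parts = preg_split($original, $word, -1, PREG_SPLIT_DELIM_CAPTURE);

// Assumed fix: a single group that accepts up to three digits.
$fixed = '/([0-9]{1,3})\s?-\s?([0-9]{1,3})\s?:?\s?([0-9]{0,3})/';
$fixedParts = preg_split($fixed, $word, -1, PREG_SPLIT_DELIM_CAPTURE);

// Compare the fourth element of each result.
echo $parts[3] . " / " . $fixedParts[3] . PHP_EOL;
```

    With the original pattern the second digit is still present, just one slot later, in `$parts[4]`.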
  3. ThePHPMaster

    #3
    Regex can be tricky at times, but when you find yourself working through a complex pattern, try to simplify it as much as you can (by changing the algorithm).

    The following code should work in your case:

    
    <?php
    
    $word = "1 cor 11:16 - 12:1";
    
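    // Collect every run of digits, colons, hyphens, and spaces
    preg_match_all('/[0-9:\- ]+/', $word, $result);
    
    // A book such as "1 cor" yields an extra run for its leading number,
    // so use the last run, which holds the chapter and verse part
    if (count($result[0]) > 1) {
        $numbers = $result[0][count($result[0])-1];
    } else {
        $numbers = $result[0][0];
    }
    
    // Whatever remains after removing the numbers is the book name
    $text = str_replace($numbers, '', $word);
    $book = trim($text);
    
    // Strip the spaces, then split the range on the hyphen
    $numbers = explode('-', preg_replace('/ /', '', $numbers));
    
    echo $book . PHP_EOL . $numbers[0];
    
    if (isset($numbers[1])) {
        echo PHP_EOL . $numbers[1];
    }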
    
    
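
    For the full goal stated in #1 (book, chapter, verse, plus an optional end of range, with spaces allowed around the colon and hyphen), a single anchored pattern is one alternative worth sketching. Everything here is an assumption for illustration: the function name `parse_reference`, the returned keys, the rule that a book is an optional leading digit followed by letters and spaces, and the choice to treat a lone number after the hyphen as a verse when a start verse was given. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper, not code from the thread.
// Returns the parsed parts, or null when the input does not parse
// or the end of the range comes before its start.
function parse_reference(string $input): ?array
{
    // Groups: 1 book, 2 chapter, 3 verse, 4 and 5 numbers after the hyphen
    $pattern = '/^\s*(\d?\s*[a-z][a-z ]*?)\s+(\d{1,3})(?:\s*:\s*(\d{1,3}))?'
             . '(?:\s*-\s*(\d{1,3})(?:\s*:\s*(\d{1,3}))?)?\s*$/i';
    if (!preg_match($pattern, $input, $m)) {
        return null;
    }
    // preg_match drops trailing unmatched groups, so pad to a fixed size.
    $m = array_pad($m, 6, '');

    $chapter = (int) $m[2];
    $verse = $m[3] === '' ? null : (int) $m[3];
    $endChapter = null;
    $endVerse = null;
    if ($m[5] !== '') {            // "- 3:1": end chapter and end verse
        $endChapter = (int) $m[4];
        $endVerse = (int) $m[5];
    } elseif ($m[4] !== '') {      // "- 18": a single number
        if ($verse !== null) {     // assumed: end verse in the same chapter
            $endChapter = $chapter;
            $endVerse = (int) $m[4];
        } else {                   // assumed: a range of chapters
            $endChapter = (int) $m[4];
        }
    }

    // Fail gracefully on zero values and on ranges that run backwards.
    if ($chapter < 1 || $verse === 0 || $endVerse === 0) {
        return null;
    }
    if ($endChapter !== null) {
        if ($endChapter < $chapter) {
            return null;
        }
        if ($endChapter === $chapter && $verse !== null
            && $endVerse !== null && $endVerse < $verse) {
            return null;
        }
    }

    return [
        'book' => trim($m[1]),
        'chapter' => $chapter,
        'verse' => $verse,
        'end_chapter' => $endChapter,
        'end_verse' => $endVerse,
    ];
}

print_r(parse_reference("1 corinthians 2 : 16 - 3 : 1"));
var_dump(parse_reference("song of solomon 12:16-12:15"));
```

    Checking that the chapter and verse numbers actually exist in the given book would still need a lookup against the book data, which this sketch does not attempt.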
     
    ThePHPMaster, Jun 7, 2011 IP
  4. electroze

    #4
    Hey, thanks a lot! I got it working now. I know it's simple to you, but it took me a while to figure out what it does. I understand everything except what this part does:

    if (count($result[0]) > 1) {
    $numbers = $result[0][count($result[0])-1];
    } else {
    $numbers = $result[0][0];
    }
     
    electroze, Jun 11, 2011 IP
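
    On the branch asked about in #4, a reading of the code (the thread itself does not include a reply): `preg_match_all` returns every run of digits, colons, hyphens, and spaces, so a book name with a leading number produces an extra run before the chapter and verse part. The branch takes the last run when there are several and the only run otherwise. The sketch below, with the assumed variable name `$matches`, shows that taking the last element covers both cases.

```php
<?php
$word = "1 cor 11:16 - 12:1";

// Each run of digits, colons, hyphens, and spaces is a separate match.
preg_match_all('/[0-9:\- ]+/', $word, $result);

// $result[0] holds one run from the book name's leading number and
// one run for the chapter and verse part.
$matches = $result[0];

// The last element is the chapter and verse part whether there is
// one run or several, so the if/else can collapse to a single line.
$numbers = $matches[count($matches) - 1];

echo trim($numbers) . PHP_EOL;
```

    Like the original, this assumes the input contains at least one matching run; an input with no digits or spaces at all would need a separate check.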