parser problem

Discussion in 'PHP' started by Jeremy Benson, Apr 13, 2019.

  1. #1
    Hello,
    I have an actual issue this time. My parser uses a simple preg_match method, but it's matching more than one command: 'unsheath from left' is matched with 'remove left.' I know this has got to be common, but I'm not sure of the fix.

    class CommandParser
    {
    
        private $command;
    
        public function parseCommand($prompt)
        {
                
            $commands = array(
                            array("command" => "item info", "regex" => "/system item info token (?<token>\w+) table (?<table>.+)/"),
                            array("command" => "put in container", "regex" => "/put the (?<item>.+) in the (?<container>.+)/"),
                            array("command" => "take from container", "regex" => "/take the (?<item>.+) from the (?<container>.+)/"),
                            array("command" => "take from container", "regex" => "/get the (?<item>.+) from the (?<container>.+)/"),
                            array("command" => "take from container", "regex" => "/take (?<item>.+) from the (?<container>.+)/"),
                            array("command" => "take from container", "regex" => "/get (?<item>.+) from the (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put the (?<item>.+) in to the (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put the (?<item>.+) into the (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put the (?<item>.+) in to (?<container>.+)/"),
                            array("command" => "take from container", "regex" => "/take (?<item>.+) from (?<container>.+)/"),
                            array("command" => "take from container", "regex" => "/get (?<item>.+) from (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put (?<item>.+) in the (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put (?<item>.+) into the (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put (?<item>.+) in to the (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put the (?<item>.+) in (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put (?<item>.+) into (?<container>.+)/"),
                            array("command" => "put in container", "regex" => "/put (?<item>.+) in (?<container>.+)/"),
                            array("command" => "look in container", "regex" => "/look in the (?<container>.+)/"),
                            array("command" => "look in container", "regex" => "/look in (?<container>.+)/"),
                            array("command" => "unsheath from", "regex" => "/unsheath from (?<location>.+)/"),
                            array("command" => "remove item", "regex" => "/remove the (?<item>.+)/"),
                            array("command" => "wear item", "regex" => "/wear the (?<item>.+)/"),
                            array("command" => "travel to room", "regex" => "/travel room (?<token>\w+)/"),
                            array("command" => "unsheath weapon", "regex" => "/unsheath (?<weapon>.+)/"),
                            array("command" => "get item", "regex" => "/get (?<item>.+)/"),
                            array("command" => "wear item", "regex" => "/wear (?<item>.+)/"),
                            array("command" => "remove item", "regex" => "/remove (?<item>.+)/"),
                            array("command" => "look", "regex" => "/look/"),
                            array("command" => "map", "regex" => "/map/"),
                            array("command" => "character", "regex" => "/character/"),
                            array("command" => "equipped", "regex" => "/equipped/")
                            );
            $command = array();

            // Try each pattern in order and stop at the first one that matches.
            foreach($commands as $val)
            {

                if(preg_match($val['regex'], $prompt, $matches))
                {

                    $command["command"] = $val['command'];
                    $command["matches"] = $matches;
                    break;

                }

            }

            // Nothing matched: report an unrecognized command.
            if(empty($command))
            {

                $command["command"] = 'unrecognized command';
                $command["matches"] = '';

            }

            return $command;
        
        }
    
    }
    Code (markup):
     
    Jeremy Benson, Apr 13, 2019 IP
  2. deathshadow

    deathshadow Acclaimed Member

    Messages:
    8,975
    Likes Received:
    1,635
    Best Answers:
    233
    Trophy Points:
    515
    #2
    I think you're going about this the hard way. You're doing full regex of the entire phrase instead of splitting the words and handling them one at a time.

    Then you can just have one action for multiple words, drop words that don't matter or carry any real meaning like "the", not have to iterate that giant set of regex, and so on. It would also allow for reverse verbiage, since you could deal with both action/subject/target and action/target/subject.

    For example:

    Insert the berry into bag

    "insert" is the action, "the" is pointless so ignore it, "berry" is the item, and "into" is a modifier we can just ignore, since what else does insert mean? Then comes the target. As such I'd have an associative array of actions where the index is the action, and the value is the "common" name for the action -- or perhaps even a pointer to the appropriate function. Ignore unneeded words until you get to an item, then ignore unneeded words until you get to the target. Done.

    Simple associative array lookups with array_key_exists, ditching the regex once you've split the words of the sentence. I'd probably even have an array of words to ignore (into, the, of), words to chain (and) and so forth in arrays I'd check before trying to check for other tokens. For the ones to ignore you wouldn't even need associative, just a flat array to check against.

    Since this looks like a text adventure game, 99%+ of your user input is going to boil down to only three words at most mattering -- action, item, target (put/get/insert/apply/give) and/or action, target, item (take/remove/steal). Break up the words, remove the ones that do nothing, then thesaurus the spit out of it.
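
    A rough sketch of that lookup-table idea, not a drop-in parser. The synonym table, the ignore list and the function name parseWords are all made up for illustration; only the technique (split the sentence, look the verb up with array_key_exists, skip filler words) comes from the post above. It assumes single-word nouns in action/item/target order.

```php
<?php
// Hypothetical tables: every entry here is an illustrative assumption.
$actions = [
    'put'    => 'put',
    'insert' => 'put',
    'get'    => 'get',
    'take'   => 'get',
];
$ignore = ['the', 'a', 'an', 'into', 'in', 'to', 'from', 'of'];

function parseWords(string $prompt, array $actions, array $ignore): array
{
    $words = explode(' ', strtolower(trim($prompt)));
    $verb = array_shift($words);

    // Unknown verb (or empty input): give up early.
    if ($verb === null || !array_key_exists($verb, $actions)) {
        return ['command' => 'unrecognized command', 'item' => '', 'target' => ''];
    }

    // Keep only the words that carry meaning.
    $nouns = array_values(array_filter($words, function ($w) use ($ignore) {
        return $w !== '' && !in_array($w, $ignore, true);
    }));

    return [
        'command' => $actions[$verb],   // the "common" name for the action
        'item'    => $nouns[0] ?? '',
        'target'  => $nouns[1] ?? '',
    ];
}

var_dump(parseWords('Insert the berry into bag', $actions, $ignore));
```

    For the example sentence this yields command "put", item "berry" and target "bag". Verbs that read target-first (take/remove/steal) would need a per-action flag to swap the two nouns, which this sketch leaves out.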
     
    deathshadow, Apr 15, 2019 IP
  3. JEET

    JEET Well-Known Member

    Messages:
    2,275
    Likes Received:
    118
    Best Answers:
    2
    Trophy Points:
    185
    #3
    The word "left" is seen as "<location>" and "<item>" in the 'unsheath from' and 'remove from' commands.

    Look at the regex you have written for these 2 commands in that commands list...
    It's not identifying "remove from left" as a command.
    It's identifying it as "remove from <left>".

    Add another entry for the "remove from left" command above the "remove from" command.

    That will make sure that "remove from left" gets caught first, so the function breaks out of the loop and returns an output.
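
    To make the ordering point concrete, here is a small sketch built on patterns from the first post. The helper name firstMatch is hypothetical, and the ^...$ anchors in the last table are an addition that nobody in the thread suggested; they are shown only because unanchored patterns can also match in the middle of the input.

```php
<?php
// Return the command name of the first pattern that matches, or null.
function firstMatch(array $commands, string $prompt): ?string
{
    foreach ($commands as $entry) {
        if (preg_match($entry['regex'], $prompt) === 1) {
            return $entry['command'];
        }
    }
    return null;
}

// General pattern listed first: it swallows the more specific phrase.
$generalFirst = [
    ['command' => 'unsheath weapon', 'regex' => '/unsheath (?<weapon>.+)/'],
    ['command' => 'unsheath from',   'regex' => '/unsheath from (?<location>.+)/'],
];

// Specific pattern listed first, as in the original table.
$specificFirst = array_reverse($generalFirst);

var_dump(firstMatch($generalFirst, 'unsheath from left'));  // "unsheath weapon"
var_dump(firstMatch($specificFirst, 'unsheath from left')); // "unsheath from"

// Without anchors a pattern can match inside a longer word.
$loose    = [['command' => 'get item', 'regex' => '/get (?<item>.+)/']];
$anchored = [['command' => 'get item', 'regex' => '/^get (?<item>.+)$/']];

var_dump(firstMatch($loose, 'forget the ball'));    // "get item"
var_dump(firstMatch($anchored, 'forget the ball')); // NULL
```

    Since the loop returns on the first hit, the table order is effectively the priority order.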
     
    JEET, Apr 16, 2019 IP
  4. Jeremy Benson

    Jeremy Benson Active Member

    Messages:
    347
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    73
    #4
    Thanks guys. I solved with Deathshadow's idea like this:

    $regex = array('/(?<action>\w+) (?:the|a)* (?<subject>.+) (?:in|from|the|a)* (?:in|from|the|a)* (?<target>.+)/');
    Code (markup):
    Let me know if there are major issues with this simplified version, or anything better:

    
    private $regex = array(
                            array("command" => "double object action", "regex" => '/(?<action>\w+) (?:the|a)* (?<subject>.+) (?:in|from|the|a)* (?:in|from|the|a)* (?<target>.+)/'),
                            array("command" => "item info", "regex" => "/system item info token (?<token>\w+) table (?<table>.+)/"),
                            array("command" => "skill roll", "regex" => "/(?:roll)* (?<skill>\w+) (?:of|at)* (?<difficulty>.+)/"),
                            array("command" => "single object action", "regex" => "/(?<action>\w+) (?:the|a)* (?<subject>.+)/"),
                            array("command" => "travel to room", "regex" => "/travel room (?<token>\w+)/"),
                            array("command" => "look", "regex" => "/look/"),
                            array("command" => "map", "regex" => "/map/"),
                            array("command" => "character", "regex" => "/character/"),
                            array("command" => "equipped", "regex" => "/equipped/"),
                            array("command" => "swap hands", "regex" => "/swap hands/"),
                            array("command" => "wield weapon", "regex" => "/wield weapon/")
                          
            );
    
    Code (markup):
     
    Last edited: Apr 20, 2019
    Jeremy Benson, Apr 20, 2019 IP
  5. deathshadow

    #5
    I'd probably make $regex static (so as to reduce memory thrashing) and use MODERN array construction of [].... but other than that I think you're on the right track.
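
    A minimal sketch of what those two changes could look like, assuming the table lives in the CommandParser class from the first post. Only three of the patterns are repeated to keep it short, and the empty array returned for unmatched input is an assumption (the original code used an empty string).

```php
<?php
class CommandParser
{
    // Short [] array syntax; static, so the table belongs to the class
    // rather than being copied into every instance.
    private static $regex = [
        ['command' => 'travel to room', 'regex' => '/travel room (?<token>\w+)/'],
        ['command' => 'swap hands',     'regex' => '/swap hands/'],
        ['command' => 'look',           'regex' => '/look/'],
    ];

    public function parseCommand(string $prompt): array
    {
        foreach (self::$regex as $entry) {
            if (preg_match($entry['regex'], $prompt, $matches) === 1) {
                return ['command' => $entry['command'], 'matches' => $matches];
            }
        }
        return ['command' => 'unrecognized command', 'matches' => []];
    }
}

$parser = new CommandParser();
var_dump($parser->parseCommand('travel room r42')['command']);
```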

    Glad my suggestion helped. I have some experience working with this type of data, but it's been nearly three and a half decades since I last dealt with it. It looks a lot like the old text-only adventures from the late 70's and early 80's which were amongst my favorite games. Raaka-Tu, Bedlam, Adventure, Wishbringer/Spellcaster, etc. Amazingly Zork was my least favorite.

    Even wrote my own engine at one point in Pascal for a friend's game -- "Beyond the Pale" -- that I wish I still had a copy of. They initially tried to write it in Prolog, which turned into a complete disaster. 1980's PC compatibles did not have 2 megs of memory for a text adventure. The original took ~10 seconds of hard disk access just to process one user command. My Pascal rewrite got it down to running completely out of memory in under 512k, and it could run in as little as 128k leveraging a 360k data disk. The big trick was similar to what I suggested here -- stripping out unnecessary words so all you're left with is nouns and verbs.

    You might want to try playing a few of those old games to get a feel for how they stripped down the language, handled navigation, and required certain words to be in certain orders. Likewise the Z-Machine documentation may be of service.

    http://inform-fiction.org/zmachine/standards/
     
    deathshadow, Apr 20, 2019 IP
  6. Jeremy Benson

    #6
    That is cool. I loved old text adventures. I played a couple, mostly recently, but I do remember seeing them when I was young. Zork. This project is for online use. Less of a text adventure, more of a free-form MUD hybrid. I'm still hitting a snag. The code here fires a single object action for 'get the ball from bag', matching 'the ball' as the subject.
     
    Last edited: Apr 20, 2019
    Jeremy Benson, Apr 20, 2019 IP
  7. deathshadow

    #7
    That's why I wouldn't be trying to write a complex regex to handle this. I'd do a string split on whitespace, then go one word at a time. Yes, that's "brute force" but it would likely be more precise and versatile. strtolower, then preg_split /\W+/, and iterate the resultant array one word at a time.

    Regex is great for processing uniform predictable input. User input sentences are many things, uniform isn't one of them.
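
    The split described above can be sketched as follows. The function name tokenize is hypothetical, and the PREG_SPLIT_NO_EMPTY flag is an addition so that leading or trailing punctuation does not produce empty strings.

```php
<?php
// Lowercase the input and split it on runs of non-word characters.
function tokenize(string $prompt): array
{
    return preg_split('/\W+/', strtolower($prompt), -1, PREG_SPLIT_NO_EMPTY);
}

$words = tokenize('Get the Ball, from the bag!');
// $words is ['get', 'the', 'ball', 'from', 'the', 'bag']

// Walk the words one at a time; a real parser would classify each one here.
foreach ($words as $position => $word) {
    echo $position, ': ', $word, PHP_EOL;
}
```

    Note that /\W+/ also splits on apostrophes and hyphens, so an item called "bob's sword" would arrive as three words.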
     
    deathshadow, Apr 21, 2019 IP
  8. Jeremy Benson

    #8
    I understand what you mean now. I'll implement that then. Thanks again Deathshadow. Actually, I have a new complexity: how do I extract the various elements (command, subject, target)? Users can craft items and name them, so dissecting at keyword cutoffs would be hard.

    <get the ball from the the big backpack>
    <get the the bag from the the big backpack>

    In this scenario users have crafted items named 'the big backpack' and 'the bag'. That would be different than if everything were an easy scenario:

    <get the ball>
    <get the ball from the backpack>

    I suppose I could have reserved keywords. Players can't include "the/on/in/from" in crafted item names. Then I could use them as start points to cut out the words.
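
    One way the reserved-keyword idea might look, as a sketch under stated assumptions: the reserved list is just the four words named above, and the function names isValidItemName and splitAtPreposition as well as the result keys are invented for illustration. Only the first preposition is recorded; later ones are skipped.

```php
<?php
// Hypothetical reserved list, taken from the words mentioned above.
const RESERVED_WORDS = ['the', 'on', 'in', 'from'];

// True when the proposed item name is non-empty and contains no reserved word.
function isValidItemName(string $name): bool
{
    $words = preg_split('/\s+/', strtolower(trim($name)), -1, PREG_SPLIT_NO_EMPTY);
    return $words !== [] && array_intersect($words, RESERVED_WORDS) === [];
}

// Cut a command into verb, object, preposition and container,
// using the first preposition as the cut point.
function splitAtPreposition(string $prompt): array
{
    $words = preg_split('/\s+/', strtolower(trim($prompt)), -1, PREG_SPLIT_NO_EMPTY);
    $verb = array_shift($words) ?? '';
    $object = [];
    $container = [];
    $preposition = '';

    foreach ($words as $word) {
        if ($word === 'the') {
            continue; // safe to drop once item names cannot contain it
        }
        if (in_array($word, ['on', 'in', 'from'], true)) {
            if ($preposition === '') {
                $preposition = $word;
            }
            continue;
        }
        if ($preposition === '') {
            $object[] = $word;
        } else {
            $container[] = $word;
        }
    }

    return [
        'command'     => $verb,
        'object'      => implode(' ', $object),
        'preposition' => $preposition,
        'container'   => implode(' ', $container),
    ];
}

var_dump(isValidItemName('the bag'));      // bool(false)
var_dump(isValidItemName('big backpack')); // bool(true)
var_dump(splitAtPreposition('get the ball from the big backpack'));
```

    With names restricted this way, 'get the ball from the big backpack' splits into object "ball", preposition "from" and container "big backpack", and multi-word names keep their spaces.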
     
    Last edited: Apr 21, 2019
    Jeremy Benson, Apr 21, 2019 IP
  9. Jeremy Benson

    #9
    I got this solved in a hack kind of way.
    switch($match[0])
            {
               
                case 'get':
                    // get object
                    // get object from object
                    // get object in object
                    // get object in the object
                    // get object on object
                    // get object on the object
                   
                    // get the object from the object
                    // get the object in object
                    // get the object in the object
                    // get the object on object
                    $command = array();
                    $command['command'] = '';
                    $command['object'] = '';
                    $command['subject'] = '';
                    $command['preposition'] = '';
                    $command['preposition2'] = '';
                    $preposition1Found = false;

                    // Only the forms without a leading 'the' are handled so far.
                    if(isset($match[1]) && $match[1] !== 'the')
                    {

                        $command['command'] = 'get';

                        for($i = 1; $i < count($match); $i++)
                        {

                            if(!in_array($match[$i], array('from', 'in', 'on')))
                            {

                                // Words before the first preposition build the object,
                                // words after it build the subject.
                                $key = $preposition1Found ? 'subject' : 'object';

                                // Join words with a space so multi-word names stay intact.
                                $command[$key] .= ($command[$key] === '' ? '' : ' ') . $match[$i];

                            }else if(!$preposition1Found){

                                $command['preposition'] = $match[$i];
                                $preposition1Found = true;

                            }else{

                                $command['preposition2'] = $match[$i];

                            }

                        }

                    }
                    var_dump($command);
                break;
               
               
            }
    Code (markup):
     
    Jeremy Benson, Apr 21, 2019 IP