SlimParser Word Part Tool

Discussion in 'JavaScript' started by Jeremy Benson, Sep 26, 2021.

  1. #1
    I started revisiting text parsers and analyzing parts of speech. I have a really nice parser started that I'd use in Electron. Others making text games might like this. There's a simple bug, and I had the answer before, but it's gone. I'm using regex to fetch out named capture groups. The only bug I see is that the groups for indirectObject and directObject need to match any number of characters and spaces, so that multi-word objects work.

    alert2 - The delimiter would be the last word in the capture group.
    alert3 - Ibid.
    alert4 - The delimiter is a preposition.
    alert5 - The first delimiter for directObject is a preposition, indirectObject is last word in capture group.

    // This parser will return a parsedCommand object filled with useful sentence parts.
    
    // bug - All the regex commands need to allow directObject and indirectObject to be any number of words including spaces, to match 'red car.' Prepositions can be used to disambiguate.
       
        function SlimParser()
        {
    
            this.parsedCommand = { "patternType":null,
                                   "verb":null,
                                   "indirectObject":null,
                                   "directObject":null,
                                   "preposition":null,
                                   "error":false
                                  };
                                 
            this.prepositionList = ['on',
                                    'under',
                                    'over',
                                    'above',
                                    'down',
                                    'up',
                                    'with',
                                    'across',
                                    'around',
                                    'from',
                                    'at',
                                    'to',
                                    'for',
                                    'about'];
    
        }
    
      SlimParser.prototype.parse = function(command){
           
            var commandArray = command.split(" ");
    
       
                    // prepositions: on|under|over|above|down|up|with|across|from|at|to|for|about
                    // Prepositions help make regex queries unique, due to placement in sentence.
    
                    // <verb> the(opt) <directObj> <preposition> the(opt) <indirectObj>
                    // <verb> <preposition> the(opt) <directObj>
                    // <verb> the(opt) <directObj> <preposition>
                    // <verb> the(opt) <directObj>
                    // <verb>
                   
                    if(commandArray.length == 1)
                    {
                   
                        if(/(?<verb>[^ $]*)/.test(command))
                        {
                           
                            const matches = /(?<verb>[^ $]*)/.exec(command);
                            this.parsedCommand.patternType = "V"; // verb
                            this.parsedCommand.verb = matches.groups.verb;
                           
                            alert("in 1");
                           
                        }else{
                           
                            // throw error
                           
                        }
                   
                        // end single verb pattern
                    }else{
                       
                          if(!this.prepositionInStr(command))
                          {
                             
                              if(/(?<verb>[^ $]*)( the)? (?<directObject>[\w+]*)/.test(command))
                                {   
                               
                                    const matches = /(?<verb>[^ $]*)( the)? (?<directObject>[\w+]*)/.exec(command);
                                    this.parsedCommand.patternType = "VO"; // verb object
                                    this.parsedCommand.verb = matches.groups.verb;
                                    this.parsedCommand.directObject = matches.groups.directObject;
                                   
                                    alert("in 2");
                                   
                                }
                             
                              // end patterns without preposition
                          }else{
                              // patterns with prepositions
                            
                             
                                if(this.strIsPrepoistion(commandArray[1]))
                                {
                                   
                                    if(/(?<verb>[^ $]*) (on|under|over|above|down|up|with|across|around|from|at|to|for|about)( the)? (?<directObject>[^ $]*)/.test(command))
                                    {
                                       
                                        const matches = /(?<verb>[^ $]*) (on|under|over|above|down|up|with|across|around|from|at|to|for|about)( the)? (?<directObject>[^ $]*)/.exec(command);
                                        this.parsedCommand.patternType = "VPO"; // verb preposition object
                                        this.parsedCommand.verb = matches.groups.verb;
                                        // commandArray[1] is the preposition here; prepositionFetch expects an array, not the raw string.
                                        this.parsedCommand.preposition = commandArray[1];
                                        this.parsedCommand.directObject = matches.groups.directObject;
                                       
                                        alert("in 3");
                                   
                                    }
                                   
                                }else if(this.strIsPrepoistion(commandArray[commandArray.length - 1]))
                                {
                                   
                                    if(/(?<verb>[^ $]*)( the)? (?<directObject>[^ $]*) (on|under|over|above|down|up|with|across|around|from|at|to|for|about)/.test(command))
                                    {
                                   
                                        const matches = /(?<verb>[^ $]*)( the)? (?<directObject>[^ $]*) (on|under|over|above|down|up|with|across|around|from|at|to|for|about)/.exec(command);
                                        this.parsedCommand.patternType = "VOP"; // verb object preposition
                                        this.parsedCommand.verb = matches.groups.verb;
                                        this.parsedCommand.preposition = commandArray[commandArray.length - 1];
                                        this.parsedCommand.directObject = matches.groups.directObject;
                                        // The VOP pattern has no indirectObject capture group, so indirectObject stays null.
                                       
                                        alert("in 4");
                                   
                                    }
                                   
                                }else{
                                   
                                    if(/(?<verb>[^ $]*)( the)? (?<directObject>[^ $]*) (on|under|over|above|down|up|with|across|around|from|at|to|for|about)( the)? (?<indirectObject>[^ $]*)/.test(command))
                                    {
                                   
                                        const matches = /(?<verb>[^ $]*)( the)? (?<directObject>[^ $]*) (on|under|over|above|down|up|with|across|around|from|at|to|for|about)( the)? (?<indirectObject>[^ $]*)/.exec(command);
                                        this.parsedCommand.patternType = "VOPO"; // verb object preposition object
                                        this.parsedCommand.verb = matches.groups.verb;
                                        this.parsedCommand.directObject = matches.groups.directObject
                                        this.parsedCommand.indirectObject = matches.groups.indirectObject
                                        this.parsedCommand.preposition = this.prepositionFetch(commandArray);
                                       
                                        alert("in 5");
                                       
                                    }
                                       
                                }
                             
                               
                              // end patterns with prepositions
                          }
                   
                       
                        // end other patterns
                    }
                   
                    return this.parsedCommand;
           
        };
       
        // test for a preposition in string
       
        SlimParser.prototype.prepositionInStr = function(command)
        {
       
            // test if preposition available
            let prepositionAvailable = false;
           
            for(let i = 0; i <= this.prepositionList.length - 1; i++)
            {
               
                // Compare whole words so that "at" does not match inside "attack".
                if(command.split(" ").includes(this.prepositionList[i]))
                {
                       
                    prepositionAvailable = true;
                   
                }
               
            }
       
            return prepositionAvailable;
       
            // end preposition fetch
        }
       
        SlimParser.prototype.strIsPrepoistion = function(val)
        {
           
            // test if str word is a preposition
           
            let isPreposition = false;
           
            for(let i = 0; i <= this.prepositionList.length - 1; i++)
            {
               
                if(val == this.prepositionList[i])
                {
                       
                    isPreposition = true;
                   
                }
               
            }
       
            return isPreposition;
           
            // end test if str word is a preposition
        };
       
        // This function will return preposition used in command
        SlimParser.prototype.prepositionFetch = function(testArr)
        {
       
            // every command has exactly one preposition or parse error thrown.
           
            let prepositionStr = "";
           
            for(let i = 0; i<= testArr.length - 1; i++)
            {
             
                for(let y = 0; y <= this.prepositionList.length - 1; y++)
                {
                   
                    if(testArr[i] == this.prepositionList[y])
                    {
                           
                        prepositionStr = this.prepositionList[y];
                       
                    }
                   
                }
           
            }
           
            return prepositionStr;
            // end preposition fetch
        };
    Code (markup):
    I have no idea how to write that in regex. Thank you.
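
    One possible shape for that regex, offered only as a sketch: make directObject a lazy capture (`.+?`) so it stops at the first preposition, and let indirectObject take everything after it. The names PREPOSITIONS, VOPO and parseVOPO are hypothetical and not part of SlimParser; the preposition list is copied from prepositionList above.

    ```javascript
    // Sketch only: multi-word objects delimited by a preposition.
    // PREPOSITIONS, VOPO and parseVOPO are illustrative names, not SlimParser code.
    const PREPOSITIONS = "on|under|over|above|down|up|with|across|around|from|at|to|for|about";

    // <verb> the(opt) <directObj> <preposition> the(opt) <indirectObj>
    const VOPO = new RegExp(
        "^(?<verb>\\S+)(?: the)? (?<directObject>.+?) " +
        "(?<preposition>" + PREPOSITIONS + ")(?: the)? (?<indirectObject>.+)$"
    );

    function parseVOPO(command) {
        const matches = VOPO.exec(command);
        // Copy the named groups into a plain object, or return null on no match.
        return matches ? { ...matches.groups } : null;
    }

    console.log(parseVOPO("put the red car on the kitchen table"));
    // verb "put", directObject "red car", preposition "on", indirectObject "kitchen table"
    ```

    The lazy quantifier is what lets 'red car' stay together: the capture grows one character at a time until a space plus a preposition plus another space follows it. A direct object that itself contains a preposition would still be split too early.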
     
    Jeremy Benson, Sep 26, 2021 IP
  2. deathshadow

    deathshadow Acclaimed Member

    #2
    You are ALMOST on the right track, but you're overthinking it.

    It's often pointless to try and force grammatical structures as part of your fixed logic. This is because the words used determine the grammatical sequence; thus when you made the "prepositionList" you came REALLY close to the answer.

    Make dictionaries by word type, to detect what type of word the current structure is providing.

    Since you have words or combinations of words you want to look for, do so right there via lookup tables.

    Also, you might want to consider using the new ECMAScript 6 "classes" for this, as a lot of what you're calling should be treated as "static".
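
    As a rough sketch of what "static" could mean for the preposition list from the first post (SlimParserStatic and isPreposition are hypothetical names, and this assumes a runtime with static class fields):

    ```javascript
    // Sketch only: SlimParserStatic is an illustrative name, not code from the thread.
    class SlimParserStatic {

        // One shared list on the class, instead of a copy built per instance.
        static prepositionList = [
            "on", "under", "over", "above", "down", "up", "with",
            "across", "around", "from", "at", "to", "for", "about"
        ];

        // Needs no instance state, so it can be static as well.
        static isPreposition(word) {
            return SlimParserStatic.prepositionList.includes(word.toLowerCase());
        }

    }

    console.log(SlimParserStatic.isPreposition("With")); // true
    console.log(SlimParserStatic.isPreposition("car"));  // false
    ```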

    Rather than a verb, I'd start with an action which CAN be a verb, but could be other words. For example "North". That way the user doesn't have to type "go north". North can also be a noun, which as "go north" is the same as just saying "north", but could also be combined with something like "run north", thus it would be listed in both cases.

    This is pretty close to how I did it in a text adventure I wrote for DOS back in the day (Beyond the Pale, a Pilgrim Story), just updated to JavaScript instead of Turbo Pascal 5.5.


    
    class CmdParser {
    
    	static ignore = /\b(a|the|an)\b/gi;
    	
    	static directions = [
    		"north", "south", "east", "west", "up", "down"
    	];
    	
    	static items = [
    		"bucket", "eels", "water",
    		"silver key", "gold key", "bronze key", "brass key", "iron key",
    		"fire", "lamp", "fireplace", "tinder box"
    	];
    
    	static dictionary = {
    		action : CmdParser.directions.concat([
    			"go", "run", "get", "take", "put", "place", "pick up", "drop", "light"
    		]),
    		firstArticle : CmdParser.items.concat(CmdParser.directions),
    		secondArticle : CmdParser.items,
    		conjunctions : [ "and", "or" ],
    		verbFunction : [ "on", "in", "atop", "with" ]
    	};
    	
    	static rxStartPhrase = (word) => new RegExp("^" + word + "\\b", "i");
    	
    	constructor(text) {
    		// The g flag collapses every run of whitespace, not just the first one.
    		text = text.replace(CmdParser.ignore, "").replace(/\s+/g, " ").trim();
    		textLoop: while (text) {
    			for (let [part, lookup] of Object.entries(CmdParser.dictionary)) {
    				if (!this[part]) for (let word of lookup) {
    					if (text.match(CmdParser.rxStartPhrase(word))) {
    						this[part] = word;
    						text = text.substr(word.length + 1);
    						if (text) continue textLoop;
    						else return;
    					}
    				}
    			}
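    			// Drop the unknown word; if it was the last word, empty the text so the loop ends.
    			const nextSpace = text.indexOf(" ");
    			text = nextSpace < 0 ? "" : text.substr(nextSpace + 1);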
    		}
    	} // CmdParser::constructor
    		
    } // CmdParser
    
    Code (markup):
    So for example:

    
    console.log(new CmdParser("Go North"));
    console.log(new CmdParser("Take Silver Key"));
    console.log(new CmdParser("Light Fireplace with tinder box"));
    
    Code (markup):
    Spits out:

    
    Object { action: "go", firstArticle: "north" }
    Object { action: "take", firstArticle: "silver key" }
    Object { action: "light", firstArticle: "fireplace", verbFunction: "with", secondArticle: "tinder box" }
    
    Code (markup):
    To break down the code: the static properties hold the various bits of information, using concat to build our "dictionary by type" as appropriate.

    The constructor first sanitizes the data reducing all internal whitespace and trimming off the external, as well as stripping off any words that should be ignored. Get the ignore words out of the way FIRST.
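
    That sanitizing step on its own could be sketched like this (sanitize and IGNORE are hypothetical names; the ignore pattern is the one from CmdParser above):

    ```javascript
    // Sketch only: the sanitizing pass pulled out into a standalone function.
    const IGNORE = /\b(a|the|an)\b/gi;

    function sanitize(text) {
        return text
            .replace(IGNORE, "")   // strip the ignored words first
            .replace(/\s+/g, " ")  // collapse every run of whitespace to one space
            .trim();               // drop leading and trailing whitespace
    }

    console.log(sanitize("  Light   the fireplace  with a   tinder box "));
    // "Light fireplace with tinder box"
    ```

    Stripping the ignored words before collapsing whitespace matters: removing "the" leaves two spaces behind, and the collapse pass cleans those up.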

    We loop so long as there's still text, looking through the dictionary by type (part). If the part is already defined, skip it. Otherwise check for word matches. The word match regex checks for a perfect match from the start of "text" to the end of the word. If found, we record it as the current "part" and remove it and the space after it from "text". If there's still text, "continue" the outermost loop; if not, exit early via return.

    If the current word has no match in our dictionary, we remove the word and the space after, and keep looping.
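
    A single pass of that match-or-skip step might look like the sketch below. The names startsWithPhrase and consume are hypothetical; the regex is built the same way as rxStartPhrase in the class above.

    ```javascript
    // Sketch only: one match-or-skip step, outside the class.
    const startsWithPhrase = (text, phrase) =>
        new RegExp("^" + phrase + "\\b", "i").test(text);

    function consume(text, lookup) {
        for (const phrase of lookup) {
            if (startsWithPhrase(text, phrase)) {
                // Matched: drop the phrase and the space after it.
                return { matched: phrase, rest: text.slice(phrase.length + 1) };
            }
        }
        // No match: drop the first word and the space after it.
        const space = text.indexOf(" ");
        return { matched: null, rest: space === -1 ? "" : text.slice(space + 1) };
    }

    console.log(consume("silver key with lamp", ["silver key", "lamp"]));
    // matched "silver key", rest "with lamp"
    console.log(consume("keyring on table", ["key"]));
    // matched null, rest "on table"
    ```

    The trailing `\b` is what stops "key" from matching inside "keyring", and it is also why "fire" does not swallow the start of "fireplace".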

    Trying to guess grammar before you recognize the words is... well, gonna get you in trouble. Dictionary FIRST.
     
    deathshadow, Oct 2, 2021 IP
  3. Jeremy Benson

    Jeremy Benson Well-Known Member

    #3
    Hey, thanks for the reply. I need you to check something out. I got this working pretty well, but there are some problems. I didn't want to have to include items and all that up front. This is very interesting, but it may still have some big flaws.

    github.com/JeremyBenson11/text-game-parser
     
    Jeremy Benson, Feb 3, 2022 IP