Is global match() with capturing parentheses broken?

Discussion in 'JavaScript' started by lp1051, Nov 14, 2009.

  1. #1
    Hi,

    I recently needed to parse text with JS and ran into rather weird behavior of the match() method, specifically when the global flag is combined with capturing parentheses. I didn't find any examples of this, and I am not sure whether it should be called a JS bug or is just my misunderstanding.

    This is just a silly example; it should capture all alphanumeric characters between 'd' and 'tal', in this case 'igi'.
    
    var txt = 'digital point in digital world';
    var matches = txt.match(/d(\w+)tal/gi);
    var match_substring = ?????;
    
    Code (markup):
    I simply don't know how to get the captured substring out of matches. Without the global flag it is accessible via matches[1], but with a global match no index holds the captured substring.
    I expected it to add a new dimension to the array of matches, so that I could use matches[0][0], matches[1][0] or similar, but nothing like that works.
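
For reference, here is a minimal sketch of the behavior described above, using the same sample string. The variable names `allMatches` and `firstMatch` are made up for illustration and are not part of the original post.

```javascript
var txt = 'digital point in digital world';

// With the g flag: match() returns only the full matches,
// and the capture groups are not included anywhere in the result.
var allMatches = txt.match(/d(\w+)tal/gi);

// Without the g flag: match() returns the first match only,
// with the full match at index 0 and capture group 1 at index 1.
var firstMatch = txt.match(/d(\w+)tal/i);
```

So the global result is a flat array of strings, which is why no second dimension with the captured substrings shows up.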

    Does anyone know why JS does this? Is there a way to get the captured substrings without walking through the array of matches and applying the same regular expression again without the global flag?
    Any hints are welcome.

    Thanks
     
    lp1051, Nov 14, 2009 IP
  2. snuggles

    snuggles Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Hey, did you ever get an answer to this? I'm banging my head against the exact same problem.

    LMK! Thanks!
     
    snuggles, Dec 1, 2009 IP
  3. Mike H.

    Mike H. Peon

    Messages:
    219
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #3
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <title>None</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <script type="text/javascript">	
    
    	function init(){
    
    		// match all alphanumeric characters between 'd' and 'tal';
    
    		var nStr = "digital point in diginewtexttal world, d45567tal";	
    		
    		var nResult = [];
    		var nMatch = "";
    	
    		while (/d\w+tal/.test(nStr))
    			{
    			 nMatch = nStr.match(/d\w+tal/);
    			 nResult[nResult.length] = nMatch.toString().replace(/(d)(\w+)(tal)/, "$2");
    			 nStr = nStr.replace(/d\w+tal/, "").replace(/\s{2,}/, " ");			
    			}
    		alert(nResult);		
    		alert(nResult.length);
    		
    	}
    
    	// Feature-detect the event API instead of sniffing the browser name; attachEvent takes two arguments.
    	window.addEventListener ? window.addEventListener('load', init, false) : window.attachEvent('onload', init);
    
    </script>
    </head>
    	<body>
    		
    		
                       
                  
    	</body>
    </html>
    
    Code (markup):
     
    Mike H., Dec 1, 2009 IP
  4. unigogo

    unigogo Peon

    Messages:
    286
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #4
    unigogo, Dec 2, 2009 IP
  5. lp1051

    lp1051 Well-Known Member

    Messages:
    163
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    108
    #5
    Thanks a lot guys,

    Mike H. - yes, this is possible, but I was trying to find a way other than looping over the matches. Also, I believe it is easier and faster to do:

    
    var nStr = "digital point in diginewtexttal world, d45567tal";	
    		
    var nResult = [];
    var nMatch = nStr.match(/d(\w+)tal/g);
    
    for(var i=0, l=nMatch?nMatch.length:0; i<l; i++)
    {
    	if(nMatch[i]) {
    		nResult.push(nMatch[i].match(/d(\w+)tal/)[1]);
    	}
    }
    alert(nResult);
    
    Code (markup):
    unigogo - yes, this was the answer I was looking for. It is funny: I use a similar method for converting CSS style text into JS style properties, but never thought of it as a way to deal with matches :) Thanks for the hint! Maybe you will find this useful too, so here it is:
    
    var s = "digital point in diginewtexttal world, d45567tal";
    var res = [];
    
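    // Pull the matches out one by one: the callback records the captured group
    // and returns "" so the matched text is removed and the loop terminates.
    // (RegExp.$0 does not exist, so the original callback returned undefined.)
    for(var exp=/d(\w+)tal/; exp.test(s); s=s.replace(exp, function(){res.push(RegExp.$1); return "";}));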
    
    alert(res);        
    
    Code (markup):
    But there is another problem: $1, ..., $9 for parenthesized substring matches are deprecated as of JavaScript 1.5 (here), and I haven't found the new specification yet. So if anybody knows the expected future way to get the submatches from the RegExp object, it would be great to share it ;)
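
One way to avoid the static RegExp.$1 properties altogether is to call exec() repeatedly on a regular expression that has the g flag and read the groups from the array it returns. The following is only a sketch under that assumption; the helper name `collectFirstGroup` is hypothetical and not taken from any post in this thread.

```javascript
// Hypothetical helper: collects capture group 1 from every match in `text`.
// Assumption: `re` was created with the g flag. Without it, exec() never
// advances lastIndex and this loop would not terminate.
function collectFirstGroup(re, text) {
    var found = [];
    var m;
    re.lastIndex = 0; // always start scanning from the beginning
    while ((m = re.exec(text)) !== null) {
        found.push(m[1]); // m[0] is the full match, m[1] the first group
        // Step past zero-length matches, which would otherwise stall the loop.
        if (m[0] === '') {
            re.lastIndex++;
        }
    }
    return found;
}

var groups = collectFirstGroup(/d(\w+)tal/g, 'digital point in diginewtexttal world, d45567tal');
// groups is ['igi', 'iginewtext', '45567']
```

Newer engines also provide String.prototype.matchAll, which yields the same per-match arrays as an iterator, but the exec() loop works in older browsers as well.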


    Btw. unigogo, I think you are escaping too much. Why not use only
    
    String.prototype.$1elements=function(vregex) {
    	var elm=[];
    	var str=this;
    	var re= new RegExp(vregex, "g");
    	str = str.replace(re, function($0,$1) {
    		elm.push($1)
    		return $0;		
    	});
    	return elm;
    };
    var str = "border<-top<-width".$1elements("<-(\\w+)");
    alert(str);
    
    Code (markup):
     
    Last edited: Dec 2, 2009
    lp1051, Dec 2, 2009 IP
  6. unigogo

    #6
    Yes, lp1051, I should delete the last part of the article.

    It seems I confused the match and replace methods when dealing with HTML tags.
     
    unigogo, Dec 3, 2009 IP
  7. unigogo

    #7
    lp1051,

    Look at the examples on the Mozilla site. They still use $1 and $2 in the replace method.

    I think the deprecated $1, $2 ... $9 in the link here should read RegExp.$1, RegExp.$2 ... RegExp.$9
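
To illustrate the distinction, here is a small sketch in the spirit of the name-swapping example in the Mozilla documentation; the variable names are made up for illustration. Neither form touches the static properties on the RegExp constructor.

```javascript
var fullName = 'John Smith';

// "$1" and "$2" inside the replacement string refer to the capture
// groups of this particular replace() call.
var swapped = fullName.replace(/(\w+)\s(\w+)/, '$2, $1');

// The same groups are also passed to a replacement callback as arguments
// after the full match.
var viaCallback = fullName.replace(/(\w+)\s(\w+)/, function (whole, first, last) {
    return last + ', ' + first;
});
// Both swapped and viaCallback are 'Smith, John'.
```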
     
    unigogo, Dec 8, 2009 IP