Careful when reusing JavaScript RegExp objects
I had an operation on a JavaScript object that used a complex regular expression to test for something. Usually, when you want to do that, you use the regular expression inline or as a local variable. However, given the complexity of the expression, I thought it would be more efficient to cache the RegExp object and reuse it every time.
Now, there are two gotchas when using regular expressions in JavaScript. One of them is that if you want to match more than once within a string, you need to use the global flag. For example, the code
var reg=new RegExp('a',''); //the same as: var reg=/a/;
alert('aaa'.replace(reg,'b'));
will alert 'baa', because without the global flag the replace operation returns after the first match and replacement. That is why I normally use the global flag on all my regular expressions, like this:
var reg=new RegExp('a','g'); //the same as: var reg=/a/g;
alert('aaa'.replace(reg,'b'));
(alerts 'bbb')
The second gotcha is that if you use the global flag, the lastIndex property of the RegExp object is not reset between calls: the next match starts from where the previous one ended. So code like this:
var reg=new RegExp('a',''); //same as: /a/;
reg.test('aaa');
alert(reg.lastIndex);
reg.test('aaa');
alert(reg.lastIndex);
will alert 0 both times. Using the global flag will lead to alerting 1 and 2.
The problem is that the solution to the first gotcha leads straight into the second, as it did in my case. I stored the RegExp object as a field in my object, then used it repeatedly to test for a pattern in several different strings. It would work once, then fail, then work again. Once I removed the global flag, it all worked like a charm.
The moral of the story is to be careful of constructs like _reg.test(input); when _reg is a global regular expression. It will attempt to match starting from lastIndex, that is, from just after the end of the last match, even if that match was found in a different string.
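The "work once, then fail, then work again" pattern can be reproduced with a minimal sketch. The name cachedReg and the input strings below are made up for illustration and are not from my original code:

```javascript
// Hypothetical cached pattern with the global flag, reused across calls.
var cachedReg = /a/g;
var results = [];

// 'banana': first 'a' is at index 1, so lastIndex becomes 2.
results.push(cachedReg.test('banana'));
// 'apple': the search starts at index 2, finds no 'a', fails,
// and the failure resets lastIndex to 0.
results.push(cachedReg.test('apple'));
// 'avocado': the search starts at index 0 again and succeeds.
results.push(cachedReg.test('avocado'));

console.log(results.join(', ')); // true, false, true
```

All three strings contain the letter a, yet the second test reports no match because it started searching partway into the string.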
Also, in order to use a global RegExp multiple times without redeclaring it every time, one can just manually reset the lastIndex property: reg.lastIndex=0;
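As a rough sketch of that reset, one could wrap it in a small helper. The function name testFromStart and the sample inputs are hypothetical, chosen only for this example:

```javascript
// Hypothetical helper: always test from the start of the input,
// regardless of what a previous call left in lastIndex.
function testFromStart(reg, input) {
  reg.lastIndex = 0;
  return reg.test(input);
}

var globalReg = /a/g;
var inputs = ['banana', 'apple', 'avocado'];
var fixedResults = inputs.map(function (input) {
  return testFromStart(globalReg, input);
});

console.log(fixedResults.join(', ')); // true, true, true
```

If the pattern is only ever used with test, dropping the global flag is probably the simpler choice; the reset is mainly useful when the same object also has to serve global replaces.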
Update: Here is a case that was totally weird. Imagine a JavaScript function that returns an array of strings based on a regular expression match inside a loop. In Firefox it would return half the number of items that it should have. If one opened Firebug and placed a breakpoint inside the loop, the list would be OK! If the breakpoint was placed outside the loop, the bug would occur. Here is the code. Try to see what is wrong with it:
types.forEach(function (type) {
  if (type && type.name) {
    var m = /(\{tag_.*\})/ig.exec(type.name);
    // type is tag
    if (m && m.length) {
      typesDict[type.name] = m[1];
    }
  }
});
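One possible way to sidestep the problem, assuming the cause is a lastIndex value carried over from one iteration to the next, is to drop the global flag, since only the first match in each name is used. This is a sketch rather than the fix that was actually applied; the sample types data and the tagPattern name are invented for the example:

```javascript
// Hypothetical sample data; the real `types` list is not shown in the post.
var types = [
  { name: '{tag_one}' },
  { name: '{tag_two}' },
  { name: 'plain' },
  { name: '{TAG_three}' }
];
var typesDict = {};

// No g flag: exec() always searches from the start of the string,
// so it cannot depend on a lastIndex left over from a previous call.
var tagPattern = /(\{tag_.*\})/i;

types.forEach(function (type) {
  if (type && type.name) {
    var m = tagPattern.exec(type.name);
    // type is tag
    if (m) {
      typesDict[type.name] = m[1];
    }
  }
});

console.log(Object.keys(typesDict).length); // 3
```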
Comments
Are you 1. using a regex with the `g` modifier 2. using the same instance for checking both strings? If so, this is intentional JavaScript behavior. Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test#Using_test()_on_a_regex_with_the_global_flag which states: > It is worth noting that the `lastIndex` will not reset when testing a different string. I.e., the following is expected output: https://gist.github.com/binki/fce88551d19a6adbf55b495b70adbb6f You must share a minimal repro of your code for us to properly determine whether it is a bug or expected behavior.
Nathan Phillip Brink
Thanks, you helped so much. I was trying with 2 different strings A and B... WTF it don't match the B string? I thought it was a problem with my regex. Then I put the B string first and it didn't match A... I can confirm this bug in Node 8.10.0
Bambino VJ
I can't reproduce it either with 57.0.4 (Quantum). Maybe they fixed it.
Siderite
I can’t repro it in 58.0b16. Is it fixed? Is there a bugzilla number for this? EDIT: My repro attempt is: https://jsfiddle.net/binki/4ru6x0yu/4/ , maybe I need a bigger list to turn on optimizations, etc. I am running it with developer tools closed and only checking the console by opening them later (having developer tools open, in my understanding, causes JavaScript to run in much slower debug mode with less optimizations?).
Nathan Phillip Brink
I am amazed it is still reproduced. I witnessed a presentation about the many optimizations in Firefox and it is quite difficult nowadays to trust the flow of the instructions you write anymore. For example they check if a function has side effects and if not, they move it out of loops. So you do some loop to see how heavy it is, and it only gets run once, stuff like that.
Siderite
Thank you! Helped me out. What a stupid bug.
Andrew
Awesome. Super thanks. I was stuck on this just now and the lastIndex=0 was exactly what I needed!
adam tombleson
As I began to narrow in on a bug I had previously noticed, I quickly recognized the behavior and your article was the first result on a search for this behavior, quickly confirming my suspicion and resolving the issue. Thanks!
SpiderMatt
Thanks! I was struggling with this issue about an hour until I figured out that it was something wrong with regexp itself.
Anonymous