Quirks in JavaScript regular expressions
I am subscribed to the StackOverflow newsletter and most of the time the "top" questions there are really simple things that gain attention from a lot of people. Today I got one question that I would have thought had an obvious answer, but it did not.
The question was: what does "asdf".replace(/.*/g,"x") return?
And the answer to the question "What does a regular expression replace of everything with x return?" is.... [Ba da bum!] "xx".
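If you want to check this yourself, here is a minimal sketch that passes a callback to replace instead of the string "x", so every match gets recorded on the way. The names visited and replaced are just made up for this illustration.

```javascript
// Record every match that replace() visits, together with its offset.
const visited = [];
const replaced = "asdf".replace(/.*/g, (match, offset) => {
  visited.push({ match, offset });
  return "x";
});

console.log(replaced); // "xx"
console.log(visited);  // two entries: "asdf" at offset 0 and "" at offset 4
```

Two matches, two replacements, hence two x's.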
The technical answer is there in the StackOverflow question, but I am gonna walk you through some steps to get to understand this the... dumb way.
So, let's try variations on the same theme. What does "asdf".matchAll(/.*/g) return? Well, first of all, in Chrome, it returns a RegExpStringIterator, which is pretty cool, because it's already using the latest JavaScript features and it is returning an iterator rather than an array. But we can just use Array.from on it to get an array of all matches: one for "asdf" and one for "".
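As a quick sketch of that step, Array.from can also take a mapping function, which is handy for keeping just the matched text and its index. The variable name allMatches and the text/index shape are my own choice here.

```javascript
// Turn the iterator returned by matchAll into a plain array,
// keeping only the matched text and the index where it was found.
const allMatches = Array.from("asdf".matchAll(/.*/g), (m) => ({
  text: m[0],
  index: m.index,
}));

console.log(allMatches); // "asdf" at index 0, then "" at index 4
```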
That's a pretty clear giveaway. Since the regular expression is a global one, it will get a match, then the next one, until there is nothing left. The first match is "asdf", as expected; the next one is "", which is the rest of the string and which also matches .* Why is it, then, that it doesn't go into an infinite loop (or a stack overflow, no pun intended) and keep turning up empty strings? Again, it's an algorithm described in the ECMAScript specification and you need a doctorate in computer science to read it. Well, it's not that complicated, but I did promise a dumb explanation.
And that is that after each match the search resumes from where the match ended, and if the match was empty, the index is bumped by one more so the search can't get stuck. The first match is found at index 0 and ends at index 4; the next one, the empty string, is found at index 4, so the index moves on to 5. That is past the end of the string, so there are no more matches.
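The dumb explanation can be sketched as a loop. This is not the engine's actual code, just a simplified imitation built on exec and lastIndex; the helper name collectMatches is hypothetical, and it steps forward by one code unit, ignoring the extra care the specification takes with Unicode surrogate pairs.

```javascript
// Simplified imitation of how global matching walks through a string.
function collectMatches(pattern, input) {
  if (!pattern.global) {
    throw new TypeError("collectMatches expects a regex with the g flag");
  }
  // Work on a copy so the caller's lastIndex is left alone.
  const re = new RegExp(pattern.source, pattern.flags);
  const found = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    found.push({ text: m[0], index: m.index });
    // An empty match leaves lastIndex where it was, so step past it by hand.
    if (m[0] === "") {
      re.lastIndex += 1;
    }
  }
  return found;
}

console.log(collectMatches(/.*/g, "asdf")); // "asdf" at 0, then "" at 4
```

Once lastIndex is 5, which is larger than the length of "asdf", exec returns null and the loop ends.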
Other variations on this theme are "asdf".matchAll(/.?/g), which will return "a","s","d","f","". You can't do "asdf".matchAll(/.*/): you get a "TypeError: undefineds called with a non-global RegExp argument" error that really doesn't say much. But you can do "asdf".match(/.*/g), which returns just an array of strings rather than more complex objects. You can also do:
const reg = /.*/g;
// Every exec call resumes from reg.lastIndex on the same regex object
console.log(reg.exec("asdf"), reg.exec("asdf"), reg.exec("asdf"), reg.exec("asdf"));
This more classic approach will return "asdf", "", "", "" and it would continue to return empty strings ad infinitum, because exec on its own does not bump the index after an empty match!
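All of these variations can be checked side by side. The sketch below assumes a current engine (older ones did not throw for a non-global matchAll, and the error text differs between versions, so only the error type is checked). The variable names are made up for the example.

```javascript
// matchAll with an optional single character:
// one match per character, plus a trailing empty one.
const optionalChar = Array.from("asdf".matchAll(/.?/g), (m) => m[0]);

// match with the g flag returns plain strings.
const plainStrings = "asdf".match(/.*/g);

// matchAll without the g flag throws a TypeError.
let nonGlobalError = null;
try {
  "asdf".matchAll(/.*/);
} catch (err) {
  nonGlobalError = err;
}

// exec on a single regex object: after the empty match,
// lastIndex stays at 4, so every later call matches "" again.
const execRegex = /.*/g;
const execTrace = [];
for (let i = 0; i < 4; i++) {
  const m = execRegex.exec("asdf");
  execTrace.push({ text: m[0], lastIndex: execRegex.lastIndex });
}

console.log(optionalChar);
console.log(plainStrings);
console.log(nonGlobalError instanceof TypeError);
console.log(execTrace);
```

The lastIndex column in execTrace is the interesting part: it reaches 4 after the first call and never moves again.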
But how should one write a regular expression to get what you wanted to get, a replacement of everything with x? /.+/g would work, but it would not match an empty string. On the other hand, when was the last time you wanted to replace empty strings with anything?
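Here is a small sketch of that fix, next to one alternative that is my own suggestion and not from the StackOverflow question: anchoring the pattern with ^ and $ so that it matches the whole string exactly once, empty or not. Both helper names are hypothetical, and both assume single-line input, since . does not match line breaks.

```javascript
// Replaces any non-empty run of characters; leaves an empty string alone.
const replaceNonEmpty = (input) => input.replace(/.+/g, "x");

// Assumed alternative: anchor to the whole string, so even "" is replaced once.
const replaceWholeString = (input) => input.replace(/^.*$/, "x");

console.log(replaceNonEmpty("asdf"));    // "x"
console.log(replaceNonEmpty(""));        // ""
console.log(replaceWholeString("asdf")); // "x"
console.log(replaceWholeString(""));     // "x"
```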