Repetition
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
Making things optional
We can make parts of a regex optional using the ? operator.
/a?/g
- "" (empty string) → 1 match: ""
- "a" → 2 matches: "a", ""
- "aa" → 3 matches: "a", "a", ""
- "aaa" → 4 matches: "a", "a", "a", ""
- "aaaa" → 5 matches: "a", "a", "a", "a", ""
- "aaaaa" → 6 matches: "a", "a", "a", "a", "a", ""
Here’s another example:
/https?/g
- "http" → 1 match: "http"
- "https" → 1 match: "https"
- "http/2" → 1 match: "http"
- "shttp" → 1 match: "http"
- "ftp" → 0 matches
Here, the s following http is optional.
We can also make capturing and non-capturing groups optional.
/url: (www\.)?example\.com/g
- "url: example.com" → 1 match: "url: example.com"
- "url: www.example.com/foo" → 1 match: "url: www.example.com"
- "Here's the url: example.com." → 1 match: "url: example.com"
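To try the ? examples outside RegExr, here is a minimal JavaScript sketch, assuming Node.js or a browser console (the variable names are illustrative only). It reproduces the a? and https? results above and shows what an optional capturing group holds when it is skipped.

```javascript
// With the g flag, String.prototype.match returns every match (empty ones
// included) as an array, or null when there is no match at all.

// a? takes one "a" where it can, otherwise nothing, so "aa" also yields a
// final empty match at the end of the string.
const optionalA = "aa".match(/a?/g); // → ["a", "a", ""]

// s? makes the "s" optional: "http" and "https" both match, "ftp" does not.
const schemes = ["http", "https", "ftp"].map((s) => s.match(/https?/g));
// → [["http"], ["https"], null]

// Without the g flag, match() returns the first match plus its capture groups.
// An optional group that was skipped holds undefined.
const urlPattern = /url: (www\.)?example\.com/;
const withWww = "url: www.example.com/foo".match(urlPattern);
const withoutWww = "url: example.com".match(urlPattern);
console.log(withWww[0], withWww[1]);       // url: www.example.com www.
console.log(withoutWww[0], withoutWww[1]); // url: example.com undefined
```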
Zero or more
If we wish to match zero or more repetitions of a token, we can suffix it with *.
/a*/g
- "" (empty string) → 1 match: ""
- "a" → 2 matches: "a", ""
- "aa" → 2 matches: "aa", ""
- "aaa" → 2 matches: "aaa", ""
- "aaaa" → 2 matches: "aaaa", ""
- "aaaaa" → 2 matches: "aaaaa", ""
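Our regex matches even the empty string "". Because * allows zero repetitions, it can always succeed without consuming anything, which is also why each result above ends with an empty match.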
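A short JavaScript sketch (variable names are illustrative) makes those empty matches visible by recording where each match starts.

```javascript
// a* can succeed without consuming anything, so with the g flag JavaScript
// reports an empty match wherever the scan reaches a position with no "a".
const whole = "aa".match(/a*/g);  // → ["aa", ""]
const empty = "".match(/a*/g);    // → [""]
const mixed = "bab".match(/a*/g); // → ["", "a", "", ""]

// matchAll yields full match objects; .index is where each match starts.
const positions = Array.from("bab".matchAll(/a*/g), (m) => [m[0], m.index]);
// → [["", 0], ["a", 1], ["", 2], ["", 3]]
```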
One or more
If we wish to match one or more repetitions of a token, we can suffix it with a +.
/a+/g
- "" (empty string) → 0 matches
- "a" → 1 match: "a"
- "aa" → 1 match: "aa"
- "aaa" → 1 match: "aaa"
- "aaaa" → 1 match: "aaaa"
- "aaaaa" → 1 match: "aaaaa"
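The same kind of JavaScript sketch for + (illustrative names again). Note that match() returns null, not an empty array, when there is no match at all.

```javascript
// + needs at least one "a"; with no "a" at all, match() returns null.
const none = "".match(/a+/g);     // → null
const run = "baaab".match(/a+/g); // → ["aaa"]

// Inside a longer pattern: "b" followed by one-or-more vs zero-or-more a's.
const plus = "b ba baa".match(/ba+/g); // → ["ba", "baa"]
const star = "b ba baa".match(/ba*/g); // → ["b", "ba", "baa"]
```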
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to writing the token out x times in a row.
/a{3}/g
- "" (empty string) → 0 matches
- "a" → 0 matches
- "aa" → 0 matches
- "aaa" → 1 match: "aaa"
- "aaaa" → 1 match: "aaa"
- "aaaaa" → 1 match: "aaa"
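A JavaScript sketch (illustrative names) checking both claims about {x}: it needs exactly three a's per match, and a{3} finds the same matches as aaa on a handful of sample strings.

```javascript
// {3} requires exactly three a's per match; leftovers shorter than that are skipped.
const tooShort = "aa".match(/a{3}/g);   // → null
const five = "aaaaa".match(/a{3}/g);    // → ["aaa"]
const six = "aaaaaa".match(/a{3}/g);    // → ["aaa", "aaa"]

// a{3} finds the same matches as writing the token out three times.
const samples = ["", "a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa"];
const sameAsCopies = samples.every(
  (s) => JSON.stringify(s.match(/a{3}/g)) === JSON.stringify(s.match(/aaa/g))
);
// → true
```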
Here’s an example that matches an uppercase six-character hex colour code.
/#[0-9A-F]{6}/g
- "#AE25AE" → 1 match: "#AE25AE"
- "#663399" → 1 match: "#663399"
- "How about #73FA79?" → 1 match: "#73FA79"
- "Part of #73FA79BAC too" → 1 match: "#73FA79"
- "#FFF" → 0 matches
- "#a2ca2c" → 0 matches
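Here, the quantifier {6} applies to the whole character class [0-9A-F], so it matches any six characters from that class. #FFF is too short, and #a2ca2c uses lowercase letters, which the class does not include.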
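A JavaScript sketch of the hex colour pattern in use; findHexColours is a hypothetical helper name. It returns every uppercase six-digit code in a string, and an empty array when there are none.

```javascript
// The uppercase hex colour pattern from above, used to pull codes out of text.
const hexColour = /#[0-9A-F]{6}/g;

// Hypothetical helper name, for illustration only.
function findHexColours(text) {
  return text.match(hexColour) ?? []; // match() gives null when nothing is found
}

const found = findHexColours("Try #AE25AE or #663399, not #FFF or #a2ca2c.");
// → ["#AE25AE", "#663399"]
const partial = findHexColours("Part of #73FA79BAC too");
// → ["#73FA79"]: {6} stops after six characters even if more hex digits follow
```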
Between min and max times
If we wish to match a particular token between min and max times (inclusive), we can suffix it with {min,max}.
/a{2,4}/g
- "" (empty string) → 0 matches
- "a" → 0 matches
- "aa" → 1 match: "aa"
- "aaa" → 1 match: "aaa"
- "aaaa" → 1 match: "aaaa"
- "aaaaa" → 1 match: "aaaa"
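There must be no space after the comma in {min,max}. In JavaScript, for example, /a{2, 4}/ does not repeat a at all: it matches the literal text a{2, 4}.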
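A JavaScript sketch of {min,max}, including what happens when a space slips in after the comma (names are illustrative). The behaviour shown for the spaced version is specific to JavaScript: without the u flag the engine falls back to reading the braces literally, and with it the pattern is a syntax error. Other regex flavours may treat this case differently.

```javascript
// {2,4} takes as many a's as it can (up to four) in each match.
const ranged = "aaaaa".match(/a{2,4}/g);       // → ["aaaa"]; the leftover "a" is too short
const mixedRuns = "aa aaa a".match(/a{2,4}/g); // → ["aa", "aaa"]

// With a space after the comma, JavaScript (without the u flag) does not see a
// quantifier: "{", "2", ",", " ", "4" and "}" are all matched literally.
const spaced = /a{2, 4}/;
const spacedOnRun = spaced.test("aaaa");        // → false
const spacedOnLiteral = spaced.test("a{2, 4}"); // → true

// With the u flag, the same source is rejected as a syntax error.
let unicodeError = null;
try {
  new RegExp("a{2, 4}", "u");
} catch (e) {
  unicodeError = e;
}
// → unicodeError is a SyntaxError
```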
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
/a{2,}/g
- "" (empty string) → 0 matches
- "a" → 0 matches
- "aa" → 1 match: "aa"
- "aaa" → 1 match: "aaa"
- "aaaa" → 1 match: "aaaa"
- "aaaaa" → 1 match: "aaaaa"
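The three shorthand operators from earlier can be read as special cases of this brace syntax: ? as {0,1}, * as {0,}, and + as {1,}. This JavaScript sketch (sample strings chosen for illustration) checks that each pair returns identical matches.

```javascript
// {2,} sets a lower bound only: runs shorter than two a's are skipped.
const atLeastTwo = "a aa aaaaaaa".match(/a{2,}/g); // → ["aa", "aaaaaaa"]

// Each shorthand operator paired with its brace form.
const equivalents = [
  [/a?/g, /a{0,1}/g],
  [/a*/g, /a{0,}/g],
  [/a+/g, /a{1,}/g],
];
const samples = ["", "a", "baab", "aaaaa"];

// For every pair and every sample, both patterns must return identical matches.
const allAgree = equivalents.every(([shorthand, braces]) =>
  samples.every(
    (s) => JSON.stringify(s.match(shorthand)) === JSON.stringify(s.match(braces))
  )
);
// → true
```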
A note on greediness
By default, repetition operators are greedy: they match as many repetitions as they can, and give some back only if the rest of the pattern would otherwise fail to match.
/a*/g
- "aaaaaa" → 2 matches: "aaaaaa", ""
/".*"/g
- '"quote"' → 1 match: '"quote"'
- '"quote", "quote"' → 1 match: '"quote", "quote"'
- '"quote"quote"' → 1 match: '"quote"quote"'
Suffixing a repetition operator (?, *, +, …) with a ? makes it “lazy”: it then matches as few repetitions as possible.
/".*?"/g
- '"quote"' → 1 match: '"quote"'
- '"quote", "quote"' → 2 matches: '"quote"', '"quote"'
- '"quote"quote"' → 1 match: '"quote"'
Here, the same result could also be achieved by using the negated character class [^"] instead of . (as is best practice).
/"[^"]*"/g
- '"quote"' → 1 match: '"quote"'
- '"quote", "quote"' → 2 matches: '"quote"', '"quote"'
- '"quote"quote"' → 1 match: '"quote"'
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more
—Andrew S on StackOverflow
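To see greedy and lazy forms side by side, here is a JavaScript sketch (variable names are illustrative). It applies the lazy suffix to + and {2,4} as well as *, compares the quote patterns with the [^"] version, and shows one concrete case where /".*?"/ and /"[^"]*"/ differ: by default . does not match a line break, whereas [^"] matches any character except a double quote.

```javascript
// Greedy vs lazy on a run of a's.
const greedyPlus = "aaaa".match(/a+/g);      // → ["aaaa"]
const lazyPlus = "aaaa".match(/a+?/g);       // → ["a", "a", "a", "a"]
const greedyRange = "aaaa".match(/a{2,4}/g); // → ["aaaa"]
const lazyRange = "aaaa".match(/a{2,4}?/g);  // → ["aa", "aa"]

// The quote example: greedy .* runs to the last quote, lazy .*? stops at the
// first one, and the negated class [^"] cannot step over a quote at all.
const line = '"quote", "quote"';
const greedyQuotes = line.match(/".*"/g);     // → ['"quote", "quote"']
const lazyQuotes = line.match(/".*?"/g);      // → ['"quote"', '"quote"']
const negatedQuotes = line.match(/"[^"]*"/g); // → ['"quote"', '"quote"']

// Where they differ: without the s flag, . does not match a line break,
// but [^"] does, because a line break is not a double quote.
const twoLines = '"first\nsecond"';
const lazyAcrossLines = twoLines.match(/".*?"/g);      // → null
const negatedAcrossLines = twoLines.match(/"[^"]*"/g); // → ['"first\nsecond"']
```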
/<.+>/g
- "<em>g r e e d y</em>" → 1 match: "<em>g r e e d y</em>"
/<.+?>/g
- "<em>lazy</em>" → 2 matches: "<em>", "</em>"
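The tag example in JavaScript, with two variations (names are illustrative): the negated-class approach from the quote example, written here as [^>], and a lazy capturing group that collects the text between each pair of angle brackets.

```javascript
// Greedy .+ runs to the last ">", lazy .+? stops at the first one.
const greedyTag = "<em>g r e e d y</em>".match(/<.+>/g); // → ["<em>g r e e d y</em>"]
const lazyTags = "<em>lazy</em>".match(/<.+?>/g);        // → ["<em>", "</em>"]

// [^>] cannot run past a ">", so a greedy + is safe here.
const negatedTags = "<em>lazy</em>".match(/<[^>]+>/g);   // → ["<em>", "</em>"]

// Capture group 1 holds just the text inside each pair of brackets.
const tagNames = Array.from("<em>lazy</em>".matchAll(/<(.+?)>/g), (m) => m[1]);
// → ["em", "/em"]
```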
Examples
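Bitcoin address
Base58-encoded Bitcoin addresses start with 1 or 3, followed by characters from the Base58 alphabet, which leaves out 0, O, I and l because they are easily confused with one another.
/([13][a-km-zA-HJ-NP-Z1-9]{26,33})/g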
- "3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v" → 1 match: "3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v"
- "1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx" → 1 match: "1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx"
- "2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ" → 1 match: "18f1yugoAJuXcHAbsuRVLQC9TezJ"
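A hedged JavaScript sketch of using this pattern to scan text. It assumes the Base58 alphabet [a-km-zA-HJ-NP-Z1-9] (no 0, O, I or l) after the leading 1 or 3, and findAddressCandidates is a hypothetical helper name. It only checks the shape of a candidate; a real address also carries a checksum, which a regex cannot verify.

```javascript
// Assumption: the Base58 alphabet (no 0, O, I or l) for everything after the
// leading 1 or 3. This checks the shape of a candidate only, not its checksum.
const candidatePattern = /([13][a-km-zA-HJ-NP-Z1-9]{26,33})/g;

// Hypothetical helper: returns every candidate address found in the text.
function findAddressCandidates(text) {
  return Array.from(text.matchAll(candidatePattern), (m) => m[1]);
}

const fromLog = findAddressCandidates("2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ");
// → ["18f1yugoAJuXcHAbsuRVLQC9TezJ"]
const fromList = findAddressCandidates(
  "3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx"
);
// → both addresses, in order
```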
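YouTube video
The capturing group ([^&\s]+) holds the video ID: everything after v= up to the next & or whitespace character.
/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm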
- "youtube.com/watch?feature=sth&v=dQw4w9WgXcQ" → 1 match: "youtube.com/watch?feature=sth&v=dQw4w9WgXcQ"
- "https://www.youtube.com/watch?v=dQw4w9WgXcQ" → 1 match: "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
- "www.youtube.com/watch?v=dQw4w9WgXcQ" → 1 match: "www.youtube.com/watch?v=dQw4w9WgXcQ"
- "youtube.com/watch?v=dQw4w9WgXcQ" → 1 match: "youtube.com/watch?v=dQw4w9WgXcQ"
- "fakeyoutube.com/watch?v=dQw4w9WgXcQ" → 1 match: "youtube.com/watch?v=dQw4w9WgXcQ"
The last input is on a different domain, fakeyoutube.com, yet the regex still finds youtube.com inside it. We can prevent this using anchors, which we shall encounter soon.
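A JavaScript sketch of extracting the video ID from capture group 1. It drops the g and m flags, since it only needs the first match in each string, and videoId is a hypothetical helper name. Note that it still returns an ID for the fakeyoutube.com link.

```javascript
// The pattern from above without the g and m flags: only the first match in each
// string is needed, and without g, match() also returns the capture groups.
const youtubePattern = /(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/;

// Hypothetical helper: the video ID from capture group 1, or null if no match.
function videoId(link) {
  const m = link.match(youtubePattern);
  return m ? m[1] : null;
}

const links = [
  "youtube.com/watch?feature=sth&v=dQw4w9WgXcQ",
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "www.youtube.com/watch?v=dQw4w9WgXcQ",
  "fakeyoutube.com/watch?v=dQw4w9WgXcQ",
];
const ids = links.map(videoId); // → "dQw4w9WgXcQ" four times

// The fake link still matches: the match simply starts after "fake".
const fakeMatchStart = links[3].match(youtubePattern).index; // → 4
```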