Regular expressions (“regexes”) allow defining a pattern and executing it against strings. Substrings which match the pattern are termed “matches”.
A regular expression is a sequence of characters that defines a search pattern.
Regex finds utility in:
Simultaneously, regular expressions are ill-suited for other kinds of problems:
There are several regex implementations—regex engines—each with its own quirks and features. This book will avoid going into the differences between these, instead sticking to features that are, for the most part, common across engines.
The example blocks throughout the book use JavaScript under the hood. As a result, the book may be slightly biased towards JavaScript’s regex engine.
Regular expressions are typically formatted as /<rules>/<flags>. Often people will drop the slashes and flags for brevity. We’ll get into the details of flags in a later chapter.
Let’s start with the regex /p/g. For now, please take the g flag for granted.
/p/g
[RegExr] [Visual]pancake
p
pineapple
p
p
p
apple
p
p
mango
Plum
As we can see, /p/g matches all lowercase p characters.
Regular expressions are case-sensitive by default.
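As a rough sketch of how this looks in code, the snippet below runs /p/g against a few of the strings above using JavaScript's String.prototype.match. The helper name findAll is made up for this illustration and is not part of any library.

```javascript
// Return every match of a global regex as an array (empty when there are none).
// `findAll` is a helper name invented for this sketch.
function findAll(regex, input) {
  // String.prototype.match returns null when nothing matches.
  return input.match(regex) || [];
}

console.log(findAll(/p/g, "pancake"));   // a single "p"
console.log(findAll(/p/g, "pineapple")); // three separate "p" matches
console.log(findAll(/p/g, "Plum"));      // empty: "P" is not "p"
```

The last call comes back empty because the pattern is case-sensitive: the uppercase P in Plum is a different character from p.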
Instances of the regex pattern found in an input string are termed “matches”.
/pp/g
[RegExr] [Visual]apple
pp
pineapple
pp
happiness
pp
sipping apple juice
pp
pp
papaya
It’s possible to match a character from within a set of characters.
/[aeiou]/g
[RegExr] [Visual]avocado
a
o
a
o
brinjal
i
a
onion
o
i
o
rhythm
/[aeiou]/g matches all vowels in our input strings.
Here’s another example of these in action:
/p[aeiou]t/g
[RegExr] [Visual]pat
pat
pet
pet
pit
pit
spat
pat
spot a pet
pot
pet
bat
We match a p, followed by one of the vowels, followed by a t.
There’s an intuitive shortcut for matching a character from within a continuous range.
/[a-z]/g
[RegExr] [Visual]john_s
j
o
h
n
s
matej29
m
a
t
e
j
Ayesha?!
y
e
s
h
a
4952
LOUD
The regex /[a-z]/g matches only one character at a time. In the example above, each string yields several matches, each one character long, rather than one long match.
We can combine ranges and individual characters in our regexes.
/[A-Za-z0-9_-]/g
[RegExr] [Visual]john_s
j
o
h
n
_
s
matej29
m
a
t
e
j
2
9
Ayesha?!
A
y
e
s
h
a
4952
4
9
5
2
LOUD
L
O
U
D
Our regex /[A-Za-z0-9_-]/g matches a single character, which must be one of the following:
A-Z
a-z
0-9
_
-
We can also “negate” these rules:
/[^aeiou]/g
[RegExr] [Visual]Umbrella
U
m
b
r
l
l
cauliflower
c
l
f
l
w
r
ou
The only difference between the first regex of this chapter and /[^aeiou]/g is the ^ immediately after the opening bracket. Its purpose is to negate the rules defined within the brackets. We are now saying:
“match any character that is not any of a, e, i, o, and u”
/[^a-zA-Z_0-9-]/g
[RegExr] [Visual]TheLegend27
WaterGuy12
Smokie_Bear
Robert'); DROP TABLE Students;--
'
)
;
;
/[A-HJ-NP-Za-kmnp-z2-9]/g
[RegExr] [Visual]foo
f
lily
i
y
lI0O1
unambiguity
u
n
a
m
b
i
g
u
i
t
y
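The character class forms covered in this chapter can be tried out with a short script. This is only a sketch using JavaScript's built-in String.prototype.match; the variable names are chosen for illustration.

```javascript
// Each regex matches ONE character per match; the g flag collects all of them.
const vowels = "avocado".match(/[aeiou]/g);        // a set of characters
const lower = "Ayesha?!".match(/[a-z]/g);          // a range
const allowed = "john_s".match(/[A-Za-z0-9_-]/g);  // ranges plus individual characters
const notVowels = "Umbrella".match(/[^aeiou]/g);   // a negated set
const none = "rhythm".match(/[aeiou]/g);           // null: nothing to match

console.log(vowels, lower, allowed, notVowels, none);
```

Note that the negated class keeps the uppercase U of Umbrella, since only the lowercase vowels are excluded.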
Character escapes act as shorthands for some common character classes.
\d
The character escape \d matches digit characters, from 0 to 9. It is equivalent to the character class [0-9].
/\d/g
[RegExr] [Visual]2020
2
0
2
0
100/100
1
0
0
1
0
0
It costs $5.45
5
4
5
3.14159
3
1
4
1
5
9
/\d\d/g
[RegExr] [Visual]2020
20
20
100/100
10
10
It costs $5.45
45
3.14159
14
15
\D is the negation of \d and is equivalent to [^0-9].
/\D/g
[RegExr] [Visual]2020
100/100
/
It costs $5.45
I
t
c
o
s
t
s
$
.
3.14159
.
\w
The escape \w matches characters deemed “word characters”. These include:
a–z
A–Z
0–9
_
It is thus equivalent to the character class [a-zA-Z0-9_].
/\w/g
[RegExr] [Visual]john_s
j
o
h
n
_
s
matej29
m
a
t
e
j
2
9
Ayesha?!
A
y
e
s
h
a
4952
4
9
5
2
LOUD
L
O
U
D
lo-fi
l
o
f
i
get out
g
e
t
o
u
t
21*2 = 42(1)
2
1
2
4
2
1
/\W/g
[RegExr] [Visual]john_s
Ayesha?!
?
!
4952
LOUD
lo-fi
-
get out
;-;
;
-
;
21*2 = 42(1)
*
=
(
)
\s
The escape \s matches whitespace characters. The exact set of characters matched is dependent on the regex engine, but most include at least:
the space character
\t
\r
\n
\f
Many also include vertical tabs (\v). Unicode-aware engines usually match all characters in the separator category.
The technicalities, however, will usually not be important.
/\s/g
[RegExr] [Visual]word word
tabs vs spaces
snake_case.jpg
/\S/g
[RegExr] [Visual]word word
w
o
r
d
w
o
r
d
tabs vs spaces
t
a
b
s
v
s
s
p
a
c
e
s
snake_case.jpg
s
n
a
k
e
_
c
a
s
e
.
j
p
g
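The escapes above can be checked quickly in JavaScript. The following is a small sketch rather than an exhaustive test; the variable names are illustrative only.

```javascript
const digits = "It costs $5.45".match(/\d/g);    // every single digit
const digitPairs = "3.14159".match(/\d\d/g);     // two consecutive digits per match
const wordChars = "lo-fi".match(/\w/g);          // letters, digits and underscore
const nonWord = "21*2 = 42(1)".match(/\W/g);     // everything \w does not match
const whitespace = "a\tb c\n".match(/\s/g);      // tab, space, newline
const nonSpace = "a\tb c\n".match(/\S/g);        // the remaining characters

console.log(digits, digitPairs, wordChars, nonWord, whitespace, nonSpace);
```

In the \d\d case, the leading 3 and the trailing 9 are left out because each has no adjacent unmatched digit to pair with.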
.
While not a typical character escape, . matches any character.
/./g
[RegExr] [Visual]john_s
j
o
h
n
_
s
Ayesha?!
A
y
e
s
h
a
?
!
4952
4
9
5
2
LOUD
L
O
U
D
lo-fi
l
o
-
f
i
get out
g
e
t
o
u
t
;-;
;
-
;
21*2 = 42(1)
2
1
*
2
=
4
2
(
1
)
Note: by default, . does not match the newline character \n. This can be changed using the “dotAll” flag, if supported by the regex engine in question.
In regex, some characters have special meanings, as we will explore across the chapters:
|
{, }
(, )
[, ]
^, $
+, *, ?
\
. (literal only within character classes)
- (sometimes a special character within character classes)
When we wish to match these characters literally, we need to “escape” them.
This is done by prefixing the character with a \.
/\(paren\)/g
[RegExr] [Visual]paren
parents
(paren)
(paren)
a (paren)
(paren)
/(paren)/g
[RegExr] [Visual]paren
paren
parents
paren
(paren)
paren
a (paren)
paren
/example\.com/g
[RegExr] [Visual]example.com
example.com
a.example.com/foo
example.com
example_com
example@com
example_com/foo
/example.com/g
[RegExr] [Visual]example.com
example.com
a.example.com/foo
example.com
example_com
example_com
example@com
example@com
example_com/foo
example_com
/A\+/g
[RegExr] [Visual]A+
A+
A+B
A+
5A+
A+
AAA
/A+/g
[RegExr] [Visual]A+
A
A+B
A
5A+
A
AAA
AAA
/worth \$5/g
[RegExr] [Visual]worth $5
worth $5
worth $54
worth $5
not worth $5
worth $5
/worth $5/g
[RegExr] [Visual]worth $5
worth $54
not worth $5
/\/\/.*/g
[RegExr] [Visual]console.log(); // comment
// comment
console.log(); // // comment
// // comment
console.log();
/\*[^\*]*\*/g
[RegExr] [Visual]here be *italics*
*italics*
permitted**
**
a*b*c*d
*b*
a*b*c*d*e
*b*
*d*
a️bcd
The first and last asterisks are literal since they are escaped: \*.
The asterisk inside the character class does not necessarily need to be escaped, but I’ve escaped it anyway for clarity.
The asterisk immediately following the character class indicates repetition of the character class, which we’ll explore in chapters that follow.
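To see the effect of escaping in code, the sketch below contrasts escaped and unescaped versions of the patterns from this chapter. The escapeRegex helper is hypothetical (it is not a built-in); it follows a commonly used approach of prefixing every special character with a backslash so that arbitrary text can be matched literally.

```javascript
// Unescaped special characters keep their special meaning.
console.log(/(paren)/.test("paren"));            // true: the parentheses only form a group
console.log(/\(paren\)/.test("paren"));          // false: literal parentheses are required
console.log(/example.com/.test("example_com"));  // true: . matches any character
console.log(/example\.com/.test("example_com")); // false: \. matches only a dot
console.log(/worth $5/.test("worth $5"));        // false: $ acts as an anchor here
console.log(/worth \$5/.test("worth $5"));       // true

// Hypothetical helper: escape every special character in arbitrary text
// so that it can be embedded in a regex and matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds the equivalent of /A\+/g from the plain string "A+".
const literalAPlus = new RegExp(escapeRegex("A+"), "g");
console.log("A+B AAA".match(literalAPlus));
```

A helper like this is mostly useful when the text to match comes from user input rather than being written by hand.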
Groups, as the name suggests, are meant to be used to “group” components of regular expressions. These groups can be used to:
We’ll see how to do a lot of this in later chapters, but learning how groups work will allow us to study some great examples in these later chapters.
Capturing groups are denoted by ( … ). Here’s an expository example:
/a(bcd)e/g
[RegExr] [Visual]abcde
abcde
abcdefg?
abcde
abcde
abcde
Capturing groups allow extracting parts of matches.
/\{([^{}]*)\}/g
[RegExr] [Visual]{braces}
{braces}
{two} {pairs}
{two}
{pairs}
{ {nested} }
{nested}
{ incomplete } }
{ incomplete }
{}
{}
{unmatched
Using your language’s regex functions, you would be able to extract the text between the matched braces for each of these strings.
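In JavaScript, for instance, one way to do this is with String.prototype.matchAll, which exposes each capturing group by index. The function name textInBraces below is an assumption made for this sketch.

```javascript
// Extract the text between each pair of matched braces.
// `textInBraces` is a name made up for this example.
function textInBraces(input) {
  const regex = /\{([^{}]*)\}/g;
  // matchAll yields one result per match; index 1 holds the first capturing group.
  return Array.from(input.matchAll(regex), (match) => match[1]);
}

console.log(textInBraces("{two} {pairs}"));
console.log(textInBraces("{ {nested} }"));
console.log(textInBraces("{unmatched"));
```

Index 0 of each result would hold the whole match, braces included; index 1 holds only what the group captured.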
Capturing groups can also be used to group regex parts for ease of repetition of said group. While we will cover repetition in detail in chapters that follow, here’s an example that demonstrates the utility of groups.
/a(bcd)+e/g
[RegExr] [Visual]abcdefg
abcde
abcdbcde
abcdbcde
abcdbcdbcdef
abcdbcdbcde
ae
Other times, they are used to group logically similar parts of the regex for readability.
/(\d\d\d\d)-W(\d\d)/g
[RegExr] [Visual]2020-W12
2020-W12
1970-W01
1970-W01
2050-W50-6
2050-W50
12050-W50
2050-W50
Backreferences allow referring to previously captured substrings.
The match from the first group would be \1, that from the second would be \2, and so on…
/([abc])=\1=\1/g
[RegExr] [Visual]a=a=a
a=a=a
ab=b=b
b=b=b
a=b=c
Backreferences cannot be used to reduce duplication in regexes. They refer to the match of groups, not the pattern.
/[abc][abc][abc]/g
[RegExr] [Visual]abc
abc
a cable
cab
aaa
aaa
bbb
bbb
ccc
ccc
/([abc])\1\1/g
[RegExr] [Visual]abc
a cable
aaa
aaa
bbb
bbb
ccc
ccc
Here’s an example that demonstrates a common use-case:
/\w+([,|])\w+\1\w+/g
[RegExr] [Visual]comma,separated,values
comma,separated,values
pipe|separated|values
pipe|separated|values
wb|mixed,delimiters
wb,mixed|delimiters
This cannot be achieved with repeated character classes.
/\w+[,|]\w+[,|]\w+/g
[RegExr] [Visual]comma,separated,values
comma,separated,values
pipe|separated|values
pipe|separated|values
wb|mixed,delimiters
wb|mixed,delimiters
wb,mixed|delimiters
wb,mixed|delimiters
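The difference between the two patterns can be verified with a few calls to RegExp.prototype.test. This is a sketch; the constant names are illustrative, and the g flag is dropped because test only needs a yes-or-no answer.

```javascript
// \1 must repeat whatever the first group actually matched.
const sameDelimiter = /\w+([,|])\w+\1\w+/;
// A repeated character class allows a different delimiter each time.
const anyDelimiter = /\w+[,|]\w+[,|]\w+/;

console.log(sameDelimiter.test("comma,separated,values")); // true
console.log(sameDelimiter.test("wb|mixed,delimiters"));    // false
console.log(anyDelimiter.test("wb|mixed,delimiters"));     // true

// The same idea with single characters:
console.log(/([abc])\1\1/.test("abc"));      // false: b and c differ from the captured a
console.log(/[abc][abc][abc]/.test("abc"));  // true
```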
Non-capturing groups are very similar to capturing groups, except that they don’t create “captures”. They take the form (?: … ).
Non-capturing groups are usually used in conjunction with capturing groups. Perhaps you are attempting to extract some parts of the matches using capturing groups. You may wish to use a group without messing up the order of the captures. This is where non-capturing groups come in handy.
/^\?(\w+)=(\w+)(?:&(\w+)=(\w+))*$/g
[RegExr] [Visual]
?
?a=b
?a=b
?a=b&foo=bar
?a=b&foo=bar
We match the first key-value pair separately because that allows us to use &, the separator, as part of the repeating group.
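To illustrate what "messing up the order of the captures" means, the sketch below compares the regex above with a variant in which the repeating group is a capturing one. The variant and the variable names are assumptions introduced only for comparison.

```javascript
const withNonCapturing = /^\?(\w+)=(\w+)(?:&(\w+)=(\w+))*$/;
// Same pattern, but the repeating group now creates a capture of its own.
const withCapturing = /^\?(\w+)=(\w+)(&(\w+)=(\w+))*$/;

const input = "?a=b&foo=bar";

// slice(1) drops index 0 (the whole match) and keeps only the captures.
const clean = input.match(withNonCapturing).slice(1);
const shifted = input.match(withCapturing).slice(1);

console.log(clean);   // keys and values only
console.log(shifted); // an extra "&foo=bar" capture pushes the later ones along
```

One caveat worth knowing in JavaScript: when a group repeats several times, its captures hold only the last repetition, so this pattern alone cannot list every key-value pair of a longer query string.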
As a rule of thumb, do not use regex to match XML/HTML.
However, it’s a relevant example:
/<([a-z]+)+>(.*)<\/\1>/gi
[RegExr] [Visual]<p>paragraph</p>
<p>paragraph</p>
<li>list item</li>
<li>list item</li>
<p><span>nesting</span></p>
<p><span>nesting</span></p>
<p>hmm</li>
<p><p>not clever</p></p></p>
<p><p>not clever</p></p></p>
Find: \b(\w+) (\w+)\b
Replace: $2, $1
John Doe
Jane Doe
Sven Svensson
Janez Novak
Janez Kranjski
Tim Joe
Doe, John
Doe, Jane
Svensson, Sven
Novak, Janez
Kranjski, Janez
Joe, Tim
Find: \bword(s?)\b
Replace: phrase$1
This is a paragraph with some words.
Some instances of the word "word" are in their plural form: "words".
Yet, some are in their singular form: "word".
This is a paragraph with some phrases.
Some instances of the phrase "phrase" are in their plural form: "phrases".
Yet, some are in their singular form: "phrase".
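The two find-and-replace operations above can be reproduced in JavaScript with String.prototype.replace, where $1 and $2 in the replacement string refer to the captured groups. The sample strings are shortened versions of the inputs above.

```javascript
const names = "John Doe\nJane Doe\nSven Svensson";
// Swap the two captured words and insert a comma.
const swapped = names.replace(/\b(\w+) (\w+)\b/g, "$2, $1");
console.log(swapped);

const sentence = 'Some instances of the word "word" are in their plural form: "words".';
// $1 re-inserts the optional "s", so plurals stay plural.
const replaced = sentence.replace(/\bword(s?)\b/g, "phrase$1");
console.log(replaced);
```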
In replacement strings, $1, $2, … are usually used in place of \1, \2, … to refer to captured strings.
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
We can make parts of a regex optional using the ? operator.
/a?/g
[RegExr] [Visual]
a
a
aa
a
a
aaa
a
a
a
aaaa
a
a
a
a
aaaaa
a
a
a
a
a
Here’s another example:
/https?/g
[RegExr] [Visual]http
http
https
https
http/2
http
shttp
http
ftp
Here the s following http is optional.
We can also make capturing and non-capturing groups optional.
/url: (www\.)?example\.com/g
[RegExr] [Visual]url: example.com
url: example.com
url: www.example.com/foo
url: www.example.com
Here's the url: example.com.
url: example.com
If we wish to match zero or more of a token, we can suffix it with *.
/a*/g
[RegExr] [Visual]
a
a
aa
aa
aaa
aaa
aaaa
aaaa
aaaaa
aaaaa
Our regex matches even an empty string "".
If we wish to match one or more of a token, we can suffix it with a +.
/a+/g
[RegExr] [Visual]
a
a
aa
aa
aaa
aaa
aaaa
aaaa
aaaaa
aaaaa
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to repeatedly copy-pasting the token x times.
/a{3}/g
[RegExr] [Visual]
a
aa
aaa
aaa
aaaa
aaa
aaaaa
aaa
Here’s an example that matches an uppercase six-character hex colour code.
/#[0-9A-F]{6}/g
[RegExr] [Visual]#AE25AE
#AE25AE
#663399
#663399
How about #73FA79?
#73FA79
Part of #73FA79BAC too
#73FA79
#FFF
#a2ca2c
Here, the token {6} applies to the character class [0-9A-F].
Between min and max times
If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.
/a{2,4}/g
[RegExr] [Visual]
a
aa
aa
aaa
aaa
aaaa
aaaa
aaaaa
aaaa
There must be no space after the comma in {min,max}.
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
/a{2,}/g
[RegExr] [Visual]
a
aa
aa
aaa
aaa
aaaa
aaaa
aaaaa
aaaaa
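The repetition operators seen so far can be compared side by side. The sketch below looks only at the first match of each regex; firstMatch is a helper name invented for this example.

```javascript
// Return the text of the first match, or null when there is none.
// `firstMatch` is a helper made up for this sketch.
function firstMatch(regex, input) {
  const match = input.match(regex);
  return match ? match[0] : null;
}

console.log(firstMatch(/a?/, "aa"));         // "a": zero or one
console.log(firstMatch(/a*/, ""));           // "": zero or more matches the empty string
console.log(firstMatch(/a+/, ""));           // null: one or more needs at least one a
console.log(firstMatch(/a{3}/, "aaaaa"));    // "aaa": exactly three
console.log(firstMatch(/a{2,4}/, "aaaaa"));  // "aaaa": between two and four
console.log(firstMatch(/a{2,}/, "aaaaa"));   // "aaaaa": at least two
console.log(firstMatch(/#[0-9A-F]{6}/, "Part of #73FA79BAC too")); // "#73FA79"
```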
Regular expressions, by default, are greedy. They attempt to match as much as possible.
/a*/g
[RegExr] [Visual]aaaaaa
aaaaaa
/".*"/g
[RegExr] [Visual]"quote"
"quote"
"quote", "quote"
"quote", "quote"
"quote"quote"
"quote"quote"
By suffixing a repetition operator (?, *, +, …) with a ?, one can make it “lazy”.
/".*?"/g
[RegExr] [Visual]"quote"
"quote"
"quote", "quote"
"quote"
"quote"
"quote"quote"
"quote"
Here, this could also be achieved by using [^"] instead of . (as is best practice).
/"[^"]*"/g
[RegExr] [Visual]"quote"
"quote"
"quote", "quote"
"quote"
"quote"
"quote"quote"
"quote"
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more
—Andrew S on StackOverflow
/<.+>/g
[RegExr] [Visual]<em>g r e e d y</em>
<em>g r e e d y</em>
/<.+?>/g
[RegExr] [Visual]<em>lazy</em>
<em>
</em>
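The greedy and lazy behaviours above are easy to compare in code. The following sketch reuses the inputs from the examples; the variable names are illustrative.

```javascript
const quoted = '"quote", "quote"';

const greedy = quoted.match(/".*"/g);     // one long match spanning both quotes
const lazy = quoted.match(/".*?"/g);      // stops at the first closing quote
const negated = quoted.match(/"[^"]*"/g); // same result without relying on laziness

const greedyTag = "<em>lazy</em>".match(/<.+>/g);  // the entire string
const lazyTags = "<em>lazy</em>".match(/<.+?>/g);  // each tag separately

console.log(greedy, lazy, negated, greedyTag, lazyTags);
```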
/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
[RegExr] [Visual]3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v
3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v
1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ
18f1yugoAJuXcHAbsuRVLQC9TezJ
/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
[RegExr] [Visual]youtube.com/watch?feature=sth&v=dQw4w9WgXcQ
youtube.com/watch?feature=sth&v=dQw4w9WgXcQ
https://www.youtube.com/watch?v=dQw4w9WgXcQ
https://www.youtube.com/watch?v=dQw4w9WgXcQ
www.youtube.com/watch?v=dQw4w9WgXcQ
www.youtube.com/watch?v=dQw4w9WgXcQ
youtube.com/watch?v=dQw4w9WgXcQ
youtube.com/watch?v=dQw4w9WgXcQ
fakeyoutube.com/watch?v=dQw4w9WgXcQ
youtube.com/watch?v=dQw4w9WgXcQ
We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
Alternation allows matching one of several phrases. This is more powerful than character classes, which are limited to characters.
Delimit the set of phrases with pipes (|).
/foo|bar|baz/g
[RegExr] [Visual]foo baz
foo
baz
Your food
foo
Behind bars
bar
One of foo, bar, and baz
If only a part of the regex is to be “alternated”, wrap that part with a group—capturing or non-capturing.
/Try (foo|bar|baz)/g
[RegExr] [Visual]Try foo
Try foo
Try bar
Try bar
Try baz
Try baz
Try food
Try foo
Try followed by one of foo, bar, and baz
Matching numbers between 100 and 250:
/1\d\d|2[0-4]\d|250/g
[RegExr] [Visual]100, 157, 199
100
157
199
139 + 140 = 279
139
140
201 INR
201
$220
220
250
250
1250
125
e = 2.71828182...
182
182
251
729
This can be generalized to match arbitrary number ranges!
Let’s improve one of our older examples to also match shorthand hex colours.
/#([0-9A-F]{6}|[0-9A-F]{3})/g
[RegExr] [Visual]#AE25AE
#AE25AE
#663399
#663399
How about #73FA79?
#73FA79
Part of #73FA79BAC too
#73FA79
#FFF
#FFF
#a2ca2c
It is important that [0-9A-F]{6} comes before [0-9A-F]{3}. Else:
/#([0-9A-F]{3}|[0-9A-F]{6})/g
[RegExr] [Visual]#AE25AE
#AE2
#663399
#663
How about #73FA79?
#73F
Part of #73FA79BAC too
#73F
#FFF
#FFF
#a2ca2c
Regex engines try alternatives from the left to the right.
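The effect of alternative order can be confirmed with a short script. This is a sketch that reuses the hex colour patterns above; the constant names are made up for the example.

```javascript
const longFirst = /#([0-9A-F]{6}|[0-9A-F]{3})/;
const shortFirst = /#([0-9A-F]{3}|[0-9A-F]{6})/;

// The engine takes the first alternative that succeeds at a given position.
console.log("#AE25AE".match(longFirst)[0]);   // "#AE25AE"
console.log("#AE25AE".match(shortFirst)[0]);  // "#AE2": the three-character option wins
console.log("#FFF".match(longFirst)[0]);      // "#FFF": falls back to the second option

// Alternation inside a group, with the chosen alternative captured:
const tryMatch = "Try food".match(/Try (foo|bar|baz)/);
console.log(tryMatch[0], tryMatch[1]);

// Without anchors, the number-range regex also matches inside longer numbers.
console.log("1250".match(/1\d\d|2[0-4]\d|250/g));
```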
/^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/g
[RegExr] [Visual]MMXX
MMXX
VI
VI
XX
XX
XI
XI
IXI
VV
Flags (or “modifiers”) allow us to put regexes into different “modes”.
Flags are the part after the final / in /pattern/.
Different engines support different flags. We’ll explore some of the most common flags here.
Global (g)
All examples thus far have had the global flag. When the global flag isn’t enabled, the regex doesn’t match anything beyond the first match.
/[aeiou]/g
[RegExr] [Visual]corona
o
o
a
cancel
a
e
rhythm
/[aeiou]/
[RegExr] [Visual]corona
o
cancel
a
rhythm
Case-insensitive (i)
As the name suggests, enabling this flag makes the regex case-insensitive in its matching.
/#[0-9A-F]{6}/i
[RegExr] [Visual]#AE25AE
#AE25AE
#663399
#663399
Even #a2ca2c?
#a2ca2c
#FFF
/#[0-9A-F]{6}/
[RegExr] [Visual]#AE25AE
#AE25AE
#663399
#663399
Even #a2ca2c?
#FFF
/#[0-9A-Fa-f]{6}/
[RegExr] [Visual]#AE25AE
#AE25AE
#663399
#663399
Even #a2ca2c?
#a2ca2c
#FFF
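In JavaScript, the global and case-insensitive flags can be observed as follows. This is a sketch using built-in string and regex methods; the variable names are chosen for illustration.

```javascript
// Without g, match returns details about the first match only.
const firstOnly = "corona".match(/[aeiou]/);
// With g, match returns every match.
const all = "corona".match(/[aeiou]/g);
console.log(firstOnly[0], firstOnly.index, all);

// replace behaves the same way: first match only, unless g is set.
const replacedOnce = "corona".replace(/[aeiou]/, "_");
const replacedAll = "corona".replace(/[aeiou]/g, "_");
console.log(replacedOnce, replacedAll);

// The i flag ignores case; widening the character class is an alternative.
console.log(/#[0-9A-F]{6}/.test("Even #a2ca2c?"));     // false
console.log(/#[0-9A-F]{6}/i.test("Even #a2ca2c?"));    // true
console.log(/#[0-9A-Fa-f]{6}/.test("Even #a2ca2c?"));  // true
```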
Multiline (m)
In Ruby, the m flag performs other functions.
The multiline flag has to do with the regex’s handling of anchors when dealing with “multiline” strings, that is, strings that include newlines (\n). By default, the regex /^foo$/ would match only "foo".
We might want it to match foo when it is on a line by itself in a multiline string.
Let’s take the string "bar\nfoo\nbaz" as an example:
bar
foo
baz
Without the multiline flag, the string above would be considered as a single line bar\nfoo\nbaz for matching purposes. The regex ^foo$ would thus not match anything.
With the multiline flag, the input would be considered as three “lines”: bar, foo, and baz. The regex ^foo$ would match the line in the middle: foo.
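The behaviour described above can be checked directly in JavaScript. The snippet is a small sketch using the same three-line input.

```javascript
const input = "bar\nfoo\nbaz";

// Without m, ^ and $ refer to the start and end of the whole string.
console.log(/^foo$/.test(input));   // false

// With m, ^ and $ also match at the start and end of each line.
console.log(/^foo$/m.test(input));  // true

// Combined with g, every line can be matched on its own.
const lines = input.match(/^\w+$/gm);
console.log(lines);
```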
Dot-all (s)
JavaScript, prior to ES2018, did not support this flag. Ruby does not support the flag, instead using m for the same purpose.
The . typically matches any character except newlines. With the dot-all flag, it matches newlines too.
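As a brief sketch, assuming a JavaScript engine that supports ES2018 or later, the difference looks like this. The sample string is made up for the example.

```javascript
const text = "line one\nline two";

// Without s, the dot refuses to cross the newline.
const withoutDotAll = text.match(/one.line/);
// With s, the dot matches the newline as well.
const withDotAll = text.match(/one.line/s);

console.log(withoutDotAll);  // null
console.log(withDotAll[0]);  // "one", a newline, then "line"
console.log(/./s.dotAll);    // true: the flag is exposed as a property
```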
Unicode (u)
In the presence of the u flag, the regex and the input string will be interpreted in a Unicode-aware way. The details of this are implementation-dependent, but here are some things to expect:
The i flag may use Unicode’s case-folding logic.
Whitespace extended (x)
When this flag is set, whitespace in the pattern is ignored (unless escaped or in a character class). Additionally, characters following # on any line are ignored. This allows for comments and is useful when writing complex patterns.
Here’s an example from Advanced Examples, formatted to take advantage of the whitespace extended flag:
^                     # start of line
  (
    [+-]?             # sign
    (?=\.\d|\d)       # don't match `.`
    (?:\d+)?          # integer part
    (?:\.?\d*)        # fraction part
  )
  (?:                 # optional exponent part
    [eE]
    (
      [+-]?           # optional sign
      \d+             # power
    )
  )?
$                     # end of line
Anchors do not match anything by themselves. Instead, they place restrictions on where matches may appear—“anchoring” matches.
You could also think about anchors as “invisible characters”.
^
Marked by a caret (^) at the beginning of the regex, this anchor makes it necessary for the rest of the regex to match from the beginning of the string.
You can think of it as matching an invisible character always present at the beginning of the string.
/^p/g
[RegExr] [Visual]photoshop
p
pineapple
p
tap
apple
ppap
p
mango
$
This anchor is marked by a dollar ($) at the end of the regex. It is analogous to the beginning-of-line anchor.
You can think of it as matching an invisible character always present at the end of the string.
/p$/g
[RegExr] [Visual]photoshop
p
pineapple
apple
app
p
Plum
mango
The ^ and $ anchors are often used in conjunction to ensure that the regex matches the entirety of the string, rather than merely a part.
/^p$/g
[RegExr] [Visual]p
p
pi
pea
tarp
apple
Let’s revisit an example from Repetition, and add the two anchors at the ends of the regex.
/^https?$/g
[RegExr] [Visual]http
http
https
https
http/2
shttp
ftp
In the absence of the anchors, http/2 and shttp would also match.
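A quick way to see the anchors at work is RegExp.prototype.test, which reports whether a regex matches anywhere in a string. The sketch below reuses inputs from the examples above.

```javascript
console.log(/^p/.test("pineapple"));  // true: starts with p
console.log(/^p/.test("tap"));        // false: the p is not at the beginning
console.log(/p$/.test("tap"));        // true: ends with p
console.log(/p$/.test("pineapple"));  // false: ends with e

// Anchoring both ends forces the regex to match the whole string.
console.log(/https?/.test("http/2"));    // true: matches the leading "http"
console.log(/^https?$/.test("http/2"));  // false
console.log(/^https?$/.test("shttp"));   // false
console.log(/^https?$/.test("https"));   // true
```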
\b
A word boundary is a position between a word character and a non-word character.
The word boundary anchor, \b, matches an imaginary invisible character that exists between consecutive word and non-word characters.
/\bp/g
[RegExr] [Visual]peach
p
banana, peach
p
banana+peach
p
banana-peach
p
banana_peach
banana%20peach
grape
Word characters include a-z, A-Z, 0-9, and _.
/\bp\b/g
[RegExr] [Visual]word p word
p
(p)
p
p+q+r
p
(paren)
(loop)
loops
/\bcat\b/g
[RegExr] [Visual]cat
cat
the cat?
cat
catch
concat it
concatenate
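Word boundaries are handy for whole-word searches. The sketch below checks the cases above with RegExp.prototype.test and String.prototype.match.

```javascript
// \bcat\b only matches "cat" as a standalone word.
console.log(/\bcat\b/.test("the cat?"));     // true
console.log(/\bcat\b/.test("catch"));        // false: followed by a word character
console.log(/\bcat\b/.test("concatenate"));  // false: surrounded by word characters

// A hyphen is a non-word character, so a boundary precedes the p.
console.log("banana-peach".match(/\bp/g));
// An underscore is a word character, so there is no boundary here.
console.log("banana_peach".match(/\bp/g));
```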
There is also a non-word-boundary anchor: \B.
As the name suggests, it matches at every position that is not a word boundary.
/\Bp/g
[RegExr] [Visual]ape
p
leap
p
(leap)
p
a pot
pea
/\Bp\B/g
[RegExr] [Visual]ape
p
_peel
p
leap
(leap)
a pot
pea
^…$ and \b…\b are common patterns and you will almost always need one or the other to prevent accidental matches.
/\s+$/gm
[RegExr] [Visual]abc
def
abc def
/^## /gm
[RegExr] [Visual]# Heading 1
## Heading 2
##
### Heading 3
#### Heading 4
Without anchors:
/## /gm
[RegExr] [Visual]# Heading 1
## Heading 2
##
### Heading 3
##
#### Heading 4
##
This section is a Work In Progress.
Lookarounds can be used to verify conditions, without matching any text.
You’re only looking, not moving.
Positive lookahead: (?=…)
Negative lookahead: (?!…)
Positive lookbehind: (?<=…)
Negative lookbehind: (?<!…)
/_(?=[aeiou])/g
[RegExr] [Visual]_a
_
e_e
_
_f
Note how the character following the _ isn’t matched. Yet, its nature is confirmed by the positive lookahead.
/(.+)_(?=[aeiou])(?=\1)/g
[RegExr] [Visual]e_e
e_
u_u
u_
uw_uw
uw_
uw_uwa
uw_
f_f
a_e
After (?=[aeiou]), the regex engine hasn’t moved and checks for (?=\1) starting after the _.
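That a lookahead checks text without consuming it can be seen from what ends up in the match. The following is a sketch in JavaScript; the variable names are illustrative.

```javascript
// Only the underscore is part of the match; the vowel is merely checked.
const underscores = "e_e".match(/_(?=[aeiou])/g);
const noVowelAfter = "_f".match(/_(?=[aeiou])/g);
console.log(underscores, noVowelAfter);

// Two lookaheads in a row are both evaluated at the same position.
const repeated = "uw_uwa".match(/(.+)_(?=[aeiou])(?=\1)/);
console.log(repeated[0]);  // "uw_": the text after the underscore is not consumed
console.log(repeated[1]);  // "uw": what the backreference had to find again

const different = "a_e".match(/(.+)_(?=[aeiou])(?=\1)/);
console.log(different);    // null: "e" is a vowel but does not repeat "a"
```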
/(?=.*#).*/g
[RegExr] [Visual]abc#def
abc#def
#def
#def
abc#
abc#
abcdef
/_(?![aeiou])/g
[RegExr] [Visual]_a
e_e
_f
_
/^(?!.*#).*$/g
[RegExr] [Visual]abc#def
#def
abc#
abcdef
abcdef
Without the anchors, this will match the part without the # in each test case.
Negative lookaheads are commonly used to prevent particular phrases from matching.
/foo(?!bar)/g
[RegExr] [Visual]foobaz
foo
foobarbaz
bazfoobar
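The negative lookahead examples above behave as follows in JavaScript. This is only a sketch; the replacement text X is arbitrary.

```javascript
// foo is matched only when it is NOT followed by bar.
console.log(/foo(?!bar)/.test("foobaz"));     // true
console.log(/foo(?!bar)/.test("foobarbaz"));  // false
console.log(/foo(?!bar)/.test("bazfoobar"));  // false

// Only the foo that is not followed by bar gets replaced.
const selective = "foobar foobaz".replace(/foo(?!bar)/g, "X");
console.log(selective);

// Rejecting whole strings that contain a # anywhere:
console.log(/^(?!.*#).*$/.test("abcdef"));   // true
console.log(/^(?!.*#).*$/.test("abc#def"));  // false
```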
/---(?:(?!---).)*---/g
[RegExr] [Visual]---foo---
---foo---
---fo-o---
---fo-o---
--------
------
Lookbehinds have limited support: JavaScript, prior to ES2018, did not support them.
/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[a-zA-Z]).{8,}$/
[RegExr] [Visual]hunter2
zsofpghedake
zsofpghedak4e
zSoFpghEdaK4E
zSoFpghEdaK4E
zSoFpg!hEd!aK4E
zSoFpg!hEd!aK4E
Lookarounds can be used to verify multiple conditions.
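Wrapped in a function, the password regex above can be exercised like this. The function name isStrongPassword is an assumption made for this sketch, and the rules are just the ones the regex encodes, not a recommendation for real password policies.

```javascript
// Each lookahead starts at the beginning of the string and checks one condition:
// a digit, a lowercase letter, an uppercase letter, and a letter of either case.
// Only .{8,} actually consumes the text.
const strongPassword = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[a-zA-Z]).{8,}$/;

// `isStrongPassword` is a name made up for this example.
function isStrongPassword(candidate) {
  return strongPassword.test(candidate);
}

console.log(isStrongPassword("hunter2"));        // false: too short, no uppercase
console.log(isStrongPassword("zsofpghedak4e"));  // false: no uppercase letter
console.log(isStrongPassword("zSoFpghEdaK4E"));  // true
```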
/(['"])(?:(?!\1).)*\1/g
[RegExr] [Visual]foo "bar" baz
"bar"
foo 'bar' baz
'bar'
foo 'bat's' baz
'bat'
foo "bat's" baz
"bat's"
foo 'bat"s' baz
'bat"s'
Without lookaheads, this is the best we can do:
/(['"])[^'"]*\1/g
[RegExr] [Visual]foo "bar" baz
"bar"
foo 'bar' baz
'bar'
foo 'bat's' baz
'bat'
foo "bat's" baz
foo 'bat"s' baz
/\/\*[\s\S]*?\*\/|\/\/.*/g
[RegExr] [Visual]const a = 0; // comment
// comment
/* multiline */
/* multiline */
[\s\S] is a hack to match any character including newlines. We avoid the dot-all flag because we need to use the ordinary . for single-line comments.
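Here is a sketch of the comment regex applied to a small multi-line input. The sample source text is made up, and the pattern knows nothing about string literals, so it would also pick up comment-like text inside strings.

```javascript
const commentRegex = /\/\*[\s\S]*?\*\/|\/\/.*/g;

const source = "const a = 0; // comment\n/* multi\nline */\nconst b = 1;";

// The first alternative spans newlines thanks to [\s\S];
// the second stops at the end of the line because . excludes newlines.
const comments = source.match(commentRegex);
console.log(comments);

// Stripping the comments instead of collecting them:
const stripped = source.replace(commentRegex, "");
console.log(stripped);
```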
/^([01]?[0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?$/g
[RegExr] [Visual]23:59:00
23:59:00
14:00
14:00
23:00
23:00
29:00
32:32
/<Example source="(.*?)" flags="(.*?)">/gm
[RegExr] [Visual]<Example source="p[aeiou]t" flags="g">
<Example source="p[aeiou]t" flags="g">
<Example source="s+$" flags="gm">
<Example source="s+$" flags="gm">
<Example source="(['"])(?:(?!\1).)*\1" flags="g">
<Example source="(['"])(?:(?!\1).)*\1" flags="g">
<Example source='s+$' flags='gm'>
</Example>
Replace: <Example regex={/$1/$2}>
I performed this operation in commit d7a684f.
/^([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?$/g
[RegExr] [Visual]987
987
-8
-8
0.1
0.1
2.
2.
.987
.987
+4.0
+4.0
1.1e+1
1.1e+1
1.e+1
1.e+1
1e2
1e2
0.2e2
0.2e2
.987e2
.987e2
+4e-1
+4e-1
-8.e+2
-8.e+2
The positive lookahead (?=\.\d|\d) ensures that the regex does not match a lone . character.
/^((-?|\+?)?\d+(\.\d+)?),\s*((-?|\+?)?\d+(\.\d+)?)$/g
[RegExr] [Visual]30.0260736, -89.9766792
30.0260736, -89.9766792
45, 180
45, 180
-90.000, -180.0
-90.000, -180.0
48.858093,2.294694
48.858093,2.294694
-3.14, 3.14
-3.14, 3.14
045, 180.0
045, 180.0
0, 0
0, 0
-90., -180.
.004, .15
See also: Floating Point Numbers
/^[a-f0-9]{2}(:[a-f0-9]{2}){5}$/i
[RegExr] [Visual]01:02:03:04:ab:cd
01:02:03:04:ab:cd
9E:39:23:85:D8:C2
9E:39:23:85:D8:C2
00:00:00:00:00:00
00:00:00:00:00:00
1N:VA:L1:DA:DD:R5
9:3:23:85:D8:C2
ac::23:85:D8:C2
/[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}/i
[RegExr] [Visual]123e4567-e89b-12d3-a456-426655440000
123e4567-e89b-12d3-a456-426655440000
c73bcdcc-2669-4bf6-81d3-e4ae73fb11fd
c73bcdcc-2669-4bf6-81d3-e4ae73fb11fd
C73BCDCC-2669-4Bf6-81d3-E4AE73FB11FD
C73BCDCC-2669-4Bf6-81d3-E4AE73FB11FD
c73bcdcc-2669-4bf6-81d3-e4an73fb11fd
c73bcdcc26694bf681d3e4ae73fb11fd
/\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b/g
[RegExr] [Visual]9.9.9.9
9.9.9.9
127.0.0.1:8080
127.0.0.1
It's 192.168.1.9
192.168.1.9
255.193.09.243
255.193.09.243
123.123.123.123
123.123.123.123
123.123.123.256
0.0.x.0
0 to 360
360
300 to 359: 3, [0-5], any digit
0 to 299: an optional 1 or 2 as the hundreds digit, an optional tens digit, and a units digit
/^0*(?:360|3[0-5]\d|[12]?\d?\d)$/g
[RegExr] [Visual]360
360
349
349
235
235
152
152
68
68
9
9
361
404
100, optionally followed by .000…
/^(?:100(?:\.0+)?|\d?\d(?:\.\d+)?)%$/g
[RegExr] [Visual]100%
100%
100.0%
100.0%
25%
25%
52.32%
52.32%
9%
9%
0.5%
0.5%
100.5%
42
/^hsl\(\s*0*(?:360|3[0-5]\d|[12]?\d?\d)\s*(?:,\s*0*(?:100(?:\.0+)?|\d?\d(?:\.\d+)?)%\s*){2}\)$/gi
[RegExr] [Visual]hsl(0,20%,100%)
hsl(0,20%,100%)
HSL(0350, 002%,4.1%)
HSL(0350, 002%,4.1%)
hsl(360,10% , 0.2% )
hsl(360,10% , 0.2% )
Congratulations on getting this far!
Obligatory xkcd:
If you’d like to read more about regular expressions and how they work:
The regex tag on StackOverflow
Thanks for reading!