Regular Expressions For Regular Folk

Repetition

Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.

Making things optional

We can make part of a regex optional by suffixing it with the ? operator, which matches the preceding token zero or one times.

/a?/g
  • 1 match: (empty string)
  • 2 matches: a
    1. a
  • 3 matches: aa
    1. a
    2. a
  • 4 matches: aaa
    1. a
    2. a
    3. a
  • 5 matches: aaaa
    1. a
    2. a
    3. a
    4. a
  • 6 matches: aaaaa
    1. a
    2. a
    3. a
    4. a
    5. a

Here’s another example:

/https?/g
  • 1 match: http
    1. http
  • 1 match: https
    1. https
  • 1 match: http/2
    1. http
  • 1 match: shttp
    1. http
  • 0 matches: ftp

Here the s following http is optional.

We can also make capturing and non-capturing groups optional.

/url: (www\.)?example\.com/g
  • 1 match: url: example.com
    1. url: example.com
  • 1 match: url: www.example.com/foo
    1. url: www.example.com
  • 1 match: Here's the url: example.com.
    1. url: example.com
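
The optional patterns above can also be tried outside the interactive examples. The following is a minimal sketch that assumes Python's re module, where findall plays the role of the g flag in the /pattern/g notation used on this page. The variable names are illustrative only, and the group is written as non-capturing so that findall returns whole matches.

```python
import re

# "s?" makes the trailing s optional, so both schemes match.
scheme = re.compile(r"https?")
scheme_matches = scheme.findall("http https shttp ftp")
print(scheme_matches)  # ['http', 'https', 'http']

# An optional non-capturing group: "www." may be present or absent.
url = re.compile(r"url: (?:www\.)?example\.com")
url_matches = [
    url.findall("Here's the url: example.com."),
    url.findall("url: www.example.com/foo"),
]
print(url_matches)  # [['url: example.com'], ['url: www.example.com']]
```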

Zero or more

If we wish to match zero or more of a token, we can suffix it with *.

/a*/g
  • 1 match: (empty string)
  • 2 matches: a
    1. a
  • 2 matches: aa
    1. aa
  • 2 matches: aaa
    1. aaa
  • 2 matches: aaaa
    1. aaaa
  • 2 matches: aaaaa
    1. aaaaa

Our regex matches even an empty string "".
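
The match counts above are one higher than the number of listed matches because the engine also reports an empty match at the end of the input. A small sketch of this, assuming Python's re module (other engines may report empty matches slightly differently):

```python
import re

star = re.compile(r"a*")

# An empty input still yields one (empty) match.
empty_input = star.findall("")
print(empty_input)  # ['']

# A run of a's is consumed in one match, followed by an empty match at the end.
run = star.findall("aaa")
print(run)  # ['aaa', '']

# Filtering out the empty strings leaves only the visible matches.
visible = [m for m in star.findall("baaa") if m]
print(visible)  # ['aaa']
```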

One or more

If we wish to match one or more of a token, we can suffix it with a +.

/a+/g
  • 0 matches: (empty string)
  • 1 match: a
    1. a
  • 1 match: aa
    1. aa
  • 1 match: aaa
    1. aaa
  • 1 match: aaaa
    1. aaaa
  • 1 match: aaaaa
    1. aaaaa

Exactly x times

If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to repeatedly copy-pasting the token x times.

/a{3}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 0 matches: aa
  • 1 match: aaa
    1. aaa
  • 1 match: aaaa
    1. aaa
  • 1 match: aaaaa
    1. aaa

Here’s an example that matches an uppercase six-character hex colour code.

/#[0-9A-F]{6}/g
  • 1 match: #AE25AE
    1. #AE25AE
  • 1 match: #663399
    1. #663399
  • 1 match: How about #73FA79?
    1. #73FA79
  • 1 match: Part of #73FA79BAC too
    1. #73FA79
  • 0 matches: #FFF
  • 0 matches: #a2ca2c

Here, the token {6} applies to the character class [0-9A-F].
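
As a rough check of both claims, the sketch below (assuming Python's re module; the sample list and variable names are illustrative) compares a{3} with the hand-written aaa and then runs the hex colour pattern over the inputs listed above.

```python
import re

# a{3} behaves like the hand-written pattern aaa.
same = re.findall(r"a{3}", "aaaaa") == re.findall(r"aaa", "aaaaa")
print(same)  # True

# The {6} applies to the whole character class [0-9A-F], not just to "F".
hex_colour = re.compile(r"#[0-9A-F]{6}")
samples = ["#AE25AE", "How about #73FA79?", "Part of #73FA79BAC too", "#FFF", "#a2ca2c"]
results = {text: hex_colour.findall(text) for text in samples}
for text, found in results.items():
    print(f"{text!r}: {found}")
```

The last two inputs produce empty lists: #FFF is too short, and #a2ca2c uses lowercase letters that the class does not include.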

Between min and max times

If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.

/a{2,4}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 1 match: aa
    1. aa
  • 1 match: aaa
    1. aaa
  • 1 match: aaaa
    1. aaaa
  • 1 match: aaaaa
    1. aaaa

Warning

There must be no space after the comma in {min,max}.
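
To see why the space matters, here is a small sketch assuming Python's re module. In that engine a quantifier with a space after the comma is not an error; it is silently treated as literal text. Other engines may behave differently, so treat this as an illustration rather than a general rule.

```python
import re

without_space = re.compile(r"a{2,4}")
with_space = re.compile(r"a{2, 4}")  # note the space after the comma

# The well-formed quantifier matches between two and four a's.
ok = without_space.findall("aaaaa")
print(ok)  # ['aaaa']

# With the space, the braces are no longer a quantifier, so nothing matches here.
broken = with_space.findall("aaaaa")
print(broken)  # []

# Instead, the pattern only matches the literal characters "a{2, 4}".
literal = with_space.findall("a{2, 4}")
print(literal)  # ['a{2, 4}']
```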

At least x times

If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.

/a{2,}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 1 match: aa
    1. aa
  • 1 match: aaa
    1. aaa
  • 1 match: aaaa
    1. aaaa
  • 1 match: aaaaa
    1. aaaaa
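
To compare the repetition operators covered so far side by side, the sketch below runs each one against the same input. It assumes Python's re module; the summary dictionary is just an illustrative way to collect the results, and empty matches are dropped so that only the visible ones remain.

```python
import re

text = "aaaaa"
patterns = [r"a?", r"a*", r"a+", r"a{3}", r"a{2,4}", r"a{2,}"]

summary = {}
for pattern in patterns:
    # Drop empty matches so only the visible ones are shown.
    summary[pattern] = [m for m in re.findall(pattern, text) if m]
    print(f"{pattern:8} {summary[pattern]}")
```

Only a? splits the input into single characters; every other operator takes as many characters as its bounds allow in one match.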

A note on greediness

Repetition operators are greedy by default: they attempt to match as much as possible.

/a*/g
  • 2 matches: aaaaaa
    1. aaaaaa

/".*"/g
  • 1 match: "quote"
    1. "quote"
  • 1 match: "quote", "quote"
    1. "quote", "quote"
  • 1 match: "quote"quote"
    1. "quote"quote"

Suffixing a repetition operator (?, *, +, …) with a ? makes it “lazy”: it then matches as little as possible.

/".*?"/g
  • 1 match: "quote"
    1. "quote"
  • 2 matches: "quote", "quote"
    1. "quote"
    2. "quote"
  • 1 match: "quote"quote"
    1. "quote"

Here, the same result can be achieved by using [^"] instead of . (as is best practice).

/"[^"]*"/g
  • 1 match: "quote"
    1. "quote"
  • 2 matches: "quote", "quote"
    1. "quote"
    2. "quote"
  • 1 match: "quote"quote"
    1. "quote"

[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more

Andrew S on StackOverflow

/<.+>/g
  • 1 match: <em>g r e e d y</em>
    1. <em>g r e e d y</em>

/<.+?>/g
  • 2 matches: <em>lazy</em>
    1. <em>
    2. </em>
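
The difference between greedy, lazy, and negated-class matching can be reproduced in a few lines. This is a sketch assuming Python's re module, which uses the same ? suffix for lazy matching; the variable names are illustrative.

```python
import re

quotes = '"quote", "quote"'

# Greedy: .* runs to the last quote on the line.
greedy = re.findall(r'".*"', quotes)
print(greedy)  # ['"quote", "quote"']

# Lazy: .*? stops at the first closing quote it can.
lazy = re.findall(r'".*?"', quotes)
print(lazy)  # ['"quote"', '"quote"']

# Negated class: [^"]* cannot cross a quote, so no laziness is needed.
negated = re.findall(r'"[^"]*"', quotes)
print(negated)  # ['"quote"', '"quote"']

# The same contrast on tags.
tags_greedy = re.findall(r"<.+>", "<em>greedy</em>")
tags_lazy = re.findall(r"<.+?>", "<em>lazy</em>")
print(tags_greedy)  # ['<em>greedy</em>']
print(tags_lazy)    # ['<em>', '</em>']
```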

Examples

Bitcoin address

/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
  • 1 match: 3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v
    1. 3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v
  • 1 match: 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
    1. 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
  • 1 match: 2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ
    1. 18f1yugoAJuXcHAbsuRVLQC9TezJ
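
A minimal sketch of using this pattern to pull an address out of a longer line, assuming Python's re module. The helper name find_address is hypothetical, and the pattern only checks the shape of the string (leading 1 or 3, then 26 to 33 permitted characters); it does not verify that the address is actually valid.

```python
import re
from typing import Optional

ADDRESS = re.compile(r"[13][a-km-zA-HJ-NP-Z0-9]{26,33}")

def find_address(line: str) -> Optional[str]:
    """Return the first address-shaped substring in line, or None."""
    match = ADDRESS.search(line)
    return match.group(0) if match else None

# The date prefix is skipped because "-" and "," are not in the character class.
found = find_address("2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ")
print(found)  # 18f1yugoAJuXcHAbsuRVLQC9TezJ

# Too short to satisfy {26,33}.
missing = find_address("1abc")
print(missing)  # None
```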

YouTube video

/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
  • 1 match: youtube.com/watch?feature=sth&v=dQw4w9WgXcQ
    1. youtube.com/watch?feature=sth&v=dQw4w9WgXcQ
  • 1 match: https://www.youtube.com/watch?v=dQw4w9WgXcQ
    1. https://www.youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: www.youtube.com/watch?v=dQw4w9WgXcQ
    1. www.youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: youtube.com/watch?v=dQw4w9WgXcQ
    1. youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: fakeyoutube.com/watch?v=dQw4w9WgXcQ
    1. youtube.com/watch?v=dQw4w9WgXcQ

We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
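
The capturing group in this pattern is what makes it useful in practice, since it isolates the video ID. Here is a sketch assuming Python's re module, where / needs no escaping; the helper name video_id is hypothetical.

```python
import re
from typing import Optional

YOUTUBE = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/watch\?.*?v=([^&\s]+).*")

def video_id(link: str) -> Optional[str]:
    """Return the value of the v parameter captured by group 1, or None."""
    match = YOUTUBE.search(link)
    return match.group(1) if match else None

# The lazy .*? skips over earlier query parameters until it reaches "v=".
with_params = video_id("youtube.com/watch?feature=sth&v=dQw4w9WgXcQ")
print(with_params)  # dQw4w9WgXcQ

# Both optional prefixes may be present.
full_link = video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(full_link)  # dQw4w9WgXcQ

# As noted above, the unanchored pattern still matches inside a fake domain.
fake = YOUTUBE.search("fakeyoutube.com/watch?v=dQw4w9WgXcQ")
print(fake.group(0))  # youtube.com/watch?v=dQw4w9WgXcQ
```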