Regular Expressions: A Practical Guide to Pattern Matching

Regular expressions (regex) are a mini-language for describing text patterns. They are supported by most programming languages, code editors, command-line tools, and database engines. Learning them pays off immediately: the core syntax carries over between Python, JavaScript, grep, sed, VS Code's search, and SQL, although each tool has its own dialect and differs in details such as escaping and supported features.

The Building Blocks

A regex pattern is built from literals and metacharacters. A literal character like a matches the letter a exactly. Metacharacters have special meanings:

. — matches any single character (except newline by default)
^ — matches the start of the string (or line in multiline mode)
$ — matches the end of the string (or line in multiline mode)
* — matches the preceding element zero or more times
+ — matches the preceding element one or more times
? — matches the preceding element zero or one time (placed after another quantifier, it makes that quantifier lazy)
{n,m} — matches the preceding element between n and m times
| — alternation (OR): matches either the left or right side
\ — escape a metacharacter or use a special sequence
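
As a rough illustration of the metacharacters listed above, here is a minimal sketch using Python's built-in re module. The helper first_match and the sample strings are made up for this example; other regex engines should behave the same way for these basic patterns.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


print(first_match(r"c.t", "a cat sat"))       # . matches any one character
print(first_match(r"^cat", "cat sat"))        # ^ anchors the match to the start
print(first_match(r"sat$", "cat sat"))        # $ anchors the match to the end
print(first_match(r"ab*c", "ac abc abbc"))    # * allows zero b's, so "ac" matches first
print(first_match(r"ab+c", "ac abc abbc"))    # + needs at least one b, so "abc" matches first
print(first_match(r"colou?r", "color"))       # ? makes the u optional
print(first_match(r"\d{2,3}", "7 42 12345"))  # {2,3} needs two or three digits
print(first_match(r"cat|dog", "hot dog"))     # | tries either alternative
print(first_match(r"3\.14", "3x14 3.14"))     # \. matches a literal dot only
```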

Character Classes

[abc] matches any single character that's a, b, or c. [a-z] matches any lowercase ASCII letter. [^abc] matches any single character that's NOT a, b, or c.

Shorthand character classes save typing:

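\d — any digit ([0-9]; some engines, such as Python 3, also match other Unicode digits by default)
\w — any word character: letters, digits, underscore ([a-zA-Z0-9_], plus other Unicode letters and digits in Unicode-aware engines)
\s — any whitespace character (space, tab, newline, and similar)
\D, \W, \S — uppercase versions match the inverse: any character that is not a digit, word character, or whitespace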
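
The following sketch tries these character classes with Python's re module. The sample strings are invented for illustration. It also shows one Python-specific detail: for str patterns, \d is Unicode-aware by default, and the re.ASCII flag restricts it to 0-9.

```python
import re

text = "Order #A-17 to 42 Elm_St"

vowels = re.findall(r"[aeiou]", "regex")       # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")  # ['r', 'g', 'x']
numbers = re.findall(r"\d+", text)             # ['17', '42']
words = re.findall(r"\w+", text)               # ['Order', 'A', '17', 'to', '42', 'Elm_St']
fields = re.split(r"\s+", "a  b\tc\nd")        # ['a', 'b', 'c', 'd']

# "\u0663" is ARABIC-INDIC DIGIT THREE. Python's \d matches it by default.
mixed = "\u0663" + "5"
unicode_digits = re.findall(r"\d", mixed)           # both characters match
ascii_digits = re.findall(r"\d", mixed, re.ASCII)   # only '5' matches

print(vowels, non_vowels, numbers, words, fields)
print(len(unicode_digits), ascii_digits)
```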

Groups and Captures

Parentheses () group expressions and capture the matched text. (\d{4})-(\d{2})-(\d{2}) matches a date like 2024-04-25 and captures year, month, and day as separate groups you can reference in replacement strings or extract programmatically.
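
Here is a small sketch of both uses in Python, assuming the re module: extracting the groups programmatically, and referencing them in a replacement string. The sample sentence and variable names are made up for the example.

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
sentence = "Released on 2024-04-25."

# Extract the captured groups programmatically.
m = date_pattern.search(sentence)
year, month, day = m.groups()
print(year, month, day)  # 2024 04 25

# Reference the groups in a replacement string: \1 is the year,
# \2 the month, \3 the day. Here they are reordered to day/month/year.
reordered = date_pattern.sub(r"\3/\2/\1", sentence)
print(reordered)  # Released on 25/04/2024.
```

Note that the replacement syntax varies by tool: Python uses \1, while JavaScript and many editors use $1.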

Non-capturing groups (?:...) group without capturing, which is useful when you need grouping for alternation or repetition but don't need the matched text.
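
The difference is easy to see with Python's re.findall, which returns the captured groups when a pattern has any. In this sketch (the CSS-like sample text is invented), the unit alternation needs grouping, but only the number is wanted.

```python
import re

sizes = "12px 3em 7pt"

# Two capturing groups: findall returns a tuple per match.
both = re.findall(r"(\d+)(px|em)", sizes)
print(both)  # [('12', 'px'), ('3', 'em')]

# Non-capturing group for the unit: only the number is captured.
numbers_only = re.findall(r"(\d+)(?:px|em)", sizes)
print(numbers_only)  # ['12', '3']

# Grouping for repetition without creating any capture.
repeated = re.match(r"(?:ab)+", "ababab")
print(repeated.group(), repeated.groups())  # ababab ()
```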

Practical Examples

# Email (simplified — true email validation is more complex)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# IPv4 address (format only; octets above 255 such as 999 still match)
^(\d{1,3}\.){3}\d{1,3}$

# UK postcode (simplified; expects uppercase letters)
^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$

# Hex color code (with or without #)
^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

# ISO 8601 calendar date (format only; does not check month or day ranges)
^\d{4}-\d{2}-\d{2}$
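
One way to use these patterns from Python is sketched below. The PATTERNS dictionary and the helpers matches and is_valid_ipv4 are hypothetical names for this example. The IPv4 helper adds a numeric range check, because the regex on its own only verifies the shape of the address.

```python
import re

# The patterns from the list above, compiled once for reuse.
PATTERNS = {
    "email": re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"),
    "ipv4": re.compile(r"^(\d{1,3}\.){3}\d{1,3}$"),
    "uk_postcode": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),
    "hex_color": re.compile(r"^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def matches(kind: str, text: str) -> bool:
    """Return True if text matches the named pattern."""
    return PATTERNS[kind].search(text) is not None


def is_valid_ipv4(text: str) -> bool:
    """Check the shape with the regex, then check each octet is at most 255."""
    if not matches("ipv4", text):
        return False
    return all(int(octet) <= 255 for octet in text.split("."))


print(matches("email", "user@example.com"))  # True
print(matches("ipv4", "999.1.1.1"))          # True: the shape is right
print(is_valid_ipv4("999.1.1.1"))            # False: 999 is out of range
print(matches("uk_postcode", "SW1A 1AA"))    # True
print(matches("hex_color", "#ff88"))         # False: neither 3 nor 6 digits
print(matches("iso_date", "2024-4-25"))      # False: month needs two digits
```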

Greedy vs Lazy Matching

Quantifiers are greedy by default: they match as much as possible. <.+> applied to <b>text</b> matches the entire string, not just <b>. Add ? to make it lazy: <.+?> matches <b> and stops. Greedy quantifiers that consume too much are a common source of bugs when processing HTML or XML with regex.
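
The difference can be checked directly with Python's re module. The third pattern is not from the text above: it is a common alternative that uses a negated character class instead of a lazy quantifier, shown here only for comparison.

```python
import re

html = "<b>text</b>"

# Greedy: .+ runs to the last > it can reach.
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>text</b>']

# Lazy: .+? stops at the first > that completes a match.
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>']

# Alternative: match anything except >, so the tag cannot overrun.
negated = re.findall(r"<[^>]+>", html)
print(negated)  # ['<b>', '</b>']
```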

Lookaheads and Lookbehinds

Lookaheads let you match based on what comes after without including it in the match. \d+(?= USD) matches numbers followed by " USD", but only the number is part of the match. Lookbehinds work in the other direction: (?<=\$)\d+ matches digits that are preceded by a dollar sign without including the sign. Both are useful for extracting data from structured text.
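
A minimal sketch of both directions in Python follows. The price string is invented for this example. One caveat specific to Python's re module is that a lookbehind must match a fixed number of characters, so (?<=\$) is fine but a variable-length lookbehind is rejected.

```python
import re

prices = "Total: 150 USD, 20 EUR, $75"

# Lookahead: digits followed by " USD". The suffix is checked but not consumed.
usd_amounts = re.findall(r"\d+(?= USD)", prices)
print(usd_amounts)  # ['150']

# Lookbehind: digits preceded by a dollar sign. The sign is not part of the match.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)
print(dollar_amounts)  # ['75']

# The match itself covers only the digits.
m = re.search(r"\d+(?= USD)", prices)
print(m.group(), m.span())  # 150 (7, 10)
```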

Test and debug your patterns with our Regex Tester — real-time matching with highlighted captures and a match list as you type.
