Developer Tools

Regular Expressions: A Practical Guide to Pattern Matching

Regex appears in every language and every editor. This practical guide covers the syntax you'll use daily, from basic character classes to lookaheads.

</>

DevPulse Team

Regular expressions (regex) are a mini-language for describing text patterns. They appear in every programming language, code editor, command-line tool, and database engine. Learning them pays off immediately — the same pattern syntax works across Python, JavaScript, grep, sed, VS Code's search, and SQL.

The Building Blocks

A regex pattern is built from literals and metacharacters. A literal character like a matches the letter a exactly. Metacharacters have special meanings:

  • . — matches any single character (except newline by default)
  • ^ — matches the start of the string (or line in multiline mode)
  • $ — matches the end of the string or line
  • * — the preceding element zero or more times
  • + — the preceding element one or more times
  • ? — the preceding element zero or one time (also makes quantifiers lazy)
  • {n,m} — the preceding element between n and m times
  • | — alternation (OR): matches either the left or right side
  • \ — escape a metacharacter or use a special sequence

Character Classes

[abc] matches any single character that's a, b, or c. [a-z] matches any lowercase letter. [^abc] matches any character that's NOT a, b, or c.

Shorthand character classes save typing:

  • \d — any digit (same as [0-9])
  • \w — any word character: letters, digits, underscore ([a-zA-Z0-9_])
  • \s — any whitespace character (space, tab, newline)
  • \D, \W, \S — uppercase versions are the inverse

Groups and Captures

Parentheses () group expressions and capture the matched text. (\d{4})-(\d{2})-(\d{2}) matches a date like 2024-04-25 and captures year, month, and day as separate groups you can reference in replacement strings or extract programmatically.

Non-capturing groups (?:...) group without capturing — useful when you need grouping for alternation but don't need the matched text.

Practical Examples

# Email (simplified — true email validation is more complex)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# IPv4 address
^(\d{1,3}\.){3}\d{1,3}$

# UK postcode
^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$

# Hex color code (with or without #)
^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

# ISO 8601 date
^\d{4}-\d{2}-\d{2}$

Greedy vs Lazy Matching

Quantifiers are greedy by default — they match as much as possible. <.+> applied to <b>text</b> matches the entire string, not just <b>. Add ? to make it lazy: <.+?> matches <b> and stops. Most HTML/XML processing bugs come from greedy quantifiers eating too much.

Lookaheads and Lookbehinds

Lookaheads let you match based on what comes after without including it in the match. \d+(?= USD) matches numbers followed by " USD" but only captures the number. Lookbehinds work in the other direction. These are powerful for extracting data from structured text.

Test and debug your patterns with our Regex Tester — real-time matching with highlighted captures and a match list as you type.

Free developer tools, right in your browser.

No sign-up. No tracking. 30+ utilities for developers.

Explore DevPulse Tools →