Regular Expressions Cheat Sheet with Online Tester

Regular expressions (regex) are one of the most powerful tools in a developer's toolkit. They let you search, match, extract, and replace text patterns in strings with concise, expressive syntax. They are also one of the most feared topics for newcomers, because the syntax can look cryptic at first glance. This cheat sheet breaks regex down into practical, digestible sections and pairs each concept with real-world examples you can test immediately.

Basic Character Matching

At its simplest, a regex is just a sequence of literal characters. The pattern "hello" matches the exact string "hello" in the input.

Special characters (called metacharacters) add flexibility:

- . (dot): Matches any single character except a newline.
- \d: Matches any digit (0-9).
- \D: Matches any non-digit character.
- \w: Matches any word character (letter, digit, or underscore).
- \W: Matches any non-word character.
- \s: Matches any whitespace character (space, tab, newline).
- \S: Matches any non-whitespace character.

For example, \d\d\d matches any three consecutive digits, like "123" or "007".
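The shorthand classes above can be tried out with a minimal sketch like the following. Python's built-in re module and the sample strings are assumptions chosen for illustration; other regex flavors behave the same way for these basic classes.

```python
import re

text = "Order 123 shipped to agent 007 on day 5."

# \d\d\d needs three digits in a row, so the lone "5" is skipped.
three_digits = re.findall(r"\d\d\d", text)
print(three_digits)  # ['123', '007']

# The dot matches any single character except a newline,
# so "d\ny" is not reported.
dot_matches = re.findall(r"d.y", "day dry d\ny")
print(dot_matches)  # ['day', 'dry']

# \w+ collects runs of word characters; spaces and punctuation end a run.
words = re.findall(r"\w+", "snake_case, 42 times!")
print(words)  # ['snake_case', '42', 'times']
```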

Quantifiers: How Many Times

Quantifiers specify how many times the preceding element must occur:

- * (asterisk): Zero or more times. "ab*c" matches "ac", "abc", "abbc", "abbbc", etc.
- + (plus): One or more times. "ab+c" matches "abc", "abbc", but not "ac".
- ? (question mark): Zero or one time. "colou?r" matches both "color" and "colour".
- {n}: Exactly n times. "\d{4}" matches exactly four digits.
- {n,}: At least n times. "\d{2,}" matches two or more digits.
- {n,m}: Between n and m times (inclusive). "\d{2,4}" matches two, three, or four digits.

By default, quantifiers are greedy: they match as many characters as possible. Adding a "?" after the quantifier makes it lazy (non-greedy), matching as few characters as possible. This distinction matters when parsing HTML or extracting quoted strings.
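The difference between greedy and lazy matching is easiest to see side by side. The sketch below uses Python's re module with made-up sample strings (both are assumptions for illustration); it also shows how a counted quantifier splits a long digit run.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" it can reach.
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Counted quantifier: {2,4} takes up to four digits at a time,
# and the single "7" is too short to match.
digits = re.findall(r"\d{2,4}", "7 42 2024 123456")
print(digits)  # ['42', '2024', '1234', '56']

# Optional character: u? makes the "u" optional.
spellings = re.findall(r"colou?r", "color colour colr")
print(spellings)  # ['color', 'colour']
```

Regex is still a fragile way to process real HTML; the tag example is only meant to make the greedy and lazy behavior visible.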

Anchors and Boundaries

Anchors do not match characters; they match positions in the string:

- ^ (caret): Matches the start of the string (or the start of a line in multiline mode).
- $ (dollar): Matches the end of the string (or end of a line in multiline mode).
- \b: Matches a word boundary (the position between a word character and a non-word character).
- \B: Matches a non-word boundary.

These are essential for precise matching. For instance, "\bcat\b" matches the word "cat" but not "catalog" or "concatenate".
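A short sketch of word boundaries and line anchors, assuming Python's re module (where multiline mode is spelled re.MULTILINE) and invented sample text:

```python
import re

sentence = "The cat sat near the catalog; concatenate later."

# \b on both sides keeps "cat" from matching inside longer words.
whole_word = re.findall(r"\bcat\b", sentence)
print(whole_word)  # ['cat']

# Without boundaries, every occurrence of the letters c-a-t counts.
anywhere = re.findall(r"cat", sentence)
print(len(anywhere))  # 3

log = "ERROR disk full\nINFO retrying\nERROR timeout"

# Without multiline mode, ^ only matches at the very start of the string.
first_only = re.findall(r"^ERROR", log)
print(len(first_only))  # 1

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^ERROR", log, flags=re.MULTILINE)
print(len(every_line))  # 2
```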

Character Classes and Alternation

Character classes let you define a set of characters to match:

- [abc]: Matches "a", "b", or "c".
- [a-z]: Matches any lowercase letter.
- [A-Za-z0-9]: Matches any alphanumeric character.
- [^abc]: Matches any character except "a", "b", or "c" (negated class).

Alternation uses the pipe character "|" to match one pattern or another: "cat|dog" matches either "cat" or "dog". You can group alternatives with parentheses: "(red|blue) car" matches "red car" or "blue car".
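The following sketch exercises character classes and shows why grouping matters for alternation. It assumes Python's re module, and the sample strings are invented for illustration.

```python
import re

# A negated class matches any character that is NOT listed
# (here: anything except lowercase vowels and whitespace).
kept = re.findall(r"[^aeiou\s]", "regex is fun")
print("".join(kept))  # rgxsfn

# Ranges can be combined in a single class.
tokens = re.findall(r"[A-Za-z0-9]+", "user_42@example.com")
print(tokens)  # ['user', '42', 'example', 'com']

# Grouped alternation: the parentheses limit "|" to the color word.
cars = [m.group(0) for m in re.finditer(r"(red|blue) car", "red car, green car, blue car")]
print(cars)  # ['red car', 'blue car']

# Ungrouped, "|" splits the whole pattern: "red" OR "blue car".
loose = re.findall(r"red|blue car", "red bus, blue car")
print(loose)  # ['red', 'blue car']
```

The last call is the common pitfall: without parentheses the alternation applies to everything on either side of the pipe, so "red" matches on its own.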

Groups and Backreferences

Parentheses serve two purposes: grouping and capturing.

- (abc): Captures the match for later use (backreference \1, \2, etc.).
- (?:abc): Groups without capturing (non-capturing group). Use this when you need grouping but do not need the captured value.
- (?<name>abc): Named capturing group. Accessed by name instead of number.

Backreferences are powerful for finding duplicates. The pattern "\b(\w+)\s+\1\b" matches repeated words like "the the" or "is is"; the trailing \b keeps it from reporting partial repeats such as "is island".

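In replacement strings, captured groups are referenced with $1, $2, etc. (or $<name> for named groups) in JavaScript and several other flavors. Some engines use a different spelling; Python's re module, for example, uses \1 and \g<name>.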
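Here is a hedged sketch of capturing, backreferences, and replacement in Python. Note the flavor differences, which are specific to Python's re module rather than to the notation used above: named groups are written (?P<name>...), and replacement strings use \1 or \g<name>. The sample strings and group names are invented for illustration.

```python
import re

# Backreference inside the pattern: \1 must repeat what group 1 captured.
# The trailing \b keeps "is island" from being reported as a repeat.
repeated = re.findall(r"\b(\w+)\s+\1\b", "it is is the the best, this is island")
print(repeated)  # ['is', 'the']

# Collapse each repeated word to a single copy.
deduped = re.sub(r"\b(\w+)\s+\1\b", r"\1", "the the cat sat sat down")
print(deduped)  # the cat sat down

# Named groups, in Python's (?P<name>...) spelling.
found = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "Released 2024-03-15")
year = found.group("year") if found else None
month = found.group("month") if found else None
print(year, month)  # 2024 03

# Replacement by name: \g<last> and \g<first> refer to the named groups.
swapped = re.sub(r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace")
print(swapped)  # Lovelace, Ada
```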

Lookaheads and Lookbehinds

Lookarounds assert that a pattern exists (or does not exist) at a certain position without consuming characters:

- (?=abc): Positive lookahead. Matches if "abc" follows the current position.
- (?!abc): Negative lookahead. Matches if "abc" does not follow.
- (?<=abc): Positive lookbehind. Matches if "abc" precedes the current position.
- (?<!abc): Negative lookbehind. Matches if "abc" does not precede.

Example: "\d+(?= dollars)" matches one or more digits only when followed by " dollars". The word "dollars" is not included in the match.
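All four lookarounds can be compared in one small sketch. It assumes Python's re module, which requires lookbehind patterns to have a fixed width; the sample text is invented for illustration.

```python
import re

text = "Paid 250 dollars, 40 euros, and 15 dollars."

# Positive lookahead: digits only when " dollars" follows.
# The word itself is not part of the match.
dollars = re.findall(r"\d+(?= dollars)", text)
print(dollars)  # ['250', '15']

# Negative lookahead: whole numbers NOT followed by " dollars".
# The \b anchors stop the engine from matching just part of "250".
others = re.findall(r"\b\d+\b(?! dollars)", text)
print(others)  # ['40']

prices = "Cost: $30, weight: 30 kg, deposit: $5"

# Positive lookbehind: digits preceded by a literal "$".
amounts = re.findall(r"(?<=\$)\d+", prices)
print(amounts)  # ['30', '5']

# Negative lookbehind: digits not preceded by "$" or by another digit.
plain = re.findall(r"(?<![\$\d])\d+", prices)
print(plain)  # ['30']
```

The extra \d inside the negative lookbehind matters: without it, the "0" in "$30" would count as a match because it is preceded by "3" rather than "$".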

Common Practical Patterns

Here are regex patterns developers use regularly:

- Email (simplified): [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- URL: https?://[^\s/$.?#].[^\s]*
- IPv4 address: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
- Date (YYYY-MM-DD): \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
- Hex color: #(?:[0-9a-fA-F]{3}){1,2}\b
- Phone number (US): (?:\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

These are starting points. Production-grade validation often requires additional checks beyond what regex alone can provide.
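To illustrate what "additional checks" can look like, the sketch below pairs two of the patterns with a second validation step. It assumes Python; the helper names is_valid_date and is_valid_ipv4 are hypothetical, and the IPv4 pattern drops the \b anchors because fullmatch already requires the whole string to match.

```python
import re
from datetime import date

DATE_RE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
IPV4_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def is_valid_date(text: str) -> bool:
    """Shape check with the regex, then a calendar check with datetime."""
    if not DATE_RE.fullmatch(text):
        return False
    year, month, day = map(int, text.split("-"))
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


def is_valid_ipv4(text: str) -> bool:
    """Shape check with the regex, then a 0-255 range check on each octet."""
    if not IPV4_RE.fullmatch(text):
        return False
    return all(0 <= int(octet) <= 255 for octet in text.split("."))


print(bool(DATE_RE.fullmatch("2024-02-31")))  # True: the shape is fine
print(is_valid_date("2024-02-31"))            # False: February has no 31st
print(is_valid_date("2024-02-29"))            # True: 2024 is a leap year
print(bool(IPV4_RE.fullmatch("999.1.1.1")))   # True: up to three digits per octet
print(is_valid_ipv4("999.1.1.1"))             # False: 999 is out of range
print(is_valid_ipv4("192.168.0.1"))           # True
```

The regex handles the shape of the input, and ordinary code handles the rules that are awkward to express as a pattern.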

Flags That Change Behavior

Regex flags modify how the pattern is applied:

- g (global): Find all matches, not just the first.
- i (case-insensitive): "abc" matches "ABC", "Abc", etc.
- m (multiline): ^ and $ match the start and end of each line, not just the whole string.
- s (dotAll): The dot "." matches newline characters as well.
- u (unicode): Enables full Unicode matching.

Most online regex testers let you toggle these flags with checkboxes.
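Flag spellings vary by engine. As a rough sketch, here is how the flags above map onto Python's re module, which is an assumption chosen for illustration: there is no "g" flag (re.search returns the first match, re.findall returns all of them), "i" is re.IGNORECASE, "m" is re.MULTILINE, "s" is re.DOTALL, and string patterns are Unicode-aware by default, so no "u" flag is needed.

```python
import re

text = "Error: disk\nerror: net\nOK"

# First match only versus all matches, both case-insensitive.
first = re.search(r"error", text, flags=re.IGNORECASE)
first_hit = first.group(0) if first else None
all_hits = re.findall(r"error", text, flags=re.IGNORECASE)
print(first_hit)  # Error
print(all_hits)   # ['Error', 'error']

# Multiline: $ matches at the end of every line.
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)
print(line_ends)  # ['disk', 'net', 'OK']

# DotAll: the dot also matches "\n", so the pattern can span two lines.
without_s = re.search(r"disk.error", text)
with_s = re.search(r"disk.error", text, flags=re.DOTALL)
print(without_s is None, with_s is not None)  # True True
```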

Test Your Regex Online

The best way to learn regex is by experimenting. Open the ToolboxHub Regex Tester at /tools/regex-tester, type a pattern, paste some test text, and see matches highlighted in real time. The tool shows captured groups, match indices, and supports all standard flags.

You might also find the JSON Formatter useful when your regex is extracting data from JSON strings, or the Word Counter helpful for verifying text processing results.