DEPRECATED — Modern Claude models produce accurate, well-explained regex patterns with edge-case test suites natively, including multi-language usage examples. The uplift delta from this skill approaches zero. Retained for archival reference only.
Regex Builder
Transforms matching requirements (positive and negative examples) into tested regex patterns with component-by-component explanations, capture group documentation, edge case identification, and ready-to-use code in Python and JavaScript.
Reference Files
| File | Contents | Load When |
|---|---|---|
references/character-classes.md | Character class reference, Unicode categories, POSIX classes | Always |
references/quantifiers.md | Quantifier behavior, greedy vs lazy vs possessive, backtracking | Pattern needs repetition |
references/common-patterns.md | Validated patterns for email, URL, phone, IP, date, UUID, etc. | Common validation requested |
references/flavor-differences.md | Syntax differences between Python, JavaScript, PCRE, POSIX | Multi-language usage needed |
Prerequisites
- Clear specification: what should match and what should not
- Target regex flavor (Python
re, JavaScript, PCRE) — defaults to Python
Workflow
Phase 1: Collect Examples
Gather positive (should match) and negative (should not match) examples:
- From user — Explicit examples provided
- From context — If the user says "match email addresses," infer standard positive and negative examples
- From data — If sample data is provided, identify the pattern within it
Minimum: 3 positive examples and 3 negative examples. Fewer examples risk overfitting the pattern to specific cases.
Phase 2: Infer Pattern
Analyze the examples to build a pattern:
- Identify fixed literals — Characters that appear in the same position across all positive examples
- Identify character classes — Positions where different characters appear but follow a pattern (digits, letters, alphanumeric)
- Identify repetition — Elements that appear a variable number of times
- Identify optional elements — Parts present in some positive examples but not others
- Identify anchoring — Must the pattern match the entire string or can it be a substring?
Phase 3: Explain Pattern
Break down the pattern into a component table:
| Component | Meaning |
|---|---|
^ | Start of string |
[A-Za-z] | One letter (upper or lower) |
\d{3,5} | 3 to 5 digits |
$ | End of string |
Document capture groups separately if the pattern uses them.
Phase 4: Generate Edge Cases
For every pattern, identify inputs that are likely to cause problems:
- Empty string — Does the pattern handle it correctly?
- Almost-matching strings — One character off from a valid match
- Boundary lengths — Minimum and maximum valid lengths
- Special characters — Dots, brackets, backslashes in the input
- Unicode — Multi-byte characters, emoji, diacritics
- Catastrophic backtracking — Inputs that cause exponential matching time
Phase 5: Output
Produce the pattern, explanation, test cases, and usage examples.
Output Format
## Regex Pattern: {Brief Description}
### Requirements
- **Must match:** {description of valid inputs}
- **Must reject:** {description of invalid inputs}
- **Flavor:** {Python re | JavaScript | PCRE}
### Pattern
```regex
{pattern}
Explanation
| Component | Meaning |
|---|---|
{component} | {what it matches and why} |
Capture Groups
| Group | Name | Captures | Example |
|---|---|---|---|
| 1 | {name} | {what} | {example value} |
Test Cases
| # | Input | Should Match | Reason |
|---|---|---|---|
| 1 | {input} | Yes | {why — happy path} |
| 2 | {input} | Yes | {why — boundary} |
| 3 | {input} | No | {why — invalid} |
| 4 | {input} | No | {why — near-miss} |
| 5 | `` (empty) | No | Empty input |
Edge Cases
- {Edge case 1}: {what to watch for}
- {Edge case 2}: {what to watch for}
Usage
Python:
import re
pattern = re.compile(r'{pattern}')
# Match entire string
if pattern.fullmatch(text):
...
# Search within string
match = pattern.search(text)
if match:
captured = match.group(1)
# Find all matches
matches = pattern.findall(text)
JavaScript:
const pattern = /{pattern}/;
// Test
if (pattern.test(text)) { ... }
// Match
const match = text.match(pattern);
if (match) {
const captured = match[1];
}
// Find all
const matches = [...text.matchAll(/{pattern}/g)];
## Calibration Rules
1. **Correctness over cleverness.** A readable, slightly longer pattern is better than
a cryptic short one. `[A-Za-z0-9]` is clearer than `\w` when you specifically mean
alphanumeric without underscores.
2. **Test negatives as rigorously as positives.** A pattern that matches everything
technically matches all positive examples. Negative examples prevent over-matching.
3. **Anchor when appropriate.** `^\d{3}$` matches exactly 3 digits. `\d{3}` matches
3 digits anywhere in the string. State the anchoring intent explicitly.
4. **Avoid catastrophic backtracking.** Nested quantifiers like `(a+)+` cause exponential
time on non-matching input. Test with adversarial inputs.
5. **Named groups over numbered groups.** `(?P<year>\d{4})` (Python) or `(?<year>\d{4})`
(JS) is self-documenting. Use numbered groups only for simple patterns.
6. **Specify the flavor.** Python `re`, JavaScript, and PCRE have different feature sets.
Lookaheads, lookbehinds, and Unicode support vary.
## Error Handling
| Problem | Resolution |
|---------|------------|
| Insufficient examples | Ask for more. Minimum 3 positive, 3 negative. |
| Contradictory examples | Flag the contradiction. Ask which examples are correct. |
| Requirements too complex for regex | Suggest a parser instead. Regex cannot handle recursive structures (nested brackets, HTML). |
| Pattern causes backtracking | Rewrite with atomic groups or possessive quantifiers. Test with worst-case input. |
| Unicode requirements unclear | Ask if the pattern needs to handle non-ASCII. Default to ASCII unless specified. |
| Multiple valid patterns | Present the simplest one. Mention alternatives if they have meaningful tradeoffs (performance vs readability). |
## When NOT to Build Regex
Push back if:
- The input requires parsing a recursive grammar (HTML, JSON, nested expressions) — use a parser
- The validation is for a standard format with a library (email validation, URL parsing) — use the standard library
- The pattern is for security-critical input validation as the sole defense — regex is a first filter, not a security boundary
- The user wants to modify matched content in complex ways — regex replacement has limits; suggest code instead