Building a Regex Tester: Real-time Highlighting and Performance Optimization
I was working on form validation recently, dealing with various regular expressions. Testing each regex change required a page refresh - too tedious. So I built my own regex tester. Here’s how it works.
How Regex Engines Work
JavaScript uses a backtracking regex engine. Think of it as “trial and error”:
const regex = /a+ab/
const text = 'aaab'
// Engine execution:
// 1. a+ greedily consumes all of 'aaa'
// 2. Tries to match the literal a, but the next character is 'b': fails
// 3. Backtracks: a+ releases one 'a' and now holds 'aa'
// 4. Tries the literal a again: it matches the released 'a'
// 5. Tries b: it matches 'b'
// 6. Final match: a+ matches 'aa', a matches 'a', b matches 'b'
Understanding this is crucial because nested or overlapping quantifiers can cause catastrophic backtracking:
// Dangerous regex: matching an HTML tag that may contain several words
const evil = /<(\w+\s*)+>/
// When the closing '>' is missing, the engine tries every way of splitting the run of word characters
const input = '<' + 'a'.repeat(30)
evil.test(input) // May hang for seconds; each extra character roughly doubles the time
Safe alternatives use atomic groups or possessive quantifiers (neither is available in JavaScript yet), or restructure the regex so that adjacent pieces cannot match the same text.
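Both options can be sketched side by side. The snippet below restructures a tag-matching pattern so that its pieces cannot overlap, and also emulates an atomic group with a lookahead plus backreference, a common JavaScript workaround. The patterns and variable names are illustrative assumptions, not code from the tester.

```typescript
// Vulnerable shape: a quantified group whose body is itself quantified.
// It is only run on matching input here; long non-matching input would hang.
const vulnerable = /<(\w+\s*)+>/

// Option 1: restructure so that words must be separated by whitespace.
// A run of word characters can now be split in only one way.
const restructured = /<\w+(?:\s+\w+)*\s*>/

// Option 2: emulate an atomic group. The engine never backtracks into a
// lookahead that has already succeeded, so (?=(X))\1 behaves like (?>X).
const atomicLike = /<(?:(?=(\w+\s*))\1)+>/

const valid = '<div class id>'
const hostile = '<' + 'a'.repeat(5000) // no closing '>'

console.log(vulnerable.test(valid))                                // true
console.log(restructured.test(valid), atomicLike.test(valid))      // true true
console.log(restructured.test(hostile), atomicLike.test(hostile))  // false false
```

Both safe variants reject the hostile input in roughly linear time, because neither leaves the engine more than one way to consume the run of word characters.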
Core Implementation: Real-time Matching
The heart of a regex tester is real-time matching and highlighting. Here’s the basic implementation:
function matchAll(pattern: string, text: string, flags: string) {
  const matches: { text: string; index: number; groups?: Record<string, string> }[] = []
  try {
    const regex = new RegExp(pattern, flags)
    if (flags.includes('g')) {
      let match: RegExpExecArray | null
      while ((match = regex.exec(text)) !== null) {
        matches.push({
          text: match[0],
          index: match.index,
          groups: match.groups
        })
        // Prevent infinite loop from zero-width matches
        if (match.index === regex.lastIndex) {
          regex.lastIndex++
        }
      }
    } else {
      const match = regex.exec(text)
      if (match) {
        matches.push({
          text: match[0],
          index: match.index,
          groups: match.groups
        })
      }
    }
    return { matches, error: null }
  } catch (e) {
    // e is `unknown` under strict TypeScript, so narrow it before reading .message
    return { matches: [], error: e instanceof Error ? e.message : String(e) }
  }
}
Key points:
- Zero-width match trap: Regexes like /\b/g match zero-width boundaries. lastIndex doesn’t advance, causing infinite loops. Fix: manually increment with lastIndex++
- Global flag g: Only with the g flag can you loop with exec. Otherwise, each call starts from the beginning
- Named capture groups: match.groups contains named capture results (ES2018 feature)
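As a point of comparison, String.prototype.matchAll (ES2020) performs the zero-width advance itself, so the manual lastIndex bookkeeping goes away. This is a minimal sketch that assumes it is acceptable to always search globally. The helper name findAllMatches and the FoundMatch interface are hypothetical and not part of the tester's code.

```typescript
interface FoundMatch {
  text: string
  index: number
  groups?: Record<string, string>
}

// Hypothetical helper: collects every match via String.prototype.matchAll.
// matchAll throws a TypeError when the regex lacks the g flag, so add it if missing.
function findAllMatches(pattern: string, text: string, flags = ''): FoundMatch[] {
  const regex = new RegExp(pattern, flags.includes('g') ? flags : flags + 'g')
  return Array.from(text.matchAll(regex), (m) => ({
    text: m[0],
    index: m.index ?? 0,
    groups: m.groups,
  }))
}

// \b only ever produces zero-width matches, yet the iteration terminates
console.log(findAllMatches('\\b', 'ab cd').map((m) => m.index)) // [ 0, 2, 3, 5 ]
```

Unlike the hand-written loop, this version throws on an invalid pattern instead of returning an error field, so a caller would still wrap it in try/catch.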
Highlighting Matches
After getting matches, highlight them in the original text. The straightforward approach walks the matches in order and slices out the plain text between them:
function highlightMatches(text: string, matches: Match[]) {
  if (matches.length === 0) return [{ text, isMatch: false }]
  // Sort by position so the parts come out in document order
  const sorted = [...matches].sort((a, b) => a.index - b.index)
  const parts: { text: string; isMatch: boolean }[] = []
  let lastEnd = 0
  for (const match of sorted) {
    // Normal text before match
    if (match.index > lastEnd) {
      parts.push({ text: text.slice(lastEnd, match.index), isMatch: false })
    }
    // Matched text
    parts.push({ text: match.text, isMatch: true })
    lastEnd = match.index + match.text.length
  }
  // Remaining normal text
  if (lastEnd < text.length) {
    parts.push({ text: text.slice(lastEnd), isMatch: false })
  }
  return parts
}
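To make the output shape concrete, here is a small worked example. The Match and Part interfaces are assumptions (the article does not define them), and toParts is a condensed restatement of the same walk so that the snippet runs on its own.

```typescript
// Assumed shapes; the article's own Match type may carry more fields
interface Match { text: string; index: number }
interface Part { text: string; isMatch: boolean }

// Condensed version of the walk: plain text, match, plain text, ...
function toParts(text: string, matches: Match[]): Part[] {
  const parts: Part[] = []
  let lastEnd = 0
  for (const m of [...matches].sort((a, b) => a.index - b.index)) {
    if (m.index > lastEnd) parts.push({ text: text.slice(lastEnd, m.index), isMatch: false })
    parts.push({ text: m.text, isMatch: true })
    lastEnd = m.index + m.text.length
  }
  if (lastEnd < text.length) parts.push({ text: text.slice(lastEnd), isMatch: false })
  return parts
}

// Matches are passed out of order on purpose to show the effect of sorting
const parts = toParts('a1b22c', [
  { text: '22', index: 3 },
  { text: '1', index: 1 },
])
console.log(parts.map((p) => (p.isMatch ? `[${p.text}]` : p.text)).join('')) // a[1]b[22]c
```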
Render with conditional styling:
function HighlightedText({ text, matches }: Props) {
  const parts = highlightMatches(text, matches)
  return (
    <div className="font-mono whitespace-pre-wrap">
      {parts.map((part, i) => (
        <span
          key={i}
          className={part.isMatch ? 'bg-yellow-500/30 text-yellow-300' : ''}
        >
          {part.text}
        </span>
      ))}
    </div>
  )
}
Performance: Debouncing and Web Workers
Regex testers need real-time response, but complex regexes can be slow. Two optimizations:
1. Debounce
Don’t execute immediately on input. Wait 300ms after typing stops:
import { useEffect, useMemo, useState } from 'react'
import { debounce } from 'lodash-es'
function useDebouncedMatch(pattern: string, text: string, flags: string) {
  const [result, setResult] = useState<MatchResult>({ matches: [], error: null })
  // Create the debounced function once so its timer survives re-renders
  const debouncedMatch = useMemo(
    () => debounce((p: string, t: string, f: string) => {
      setResult(matchAll(p, t, f))
    }, 300),
    []
  )
  useEffect(() => {
    debouncedMatch(pattern, text, flags)
    // Cancel a pending call when the inputs change or the component unmounts
    return () => debouncedMatch.cancel()
  }, [debouncedMatch, pattern, text, flags])
  return result
}
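2. Web Worker Isolation
For regexes that might hang, execute in a Web Worker to avoid UI blocking. The timeout has to be enforced from the main thread, because a worker stuck inside a synchronous regex call never gets to run its own timers:
// worker.ts
import { matchAll } from './matchAll' // assumed module path for the function above
self.onmessage = (e: MessageEvent) => {
  const { pattern, text, flags } = e.data
  // matchAll catches its own errors, so this always posts a result object
  self.postMessage(matchAll(pattern, text, flags))
}
// main.tsx
function safeMatch(pattern: string, text: string, flags: string, timeout = 5000) {
  return new Promise<MatchResult>((resolve) => {
    // One worker per run: a hung worker cannot be reused, only terminated.
    // The new URL(...) form is the convention bundlers such as Vite and webpack 5 expect.
    const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })
    // Timeout protection, enforced from outside the worker
    const timer = setTimeout(() => {
      worker.terminate()
      resolve({ matches: [], error: 'Execution timeout: possible catastrophic backtracking' })
    }, timeout)
    worker.onmessage = (e) => {
      clearTimeout(timer)
      worker.terminate()
      resolve(e.data)
    }
    worker.postMessage({ pattern, text, flags })
  })
}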
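Regex Flag Pitfalls
JavaScript has eight flags as of ES2024. These six are the ones you will meet most often (the other two are d for match indices and v for Unicode sets), and each has caveats: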
| Flag | Meaning | Gotchas |
|---|---|---|
| g | Global | lastIndex changes, repeated test() returns different results |
| i | Case insensitive | Unicode case conversion may not match expectations |
| m | Multiline | ^ and $ match line start/end, not string start/end |
| s | dotAll | . matches newlines, ES2018 addition |
| u | Unicode | Handles Unicode correctly; \u{1F600} requires u |
| y | Sticky | Matches from lastIndex; returns null if no match |
The trickiest is the g flag:
const regex = /\d+/g
// First match
regex.test('123abc') // true, lastIndex = 3
regex.test('123abc') // false: the search started at index 3 ('abc'), then lastIndex resets to 0
regex.test('123abc') // true, lastIndex = 3
regex.test('123abc') // false
// Fix: reset lastIndex before each use
regex.lastIndex = 0
regex.test('123abc') // Always returns true
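There are two other ways to sidestep the stateful lastIndex, sketched here on the assumption that you only need a yes/no answer: drop the g flag for test(), or wrap the reset in a helper. The name statelessTest is hypothetical.

```typescript
// A regex without g (or y) ignores lastIndex, so test() is repeatable
const plain = /\d+/
console.log(plain.test('123abc'), plain.test('123abc')) // true true

// Hypothetical helper for when the g flag must stay on a shared regex
function statelessTest(regex: RegExp, text: string): boolean {
  regex.lastIndex = 0 // always start from the beginning of the text
  const matched = regex.test(text)
  regex.lastIndex = 0 // leave no state behind for the next caller
  return matched
}

const shared = /\d+/g
console.log(statelessTest(shared, '123abc'), statelessTest(shared, '123abc')) // true true
console.log(shared.lastIndex) // 0
```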
Common Regex Patterns
Built-in patterns help users get started quickly:
const commonPatterns = [
  {
    name: 'Email',
    pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}',
    description: 'Standard email format'
  },
  {
    name: 'Phone (CN)',
    pattern: '1[3-9]\\d{9}',
    description: 'Chinese mobile phone number'
  },
  {
    name: 'URL',
    pattern: 'https?://[^\\s]+',
    description: 'HTTP/HTTPS links'
  },
  {
    name: 'IP Address',
    pattern: '\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b',
    description: 'IPv4 address (no range validation)'
  },
  {
    name: 'Date',
    pattern: '\\d{4}-\\d{2}-\\d{2}',
    description: 'YYYY-MM-DD format'
  }
]
Note these are loose matches. Production code needs stricter validation. The IP regex only checks format, not whether each number is 0-255.
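For the IPv4 case, one hedged way to add the range check is to let the regex verify the shape and plain arithmetic verify each octet. The helper isValidIPv4 is hypothetical, and it deliberately still accepts leading zeros such as 01.2.3.4.

```typescript
// Hypothetical helper: format check by regex, range check by arithmetic
function isValidIPv4(candidate: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(candidate)
  if (match === null) return false
  // match[1] through match[4] hold the four octets as strings
  return match.slice(1).every((octet) => Number(octet) <= 255)
}

console.log(isValidIPv4('192.168.1.1')) // true
console.log(isValidIPv4('999.1.1.1'))   // false
console.log(isValidIPv4('1.2.3'))       // false
```

Keeping the range check out of the pattern avoids the long alternation (25[0-5]|2[0-4]\d|...) that a regex-only version needs.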
Real-world Issues
1. Regex Injection
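A user-provided regex cannot inject code, but it can freeze the page through catastrophic backtracking (often called ReDoS):
// User input: (a+)+$
// Nested quantifiers cause catastrophic backtracking when the match fails
const userInput = '(a+)+$'
const text = 'a'.repeat(30) + '!'
// Effectively hangs: the engine tries every way of splitting the run of a's
new RegExp(userInput).test(text)
Solution: limit pattern length and input size, run the match under a timeout (as in the Web Worker section), or use a non-backtracking engine such as RE2.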
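A complexity limit can start as a very crude pre-check. The sketch below rejects long patterns and flags the most common dangerous shape, a quantified group whose body also contains a quantifier. The helper looksRisky and the length limit are assumptions. The check has false positives (for example an escaped \+ inside a group) and false negatives (for example overlapping alternatives like (a|a)+), so it complements a timeout rather than replacing one.

```typescript
const MAX_PATTERN_LENGTH = 200 // assumed limit, tune for your use case

// Crude heuristic: an innermost group (no nested parentheses) that contains
// + or * and is itself followed by +, * or {
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/

// Hypothetical helper. The length check runs first, so the heuristic regex
// itself only ever sees short patterns.
function looksRisky(pattern: string): boolean {
  return pattern.length > MAX_PATTERN_LENGTH || NESTED_QUANTIFIER.test(pattern)
}

console.log(looksRisky('(a+)+$'))        // true
console.log(looksRisky('(\\w+\\s*)+>'))  // true
console.log(looksRisky('\\d{4}-\\d{2}')) // false
console.log(looksRisky('[a-z]+@[a-z]+')) // false
```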
2. Unicode Character Handling
JavaScript strings are UTF-16 encoded. Surrogate pairs count as two characters:
'😀'.length // 2, not 1
'😀'.match(/^.$/) // null, because . matches single code unit
// Add u flag for correct handling
'😀'.match(/^.$/u) // ['😀']
Always use the u flag when dealing with emoji or rare characters.
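The same split between code units and code points shows up in character classes and in length counting. Here is a short sketch of the difference; the variable name is illustrative.

```typescript
const face = '😀' // U+1F600, stored as a surrogate pair

console.log(face.length)             // 2 (UTF-16 code units)
console.log(Array.from(face).length) // 1 (string iteration is by code point)

// Without u, a character class sees the two surrogate halves as separate members
console.log(/^[😀]$/.test(face))  // false
console.log(/^[😀]$/u.test(face)) // true

// \u{...} escapes are only interpreted as code points with the u flag
console.log(/^\u{1F600}$/u.test(face)) // true
```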
3. Capture Group Numbering
Nested capture groups can be confusing, because groups are numbered by the position of their opening parenthesis, left to right:
const regex = /((\d+)-(\d+))/
const match = '2024-01'.match(regex)
// match[0] = '2024-01' entire match
// match[1] = '2024-01' first capture group
// match[2] = '2024' second capture group
// match[3] = '01' third capture group
Use named capture groups for clarity:
const regex = /(?<year>\d+)-(?<month>\d+)/
const match = '2024-01'.match(regex)
match.groups.year // '2024'
match.groups.month // '01'
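In TypeScript, both match and match.groups can be missing, so a small wrapper keeps the null handling in one place. The function parseYearMonth is a hypothetical example, and it uses a stricter anchored pattern than the one above.

```typescript
// Hypothetical helper: returns null instead of throwing when the input does not match
function parseYearMonth(input: string): { year: number; month: number } | null {
  const match = /^(?<year>\d{4})-(?<month>\d{2})$/.exec(input)
  // match is null on failure; groups is only present when named groups exist
  if (!match?.groups) return null
  const { year, month } = match.groups
  return { year: Number(year), month: Number(month) }
}

console.log(parseYearMonth('2024-01')) // { year: 2024, month: 1 }
console.log(parseYearMonth('nope'))    // null
```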
The Result
Based on these ideas, I built: Regex Tester
Features:
- Real-time match highlighting
- All 6 regex flags supported
- Built-in common patterns
- Friendly error messages
- Named capture group support
Regex is a deep topic, but understanding the fundamentals makes many problems solvable. Hope this helps.