Building a Regex Tester: Real-time Highlighting and Performance Optimization

I was working on form validation recently and dealing with a lot of regular expressions. Testing each regex change required a page refresh, which quickly got tedious, so I built my own regex tester. Here’s how it works.

How Regex Engines Work

JavaScript uses a backtracking regex engine. Think of it as “trial and error”:

const regex = /a+ab/
const text = 'aaab'
// Engine execution:
// 1. a+ greedily consumes all of 'aaa'
// 2. Tries to match the literal 'a', but the next character is 'b': fail
// 3. Backtracks: a+ gives back one 'a' and now holds 'aa'
// 4. The literal 'a' matches the third 'a'
// 5. b matches 'b'
// 6. Final match: 'aaab' (a+ = 'aa', then 'a', then 'b')
// (With /a+b/ there is nothing to undo: b matches right after 'aaa')
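To make the trace concrete, here is a toy matcher for patterns of the shape /^c+rest/, where c is one character and rest is a literal string. greedyPlusThen is a hypothetical helper written only to illustrate greedy-then-give-back; real engines handle arbitrary patterns and are far more involved. It counts how many positions it tries after the greedy phase:

```typescript
// Toy matcher for /^c+rest/ (c: one character, rest: a literal string).
// Illustrates greedy consumption followed by backtracking.
function greedyPlusThen(
  text: string,
  c: string,
  rest: string,
): { match: string | null; attempts: number } {
  // Greedy phase: consume as many copies of c as possible
  let taken = 0
  while (taken < text.length && text[taken] === c) taken++

  // Backtracking phase: try rest after `taken` chars, then give one back
  let attempts = 0
  for (let k = taken; k >= 1; k--) {
    attempts++
    if (text.startsWith(rest, k)) {
      return { match: text.slice(0, k + rest.length), attempts }
    }
  }
  // c+ needs at least one c, so give up once nothing is left to give back
  return { match: null, attempts }
}

console.log(greedyPlusThen('aaab', 'a', 'ab')) // { match: 'aaab', attempts: 2 }
console.log(greedyPlusThen('aaab', 'a', 'b'))  // { match: 'aaab', attempts: 1 }
console.log(greedyPlusThen('aaac', 'a', 'b'))  // { match: null, attempts: 3 }
```

The /a+b/ case succeeds on the first try; it is a failing input (the 'aaac' case) that forces the engine through every give-back.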

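Understanding this matters because some patterns force the engine to explore an exponential number of paths before it can report a failure. This is catastrophic backtracking, and nested quantifiers are the classic trigger:

// Dangerous regex: a quantified group that contains another quantifier
const evil = /^(a+)+$/

// The trailing '!' guarantees failure, so the engine tries every way of
// splitting the a's between the inner and outer + (roughly 2^n ways)
const input = 'a'.repeat(30) + '!'
evil.test(input)  // May block for seconds or longer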

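Other engines offer atomic groups or possessive quantifiers to forbid this kind of backtracking, but JavaScript supports neither. The fix is to restructure the regex so each character can be matched only one way (/^(a+)+$/ accepts exactly the same strings as /^a+$/), or to emulate an atomic group with a lookahead plus a backreference: (?=(a+))\1.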
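A sketch of both fixes, using /^(a+)+$/ as the classic nested-quantifier example. The emulation relies on the ECMAScript rule that the engine never backtracks into a lookahead once it has succeeded; the variable names are illustrative:

```typescript
// Same language as /^(a+)+$/, but each 'a' can be matched only one way
const restructured = /^a+$/

// Emulated atomic group: the lookahead captures the longest run of a's,
// and since the engine never backtracks into a lookahead, \1 must match
// exactly that run; on failure there are no alternative splits to try
const emulatedAtomic = /^(?=(a+))\1$/

const bad = 'a'.repeat(30) + '!'
console.log(restructured.test(bad))      // false, returns immediately
console.log(emulatedAtomic.test(bad))    // false, returns immediately
console.log(emulatedAtomic.test('aaaa')) // true
```

Flattening is the simpler fix when it is possible; the lookahead trick is for repeated pieces that cannot be collapsed as easily as a+.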

Core Implementation: Real-time Matching

The heart of a regex tester is real-time matching and highlighting. Here’s the basic implementation:

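interface Match {
  text: string
  index: number
  groups?: Record<string, string>
}

function matchAll(pattern: string, text: string, flags: string): { matches: Match[]; error: string | null } {
  const matches: Match[] = []

  try {
    // Throws a SyntaxError for an invalid pattern or unknown flag
    const regex = new RegExp(pattern, flags)

    if (flags.includes('g')) {
      let match: RegExpExecArray | null
      // With g, each exec call resumes from regex.lastIndex
      while ((match = regex.exec(text)) !== null) {
        matches.push({ text: match[0], index: match.index, groups: match.groups })
        // Prevent an infinite loop on zero-width matches (e.g. /\b/g):
        // an empty match leaves lastIndex in place, so bump it manually
        if (match.index === regex.lastIndex) {
          regex.lastIndex++
        }
      }
    } else {
      // Without g, a single exec finds the first match (with y, only one at index 0)
      const match = regex.exec(text)
      if (match) {
        matches.push({ text: match[0], index: match.index, groups: match.groups })
      }
    }

    return { matches, error: null }
  } catch (e) {
    // Surface the engine's error message for invalid patterns
    return { matches: [], error: e instanceof Error ? e.message : String(e) }
  }
}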

Key points:

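  1. Zero-width match trap: patterns like /\b/g or /a*/g can match an empty string. An empty match does not move lastIndex forward, so the next exec returns the same match forever. Fix: bump lastIndex manually with lastIndex++
  2. Global flag g: exec only resumes from lastIndex when the regex has the g (or y) flag. Without it, exec ignores lastIndex and always returns the first match, so a loop would never end
  3. Named capture groups: match.groups holds the named capture results (an ES2018 feature); it is undefined when the pattern has no named groups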
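Assuming an ES2020+ target, String.prototype.matchAll can replace the manual exec loop, since it advances past empty matches on its own. It throws a TypeError when the regex lacks g, so this sketch adds the flag and keeps only the first match when the user did not ask for g. The name matchAllNative is hypothetical; the return shape mirrors matchAll above:

```typescript
interface Match {
  text: string
  index: number
  groups?: Record<string, string>
}

function matchAllNative(
  pattern: string,
  text: string,
  flags: string,
): { matches: Match[]; error: string | null } {
  try {
    const isGlobal = flags.includes('g')
    // matchAll requires the g flag, so add it when the user left it off
    const regex = new RegExp(pattern, isGlobal ? flags : flags + 'g')
    // matchAll moves past zero-width matches itself: no lastIndex++ needed
    const all = Array.from(text.matchAll(regex), (m) => ({
      text: m[0],
      index: m.index ?? 0,
      groups: m.groups,
    }))
    // Without g, only the first match is wanted (simple, not the cheapest)
    return { matches: isGlobal ? all : all.slice(0, 1), error: null }
  } catch (e) {
    // Invalid pattern or flags, e.g. an unbalanced "("
    return { matches: [], error: e instanceof Error ? e.message : String(e) }
  }
}
```

For a non-global pattern this still scans the whole text; if that matters, fall back to a single exec in that branch.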

Highlighting Matches

Once you have the matches, highlight them in the original text. The approach here walks the matches in order and slices out the plain text between them:

// Match is the { text, index, groups } shape returned by matchAll
function highlightMatches(text: string, matches: Match[]) {
  if (matches.length === 0) return [{ text, isMatch: false }]

  // exec already returns matches in order; sorting keeps this safe for any input
  const sorted = [...matches].sort((a, b) => a.index - b.index)

  const parts: { text: string; isMatch: boolean }[] = []
  let lastEnd = 0

  for (const match of sorted) {
    // Skip empty matches (nothing to show) and any match overlapping the previous one
    if (match.text.length === 0 || match.index < lastEnd) continue
    // Plain text between the previous match and this one
    if (match.index > lastEnd) {
      parts.push({ text: text.slice(lastEnd, match.index), isMatch: false })
    }
    // The matched text itself
    parts.push({ text: match.text, isMatch: true })
    lastEnd = match.index + match.text.length
  }

  // Remaining plain text after the last match
  if (lastEnd < text.length) {
    parts.push({ text: text.slice(lastEnd), isMatch: false })
  }

  return parts
}

Render each part as a span, styling only the matches (the classes here are Tailwind utilities):

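// Match is the { text, index, groups } shape returned by matchAll
type Props = { text: string; matches: Match[] }

function HighlightedText({ text, matches }: Props) {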
  const parts = highlightMatches(text, matches)
  
  return (
    <div className="font-mono whitespace-pre-wrap">
      {parts.map((part, i) => (
        <span
          key={i}
          className={part.isMatch ? 'bg-yellow-500/30 text-yellow-300' : ''}
        >
          {part.text}
        </span>
      ))}
    </div>
  )
}

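Performance: Debouncing and Web Workers

A regex tester should respond as you type, but running the match on every keystroke wastes work, and a pathological regex can block the main thread indefinitely. Two optimizations address this: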

1. Debounce

Don’t run the match on every keystroke; run it once typing has paused for 300ms:

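import { useEffect, useMemo, useState } from 'react'
import { debounce } from 'lodash-es'

type MatchResult = ReturnType<typeof matchAll>

function useDebouncedMatch(pattern: string, text: string, flags: string) {
  const [result, setResult] = useState<MatchResult>({ matches: [], error: null })

  // Create the debounced function once so its timer survives re-renders
  const debouncedMatch = useMemo(
    () => debounce((p: string, t: string, f: string) => {
      setResult(matchAll(p, t, f))
    }, 300),
    []
  )

  // Schedule a (debounced) run whenever the inputs change
  useEffect(() => {
    debouncedMatch(pattern, text, flags)
  }, [debouncedMatch, pattern, text, flags])

  // Drop any pending call on unmount so setResult never fires afterwards
  useEffect(() => () => debouncedMatch.cancel(), [debouncedMatch])

  return result
}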
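If you would rather not depend on lodash-es, a trailing-edge debounce with a cancel method takes only a few lines. This is a minimal sketch: the debounce here is a hand-rolled stand-in, not lodash's implementation, and it has none of lodash's leading or maxWait options:

```typescript
// Minimal trailing-edge debounce: fn runs once, waitMs after the last call
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined

  const debounced = (...args: A) => {
    // Each new call restarts the countdown
    if (timer !== undefined) clearTimeout(timer)
    timer = setTimeout(() => {
      timer = undefined
      fn(...args)
    }, waitMs)
  }

  // Lets an effect cleanup drop a pending call on unmount
  debounced.cancel = () => {
    if (timer !== undefined) clearTimeout(timer)
    timer = undefined
  }

  return debounced
}
```

Only the arguments of the last call before the pause are used, which is what a tester wants: the latest pattern and text.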

2. Web Worker Isolation

For regexes that might hang, run the match in a Web Worker so a runaway regex blocks the worker instead of the UI:

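// worker.ts (assumes matchAll lives in matchAll.ts)
import { matchAll } from './matchAll'

// A runaway regex blocks only this worker's thread, never the UI.
// Don't put the timeout here: a setTimeout callback cannot run while
// exec() is stuck on this same thread (and workers have self.close(),
// not self.terminate()).
self.onmessage = (e: MessageEvent<{ pattern: string; text: string; flags: string }>) => {
  const { pattern, text, flags } = e.data
  // matchAll already catches syntax errors and returns them in `error`
  self.postMessage(matchAll(pattern, text, flags))
}

// main.tsx
import type { matchAll } from './matchAll'

type MatchResult = ReturnType<typeof matchAll>

// Vite / webpack 5 syntax for bundling a module worker
const createWorker = () =>
  new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })

let worker = createWorker()

function safeMatch(pattern: string, text: string, flags: string, timeout = 5000) {
  return new Promise<MatchResult>((resolve) => {
    // The timeout lives on the main thread, which stays responsive
    const timer = setTimeout(() => {
      // The worker is stuck inside exec(): kill it and start a fresh one
      worker.terminate()
      worker = createWorker()
      resolve({ matches: [], error: 'Execution timeout: possible catastrophic backtracking' })
    }, timeout)

    // Assumes one request in flight at a time (the debounce above helps)
    worker.onmessage = (e: MessageEvent<MatchResult>) => {
      clearTimeout(timer)
      resolve(e.data)
    }
    worker.postMessage({ pattern, text, flags })
  })
}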

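Regex Flag Pitfalls

JavaScript has eight regex flags as of ES2024. The six below are the classic ones most testers expose (the newer d, for match indices, and v, for Unicode sets, are left out), and each has a gotcha:

| Flag | Meaning | Gotcha |
|---|---|---|
| g | Global | lastIndex persists between calls, so repeated test()/exec() on the same regex can return different results |
| i | Case insensitive | Unicode case conversion may not match expectations, especially without u |
| m | Multiline | ^ and $ match at the start/end of each line, not only of the whole string |
| s | dotAll | . also matches newlines (ES2018 addition); without it, . skips line terminators |
| u | Unicode | Treats surrogate pairs as one character; \u{1F600}-style escapes require u |
| y | Sticky | Matches only at exactly lastIndex; returns null instead of searching ahead |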

The trickiest is the g flag:

const regex = /\d+/g

regex.test('123abc')  // true, lastIndex = 3
regex.test('123abc')  // false: the search starts at 'abc'; on failure lastIndex resets to 0
regex.test('123abc')  // true, lastIndex = 3 again
regex.test('123abc')  // false

// Fix: reset lastIndex before each use (or drop g if you only need a yes/no answer)
regex.lastIndex = 0
regex.test('123abc')  // true: the search starts from index 0 again
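The other caveats from the flag table can be checked the same way. A quick sketch contrasting y with g, and showing what s and m change (the variable names are illustrative):

```typescript
// y (sticky) only tries at lastIndex; g searches forward from lastIndex
const sticky = /foo/y
const globalRe = /foo/g
console.log(sticky.test('xfoo'))   // false: 'foo' is not at index 0
console.log(globalRe.test('xfoo')) // true: found at index 1

// Starting the sticky regex at the right position makes it match
sticky.lastIndex = 1
console.log(sticky.test('xfoo'))   // true

// Without s, . does not match line terminators such as \n
console.log(/a.b/.test('a\nb'))    // false
console.log(/a.b/s.test('a\nb'))   // true

// m changes ^ from "start of string" to "start of any line"
console.log(/^b/.test('a\nb'))     // false
console.log(/^b/m.test('a\nb'))    // true
```

Sticky matching is what a tokenizer wants: each token must start exactly where the previous one ended.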

Common Regex Patterns

Built-in patterns help users get started quickly:

const commonPatterns = [
  {
    name: 'Email',
    pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}',
    description: 'Standard email format'
  },
  {
    name: 'Phone (CN)',
    pattern: '1[3-9]\\d{9}',
    description: 'Chinese mobile phone number'
  },
  {
    name: 'URL',
    pattern: 'https?://[^\\s]+',
    description: 'HTTP/HTTPS links'
  },
  {
    name: 'IP Address',
    pattern: '\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b',
    description: 'IPv4 address (no range validation)'
  },
  {
    name: 'Date',
    pattern: '\\d{4}-\\d{2}-\\d{2}',
    description: 'YYYY-MM-DD format'
  }
]

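These are loose, unanchored patterns meant for finding matches in text. To validate a whole field, anchor them with ^ and $, and expect production code to need stricter checks: the IP pattern only checks the shape, not whether each number is in 0-255, and the date pattern happily accepts 2024-13-45.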
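For the IP case, a stricter sketch anchors the whole string and spells out the 0-255 range per octet. isValidIPv4 is a hypothetical name, and rejecting leading zeros such as 01 is a design choice of this sketch, not a rule every IP parser follows:

```typescript
// One octet: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
const OCTET = '(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)'
// Four octets separated by dots, anchored so nothing else may surround them
const IPV4 = new RegExp(`^(?:${OCTET}\\.){3}${OCTET}$`)

function isValidIPv4(input: string): boolean {
  return IPV4.test(input)
}

console.log(isValidIPv4('192.168.1.1')) // true
console.log(isValidIPv4('256.1.1.1'))   // false: 256 is out of range
console.log(isValidIPv4('1.2.3'))       // false: only three octets
```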

Real-world Issues

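1. Regex Injection (ReDoS)

Accepting user-provided patterns doesn’t let anyone inject code, but it does let them submit a pattern that triggers catastrophic backtracking and freezes the page (a regular-expression denial of service, or ReDoS):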

// User input with nested quantifiers: (\w+\s?)* can split a run of
// word characters in exponentially many ways
const userInput = '^(\\w+\\s?)*$'
const text = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!'

// The trailing '!' makes the match fail, so this test can hang
new RegExp(userInput).test(text)

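Mitigations: run user patterns in a Web Worker with a main-thread timeout, cap the input length, reject obviously dangerous shapes such as nested quantifiers, or use a linear-time engine like RE2, which never backtracks (at the cost of features such as backreferences and lookarounds).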
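As one example of rejecting dangerous shapes, a rough pre-check can flag the most common ReDoS form: a quantified group that itself contains + or *. hasNestedQuantifier is a hypothetical helper and only a heuristic. It looks at innermost groups only and misses other risky shapes such as overlapping alternations like (a|a)*, so it complements a timeout rather than replacing one:

```typescript
// Heuristic: an innermost "(...)" containing + or *, followed by +, * or {.
// Escaped characters (\x) inside the group are skipped as a unit.
const NESTED_QUANTIFIER = /\((?:[^()\\]|\\.)*[+*](?:[^()\\]|\\.)*\)[+*{]/

function hasNestedQuantifier(pattern: string): boolean {
  return NESTED_QUANTIFIER.test(pattern)
}

console.log(hasNestedQuantifier('^(\\w+\\s?)*$'))  // true
console.log(hasNestedQuantifier('(a+)+'))         // true
console.log(hasNestedQuantifier('(ab)+'))         // false
console.log(hasNestedQuantifier('\\d{4}-\\d{2}')) // false
```

It can also produce false positives (for example a + inside a character class), so a tester might show a warning rather than refuse to run the pattern.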

2. Unicode Character Handling

JavaScript strings are UTF-16 encoded. Characters outside the Basic Multilingual Plane, such as most emoji, are stored as surrogate pairs and count as two code units:

'😀'.length  // 2, not 1
'😀'.match(/^.$/)  // null, because . matches a single UTF-16 code unit

// Add u flag for correct handling
'😀'.match(/^.$/u)  // ['😀']

Use the u flag whenever the text may contain emoji or other characters outside the BMP.
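A few more checks of the same behavior, plus one detail that matters for the highlighter: match.index and lastIndex count UTF-16 code units even with the u flag, so they line up with String.prototype.slice. The variable names are illustrative:

```typescript
const s = 'a😀b'

console.log(s.length)              // 4 UTF-16 code units
console.log([...s].length)         // 3 code points (string iteration is by code point)
console.log(/^.{3}$/.test(s))      // false: without u, . sees 4 units
console.log(/^.{3}$/u.test(s))     // true

// Indices are code units either way, so slicing by match.index stays consistent
const m = /b/u.exec(s)
console.log(m?.index)              // 3
console.log(s.slice(0, m?.index))  // a😀
```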

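3. Capture Group Numbering

Nested capture groups can be confusing. Groups are numbered by the position of their opening parenthesis, left to right, so an outer group comes before the groups inside it: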

const regex = /((\d+)-(\d+))/
const match = '2024-01'.match(regex)
// match[0] = '2024-01'  entire match
// match[1] = '2024-01'  first capture group
// match[2] = '2024'     second capture group
// match[3] = '01'       third capture group

Use named capture groups for clarity:

const regex = /(?<year>\d+)-(?<month>\d+)/
const match = '2024-01'.match(regex)  // null if nothing matches
match?.groups?.year   // '2024'
match?.groups?.month  // '01'
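Named groups also work in replacement strings via $<name>. A short sketch (the MM/YYYY output format is just an example):

```typescript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/

// $<name> in the replacement string refers to a named group
const reordered = '2024-01'.replace(dateRe, '$<month>/$<year>')
console.log(reordered) // 01/2024

// match() returns null on no match, so guard before reading groups
const groups = '2024-01'.match(dateRe)?.groups
console.log(groups?.year, groups?.month) // 2024 01
```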

The Result

Based on these ideas, I built Regex Tester.

Features:

  • Real-time match highlighting
  • All six classic flags (g, i, m, s, u, y) supported
  • Built-in common patterns
  • Friendly error messages
  • Named capture group support

Regex is a deep topic, but understanding the fundamentals makes many problems solvable. Hope this helps.

