From Regex to AST: Building a CSS Formatter

I recently inherited a legacy project where CSS files were minified to a single line. Debugging was a nightmare. I tried several online tools, but they either lacked features or struggled with nested rules. So I built my own. Here’s how it works.

The Core: Parse, Rebuild, Output

CSS has no counterpart to JSON.parse. The simplest approach is a chain of regex replacements followed by a line-by-line indent pass:

function beautifyCss(css, spaces = 2) {
  let formatted = ''
  let indentLevel = 0

  const lines = css
    .replace(/\s+/g, ' ')           // Merge whitespace
    .replace(/\s*{\s*/g, ' {\n')    // Newline after {
    .replace(/;\s*/g, ';\n')        // Newline after ;
    .replace(/}\s*/g, '\n}\n')      // Newline around }
    .split('\n')

  for (let line of lines) {
    line = line.trim()
    if (!line) continue

    if (line.includes('}')) indentLevel = Math.max(0, indentLevel - 1)
    formatted += ' '.repeat(indentLevel * spaces) + line + '\n'
    if (line.includes('{')) indentLevel++
  }

  return formatted.trim()
}

This handles simple stylesheets, but it treats every brace and semicolon as structure, so it breaks on comments, strings, and anything beyond plain nesting.

Where Regex Falls Short

1. Nested Rules Break

/* Input */
.container { .title { color: red; } }

/* Regex output (looks fine) */
.container {
  .title {
    color: red;
  }
}

But add a comment:

/* Input */
.container { /* Title */ .title { color: red; } }

/* Regex output (wrong) */
.container {
  /* Title */ .title {
    color: red;
  }
}

The comment and selector get mixed together.

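2. @media Queries Get Wrong Indent

@media (max-width: 768px) {
.container { padding: 10px; }
}

A regex that only matches flat selector { declarations } pairs has no notion of the enclosing @media block, so the inner rules come out at indent 0. The brace-counting loop in beautifyCss gets this simple case right, but only while every { and } it sees is a real block delimiter.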

3. Special Characters in Strings

.content::after {
  content: "{ this is not a selector }";
}

Regex treats the { inside the string as the start of a block and splits the declaration there, breaking the output.

AST Approach: The Proper Way

Professional CSS formatters build an AST (Abstract Syntax Tree) first. With css-tree, the skeleton looks like this:

import { parse, walk, generate } from 'css-tree'

function formatCss(css) {
  const ast = parse(css)

  walk(ast, (node) => {
    if (node.type === 'Atrule') {
      // Handle @media, @keyframes, etc.
    } else if (node.type === 'Rule') {
      // Handle regular rules
    } else if (node.type === 'Declaration') {
      // Handle property declarations
    }
  })

  // generate() serializes the tree back to compact CSS; indentation
  // and line breaks have to be produced by the formatter itself
  return generate(ast)
}

An AST-based formatter handles the cases that trip up regex:

  • Special characters in strings aren’t misparsed
  • @media queries have proper indent levels
  • Comments stay in correct positions
  • Nested rules are recognized (if the parser supports CSS Nesting)

But css-tree is large (150KB+), a burden for browser-based tools.

Hybrid Approach: Regex + State Machine

Balance size and correctness with a state machine:

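function smartFormat(css: string, indent = 2) {
  let output = ''
  let level = 0
  let quote = ''          // Active quote character, '' outside strings
  let inComment = false
  let buffer = ''

  const pad = () => ' '.repeat(level * indent)

  for (let i = 0; i < css.length; i++) {
    const char = css[i]
    const nextChar = css[i + 1]

    // Handle comments
    if (!quote && !inComment && char === '/' && nextChar === '*') {
      inComment = true
      buffer += '/*'
      i++
      continue
    }
    if (inComment) {
      if (char === '*' && nextChar === '/') {
        inComment = false
        buffer += '*/'
        i++
        // A comment that isn't inside a declaration gets its own line
        if (buffer.trim().startsWith('/*')) {
          output += pad() + buffer.trim() + '\n'
          buffer = ''
        }
      } else {
        buffer += char
      }
      continue
    }

    // Handle strings: only the opening quote type closes the string,
    // and backslash escapes are copied through untouched
    if (quote) {
      buffer += char
      if (char === '\\' && nextChar !== undefined) {
        buffer += nextChar
        i++
      } else if (char === quote) {
        quote = ''
      }
      continue
    }
    if (char === '"' || char === "'") {
      quote = char
      buffer += char
      continue
    }

    // Handle { } ;
    if (char === '{') {
      output += pad() + buffer.trim() + ' {\n'
      level++
      buffer = ''
    } else if (char === '}') {
      // Flush a last declaration without a semicolon at the inner level
      if (buffer.trim()) {
        output += pad() + buffer.trim() + '\n'
      }
      level = Math.max(0, level - 1)  // Stray } must not make the level negative
      output += pad() + '}\n'
      buffer = ''
    } else if (char === ';') {
      output += pad() + buffer.trim() + ';\n'
      buffer = ''
    } else {
      buffer += char
    }
  }

  // Flush whatever is left, e.g. an unterminated declaration
  if (buffer.trim()) output += pad() + buffer.trim() + '\n'

  return output.trim()
}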

This is more robust than pure regex, yet keeps code size small.
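One case a character scanner like this can still get wrong is a semicolon inside an unquoted url(), such as a data URI. A minimal sketch of how parenthesis depth could be tracked when splitting declarations; `splitDeclarations` is a hypothetical helper for illustration, not part of the tool, and it assumes comments have already been removed:

```typescript
// Hypothetical helper: split a declaration block body on ';',
// ignoring semicolons inside strings and parentheses.
function splitDeclarations(body: string): string[] {
  const parts: string[] = []
  let buffer = ''
  let quote = ''   // active quote character, '' when outside a string
  let depth = 0    // current parenthesis nesting depth

  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i)

    if (quote) {
      buffer += ch
      if (ch === '\\' && i + 1 < body.length) {
        buffer += body.charAt(++i)   // keep the escaped character verbatim
      } else if (ch === quote) {
        quote = ''
      }
      continue
    }

    if (ch === '"' || ch === "'") {
      quote = ch
    } else if (ch === '(') {
      depth++
    } else if (ch === ')') {
      depth = Math.max(0, depth - 1)
    } else if (ch === ';' && depth === 0) {
      if (buffer.trim()) parts.push(buffer.trim())
      buffer = ''
      continue
    }
    buffer += ch
  }

  if (buffer.trim()) parts.push(buffer.trim())
  return parts
}
```

With this, `background: url(data:image/png;base64,AAAA); color: red` splits into two declarations instead of three.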

CSS Minification Tricks

Minification is simpler—remove unnecessary characters:

function minifyCss(css) {
  return css
    .replace(/\/\*[\s\S]*?\*\//g, '')   // Remove comments
    .replace(/\s+/g, ' ')                // Merge whitespace
    .replace(/\s*([{};,])\s*/g, '$1')   // Remove space around symbols
    .replace(/:\s+/g, ':')               // Remove space after colons only ("a :hover" must not become "a:hover")
    .replace(/;}/g, '}')                 // Remove last semicolon
    .trim()
}
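Regex passes like these also rewrite whatever sits inside string literals. One common workaround is to swap strings for placeholders before minifying and restore them afterwards. The sketch below is an illustration under assumptions, not the tool's actual code: `minifyKeepingStrings` is a made-up name, and it assumes the input contains no NUL characters, which serve as placeholder delimiters.

```typescript
// Sketch: hide string literals behind placeholders so the regex passes
// cannot touch them, then put the originals back at the end.
function minifyKeepingStrings(css: string): string {
  const strings: string[] = []

  // One alternation for comments and strings: whichever starts first wins,
  // so a quote inside a comment never opens a string and vice versa.
  const masked = css.replace(
    /\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'/g,
    (match: string) => {
      if (match.charAt(0) === '/') return ''   // drop comments
      strings.push(match)
      return '\u0000' + (strings.length - 1) + '\u0000'
    }
  )

  const minified = masked
    .replace(/\s+/g, ' ')               // merge whitespace
    .replace(/\s*([{};,])\s*/g, '$1')   // no space around these symbols
    .replace(/:\s+/g, ':')              // no space after a colon
    .replace(/;}/g, '}')                // drop the last semicolon
    .trim()

  // Restore each placeholder with the untouched original string.
  return minified.replace(
    /\u0000(\d+)\u0000/g,
    (_match: string, index: string) => strings[Number(index)] ?? ''
  )
}
```

The string contents, including doubled spaces and braces, come back exactly as written.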

Key details:

  1. Don’t remove all whitespace: margin:0 auto can’t become margin:0auto
  2. Keep necessary semicolons: The last one can be omitted, but not middle ones
  3. Optimize color values: #ffffff → #fff, rgb(0,0,0) → #000
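The color optimization in point 3 can be sketched with two small functions. The names `shortenHex` and `rgbToHex` are illustrative, and the sketch only covers six-digit hex and comma-separated integer rgb() values; anything else is returned unchanged.

```typescript
// Shorten #rrggbb to #rgb when each channel is a doubled hex digit.
function shortenHex(color: string): string {
  const m = /^#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3$/i.exec(color)
  return m ? ('#' + m[1] + m[2] + m[3]).toLowerCase() : color
}

// Convert rgb(r, g, b) with integer channels to hex, then shorten if possible.
function rgbToHex(color: string): string {
  const m = /^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/i.exec(color)
  if (!m) return color
  const channels = [m[1], m[2], m[3]].map(Number)
  // Out-of-range channels are left alone rather than guessed at.
  if (channels.some((c) => c > 255)) return color
  const hex = '#' + channels.map((c) => ('0' + c.toString(16)).slice(-2)).join('')
  return shortenHex(hex)
}
```

Usage: `shortenHex('#ffffff')` gives `#fff`, and `rgbToHex('rgb(0,0,0)')` gives `#000`.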

Advanced minification can do:

/* Before */
.container {
  margin-top: 10px;
  margin-right: 10px;
  margin-bottom: 10px;
  margin-left: 10px;
}

/* After */
.container{margin:10px}

This requires understanding CSS property semantics—regex can’t do it, AST is needed.
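To give a feel for what "property semantics" means here, the margin merge can be sketched on an already-parsed declaration list. This is a simplified illustration: `mergeMargin` is a hypothetical function, it assumes the declarations of one rule arrive as a Map from property to value, and it gives up whenever `!important` or an existing `margin` shorthand is involved.

```typescript
// Sketch: collapse the four margin longhands into one shorthand.
function mergeMargin(declarations: Map<string, string>): Map<string, string> {
  const sides = ['margin-top', 'margin-right', 'margin-bottom', 'margin-left']
  const t = declarations.get('margin-top')
  const r = declarations.get('margin-right')
  const b = declarations.get('margin-bottom')
  const l = declarations.get('margin-left')

  // Bail out unless all four longhands are present and safe to combine.
  if (t === undefined || r === undefined || b === undefined || l === undefined) return declarations
  if (declarations.has('margin')) return declarations
  if ([t, r, b, l].some((v) => v.indexOf('!important') !== -1)) return declarations

  // Pick the shortest shorthand form that preserves all four values.
  const shorthand =
    t === r && r === b && b === l ? t
    : t === b && r === l ? `${t} ${r}`
    : r === l ? `${t} ${r} ${b}`
    : `${t} ${r} ${b} ${l}`

  // Rebuild the map, putting the shorthand where the first longhand was.
  const merged = new Map<string, string>()
  let placed = false
  declarations.forEach((value, property) => {
    if (sides.indexOf(property) === -1) {
      merged.set(property, value)
    } else if (!placed) {
      merged.set('margin', shorthand)
      placed = true
    }
  })
  return merged
}
```

Even this small case needs parsed declarations rather than raw text, which is why this kind of optimization belongs on the AST side.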

Performance: Handling Large Files

When CSS exceeds 1MB (like Tailwind output), formatting it synchronously on the main thread makes the page lag.

Option 1: Web Worker

Move parsing to a Web Worker:

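// worker.ts
import { formatCss } from './formatCss'  // Assumed module path

self.onmessage = (e) => {
  const result = formatCss(e.data)
  self.postMessage(result)
}

// main.tsx
// Browsers can't load .ts files directly; this URL form lets a bundler
// such as Vite or webpack 5 compile and resolve the worker
const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })
worker.onmessage = (e) => setOutput(e.data)  // Attach the handler before posting
worker.postMessage(largeCss)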

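Option 2: Streaming

CSS lends itself to incremental processing: top-level rules are independent, so they can be formatted one at a time:

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

// splitRules and formatRule are helpers defined elsewhere
async function streamFormat(css: string, chunkSize = 200) {
  const rules = splitRules(css)  // Split into top-level rules
  const output: string[] = []

  for (let i = 0; i < rules.length; i++) {
    output.push(formatRule(rules[i]))
    // Yield to the main thread once per chunk; yielding after every
    // rule would add a timer delay per rule on large files
    if ((i + 1) % chunkSize === 0) await sleep(0)
  }

  return output.join('\n')
}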
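Splitting on every } would cut an @media block apart, so the splitter has to track nesting depth. Below is one possible shape for the `splitRules` helper, written as a sketch rather than the tool's actual implementation; it skips over strings and comments so braces inside them are not counted.

```typescript
// Sketch: cut the stylesheet after each '}' that returns to depth 0,
// so an @media block and its nested rules stay in one piece.
function splitRules(css: string): string[] {
  const rules: string[] = []
  let depth = 0
  let start = 0

  const cut = (end: number) => {
    const piece = css.slice(start, end).trim()
    if (piece) rules.push(piece)
    start = end
  }

  for (let i = 0; i < css.length; i++) {
    const ch = css.charAt(i)
    if (ch === '"' || ch === "'") {
      // Skip to the closing quote, honoring backslash escapes.
      i++
      while (i < css.length && css.charAt(i) !== ch) {
        if (css.charAt(i) === '\\') i++
        i++
      }
    } else if (ch === '/' && css.charAt(i + 1) === '*') {
      // Skip the whole comment.
      const end = css.indexOf('*/', i + 2)
      i = end === -1 ? css.length : end + 1
    } else if (ch === '{') {
      depth++
    } else if (ch === '}') {
      depth = Math.max(0, depth - 1)
      if (depth === 0) cut(i + 1)
    } else if (ch === ';' && depth === 0) {
      cut(i + 1)   // statement at-rules such as @import end with ';'
    }
  }

  cut(css.length)  // trailing content without a closing brace
  return rules
}
```

A comment that precedes a rule stays attached to that rule, which keeps it in place when the pieces are joined back together.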

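Option 3: Lazy-load AST Library

Use the lightweight regex formatter first, and load the AST library only when advanced features are requested:

// astFormat is assumed to take the loaded css-tree module as its first argument
const formatCss = mode === 'simple'
  ? regexFormat
  : await import('css-tree').then((cssTree) => (css) => astFormat(cssTree, css))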

The Result

Based on these ideas, I built: CSS Formatter

Features:

  • Beautify: Format CSS code with 2/4/8 space indents
  • Minify: Remove comments and whitespace to reduce file size
  • Real-time stats: Show character count and compression ratio
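The stats boil down to a couple of lengths and a percentage. A minimal sketch, assuming the ratio is computed on character counts (as the feature list says) and using a hypothetical `compressionStats` name:

```typescript
// Sketch: character counts before and after, plus percentage saved.
function compressionStats(original: string, minified: string) {
  const before = original.length
  const after = minified.length
  // Guard against division by zero for empty input.
  const saved = before === 0 ? 0 : (1 - after / before) * 100
  return { before, after, savedPercent: Math.round(saved * 10) / 10 }
}
```

For byte-accurate numbers on non-ASCII stylesheets, the lengths could be taken from an encoded buffer instead of the string.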

The implementation uses a hybrid approach—state machine + regex. It handles most edge cases while keeping code size small (core logic under 100 lines).

User experience details:

  1. Example code: Click “Example” to load demo CSS
  2. One-click copy: Copy formatted output to clipboard
  3. Real-time feedback: Show compression ratio so users see optimization results

Edge Cases I Hit

1. IE Hack Syntax

.container {
  _width: 100px;  /* IE6 hack */
  *width: 200px;  /* IE7 hack */
}

Underscore and asterisk prefixes shouldn’t be treated as errors.

2. CSS Variables

:root {
  --primary-color: #007bff;
}

CSS variable syntax must be parsed correctly—don’t strip -- prefix.
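Both the IE hacks and custom properties come down to being lenient about what counts as a property name. A sketch of a declaration parser that accepts them, using a hypothetical `parseDeclaration` helper; it lowercases ordinary properties but leaves custom property names alone, since those are case-sensitive:

```typescript
interface ParsedDeclaration {
  property: string
  value: string
}

// Sketch: split "property: value" and validate the property name leniently.
function parseDeclaration(text: string): ParsedDeclaration | null {
  const colon = text.indexOf(':')
  if (colon === -1) return null

  const rawProperty = text.slice(0, colon).trim()
  const value = text.slice(colon + 1).replace(/;\s*$/, '').trim()

  // Custom properties: "--" followed by anything without spaces or colons.
  const isCustom = /^--[^\s:]+$/.test(rawProperty)
  // Ordinary properties, optionally with a "*" or "_" hack prefix
  // or a vendor prefix such as "-webkit-".
  const isStandard = /^[*_]?-?[a-z][a-z0-9-]*$/i.test(rawProperty)
  if (!isCustom && !isStandard) return null

  return {
    property: isCustom ? rawProperty : rawProperty.toLowerCase(),
    value,
  }
}
```

Usage: `parseDeclaration('*width: 200px;')` keeps the `*width` property intact, and `parseDeclaration('--Primary-Color: #007bff')` keeps both the `--` prefix and the original casing.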

3. calc() Function

.width {
  width: calc(100% - 20px);
}

Whitespace around the + and - operators inside calc() is required, so a minifier must not remove it.
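A value-level minifier can stay safe by only trimming whitespace where it is known to be insignificant. The sketch below uses a hypothetical `minifyValue` helper and assumes the value contains no quoted strings; it tightens parentheses and commas and never touches the spaces around operators.

```typescript
// Sketch: compact a declaration value without breaking calc().
function minifyValue(value: string): string {
  return value
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .replace(/\(\s+/g, '(')     // no space after an opening paren
    .replace(/\s+\)/g, ')')     // no space before a closing paren
    .replace(/\s*,\s*/g, ',')   // no space around commas
    .trim()
}
```

So `calc( 100%  -  20px )` becomes `calc(100% - 20px)`, with the spaces around the minus sign still in place.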

The implementation isn’t complex, but getting details right takes effort. Hope this helps.

