From Regex to AST: Building a CSS Formatter
I recently inherited a legacy project where CSS files were minified to a single line. Debugging was a nightmare. I tried several online tools, but they either lacked features or struggled with nested rules. So I built my own. Here’s how it works.
The Core: Parse, Rebuild, Output#
CSS has no built-in equivalent of JSON.parse. The simplest approach uses regex:
function beautifyCss(css, spaces = 2) {
  let formatted = ''
  let indentLevel = 0
  const lines = css
    .replace(/\s+/g, ' ') // Merge whitespace
    .replace(/\s*{\s*/g, ' {\n') // Newline after {
    .replace(/;\s*/g, ';\n') // Newline after ;
    .replace(/}\s*/g, '\n}\n') // Newline around }
    .split('\n')
  for (let line of lines) {
    line = line.trim()
    if (!line) continue
    if (line.includes('}')) indentLevel = Math.max(0, indentLevel - 1)
    formatted += ' '.repeat(indentLevel * spaces) + line + '\n'
    if (line.includes('{')) indentLevel++
  }
  return formatted.trim()
}
This handles simple input, but it has no real understanding of CSS structure, which shows up with comments, strings, and nested blocks such as @media.
Where Regex Falls Short#
1. Nested Rules Break#
/* Input */
.container { .title { color: red; } }
/* Regex output (looks fine) */
.container {
  .title {
    color: red;
  }
}
But add a comment:
/* Input */
.container { /* Title */ .title { color: red; } }
/* Regex output (wrong) */
.container {
  /* Title */ .title {
    color: red;
  }
}
The comment and selector get mixed together.
2. @media Queries Get Wrong Indent#
@media (max-width: 768px) {
.container { padding: 10px; }
}
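The regex pass has no notion of an at-rule; it only counts braces. A simple block like this one still indents correctly, but a stray brace in a comment or string inside the @media block shifts the indent of every rule after it.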
3. Special Characters in Strings#
.content::after {
content: "{ this is not a selector }";
}
Regex treats { in the string as a selector start, breaking the output.
AST Approach: The Proper Way#
Professional CSS formatters work on an AST (Abstract Syntax Tree):
import { parse, walk, generate } from 'css-tree'

function formatCss(css) {
  const ast = parse(css)
  walk(ast, (node) => {
    if (node.type === 'Atrule') {
      // Handle @media, @keyframes, etc.
    } else if (node.type === 'Rule') {
      // Handle regular rules
    } else if (node.type === 'Declaration') {
      // Handle property declarations
    }
  })
  // Note: to my knowledge css-tree's generate() serializes compactly and has
  // no indent option, so a formatter has to emit its own newlines and
  // indentation while walking the nodes.
  return generate(ast)
}
An AST handles the edge cases above correctly:
- Special characters in strings aren’t misparsed
- @media queries have proper indent levels
- Comments stay in correct positions
- Nested rules are recognized (if CSS Nesting is supported)
But css-tree is large (150KB+), a burden for browser-based tools.
Hybrid Approach: Regex + State Machine#
Balance size and correctness with a state machine:
function smartFormat(css: string, indent = 2): string {
  let output = ''
  let level = 0
  let quote = '' // Currently open quote character, or '' when outside a string
  let inComment = false
  let buffer = ''
  const pad = () => ' '.repeat(level * indent)

  for (let i = 0; i < css.length; i++) {
    const char = css[i]
    const nextChar = css[i + 1]

    // Handle comments
    if (!quote && !inComment && char === '/' && nextChar === '*') {
      inComment = true
      buffer += '/*'
      i++
      continue
    }
    if (inComment && char === '*' && nextChar === '/') {
      inComment = false
      buffer += '*/'
      i++
      continue
    }

    // Handle strings: only the matching quote closes a string,
    // and a backslash escapes the character after it
    if (quote && char === '\\' && nextChar !== undefined) {
      buffer += char + nextChar
      i++
      continue
    }
    if (!inComment && (char === '"' || char === "'")) {
      if (!quote) quote = char
      else if (quote === char) quote = ''
      buffer += char
      continue
    }

    // Copy everything inside comments and strings verbatim
    if (quote || inComment) {
      buffer += char
      continue
    }

    // Handle { } ;
    if (char === '{') {
      output += pad() + buffer.trim() + ' {\n'
      level++
      buffer = ''
    } else if (char === '}') {
      // Flush a final declaration that has no trailing semicolon
      if (buffer.trim()) output += pad() + buffer.trim() + '\n'
      level = Math.max(0, level - 1) // A stray } must not push the indent below zero
      output += pad() + '}\n'
      buffer = ''
    } else if (char === ';') {
      output += pad() + buffer.trim() + ';\n'
      buffer = ''
    } else {
      buffer += char
    }
  }
  // Flush whatever is left, e.g. a trailing comment
  if (buffer.trim()) output += pad() + buffer.trim() + '\n'
  return output.trim()
}
This is more robust than pure regex, yet keeps code size small.
CSS Minification Tricks#
Minification is simpler—remove unnecessary characters:
function minifyCss(css) {
  return css
    .replace(/\/\*[\s\S]*?\*\//g, '') // Remove comments
    .replace(/\s+/g, ' ') // Merge whitespace
    .replace(/\s*([{}:;,])\s*/g, '$1') // Remove space around symbols
    .replace(/;}/g, '}') // Remove last semicolon
    .trim()
}
Key details:
- Don’t remove all whitespace: margin:0 auto can’t become margin:0auto
- Keep necessary semicolons: the last one can be omitted, but not middle ones
- Optimize color values: #ffffff → #fff, rgb(0,0,0) → #000
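The hex half of the color optimization can be done with a small regex pass. This is a minimal sketch, not the tool's actual code: shortenHexColors is a hypothetical helper name, and it only collapses six-digit hex colors whose digit pairs repeat.

```javascript
// Hypothetical helper: collapse #rrggbb to #rgb when each digit pair repeats.
function shortenHexColors(css) {
  // The lookahead (?![0-9a-f]) keeps 8-digit colors such as #ffffffff untouched.
  return css.replace(
    /#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3(?![0-9a-f])/gi,
    '#$1$2$3'
  )
}

console.log(shortenHexColors('a{color:#ffffff;border-color:#12ab34}'))
// prints "a{color:#fff;border-color:#12ab34}"
```

Being a plain regex, it would also rewrite a matching ID selector (for example #aabbcc) or text inside a string, so it is only safe to run on declaration values.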
Advanced minification can do:
/* Before */
.container {
margin-top: 10px;
margin-right: 10px;
margin-bottom: 10px;
margin-left: 10px;
}
/* After */
.container{margin:10px}
This requires understanding CSS property semantics, which regex can’t provide; it needs an AST.
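To give a sense of what that involves, here is a minimal sketch of the margin merge. It assumes the declarations of a single rule are already available as [property, value] pairs, which is roughly what an AST walk would hand you. mergeMargin is a hypothetical name, and the sketch simply bails out when it sees !important or an existing margin shorthand.

```javascript
// Hypothetical helper: merge the four margin longhands into one shorthand.
// `decls` is an array of [property, value] pairs taken from a single rule.
function mergeMargin(decls) {
  const sides = ['margin-top', 'margin-right', 'margin-bottom', 'margin-left']
  // A Map keeps the last value per property, matching cascade order within a rule.
  const found = new Map(decls.filter(([prop]) => sides.includes(prop)))
  if (found.size !== 4) return decls // Only merge when all four sides are set
  if (decls.some(([prop, val]) => prop === 'margin' || /!important/i.test(val))) {
    return decls // Too risky for a sketch: leave the rule untouched
  }

  const [top, right, bottom, left] = sides.map((side) => found.get(side))
  let value
  if (top === right && top === bottom && top === left) value = top
  else if (top === bottom && right === left) value = `${top} ${right}`
  else if (right === left) value = `${top} ${right} ${bottom}`
  else value = `${top} ${right} ${bottom} ${left}`

  // Put the shorthand where the first longhand was and drop the other three.
  const out = []
  let emitted = false
  for (const [prop, val] of decls) {
    if (!sides.includes(prop)) {
      out.push([prop, val])
    } else if (!emitted) {
      out.push(['margin', value])
      emitted = true
    }
  }
  return out
}
```

Even this small case needs to know which properties belong together and how the one-to-four value shorthand works, which is the kind of knowledge a text-level regex has no access to.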
Performance: Handling Large Files#
When CSS exceeds 1MB (like Tailwind output), formatting it directly on the main thread makes the UI lag.
Option 1: Web Worker#
Move parsing to a Web Worker:
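// worker.ts
// Assumes formatCss is exported from a local module (the path is illustrative)
import { formatCss } from './formatCss'

self.onmessage = (e) => {
  const result = formatCss(e.data)
  self.postMessage(result)
}

// main.tsx
// Browsers can't run .ts directly; this form lets bundlers such as Vite or webpack 5 compile the worker
const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })
worker.onmessage = (e) => setOutput(e.data) // Register the handler before sending work
worker.postMessage(largeCss)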
Option 2: Streaming#
CSS is naturally suited for streaming—process rule by rule:
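// Resolves on a later task, giving the browser a chance to handle input and repaint
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

// splitRules and formatRule are helpers not shown here
async function streamFormat(css: string) {
  const rules = splitRules(css) // Split by }
  const output: string[] = []
  for (const rule of rules) {
    output.push(formatRule(rule))
    await sleep(0) // Yield to main thread
  }
  return output.join('\n')
}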
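Splitting on every } cuts nested blocks such as @media in half, so a chunked formatter is safer if it splits only where the brace depth returns to zero. This is a minimal sketch under that assumption: splitTopLevelRules and formatInChunks are hypothetical names, the per-rule formatter is passed in as a parameter, and for brevity braces inside strings or comments are not skipped.

```javascript
// Hypothetical helper: split CSS into top-level rules by tracking brace depth.
function splitTopLevelRules(css) {
  const rules = []
  let depth = 0
  let start = 0
  for (let i = 0; i < css.length; i++) {
    const char = css[i]
    if (char === '{') {
      depth++
    } else if (char === '}') {
      depth = Math.max(0, depth - 1)
      if (depth === 0) {
        rules.push(css.slice(start, i + 1).trim())
        start = i + 1
      }
    } else if (char === ';' && depth === 0) {
      // Statement at-rules such as @import end with a top-level semicolon
      rules.push(css.slice(start, i + 1).trim())
      start = i + 1
    }
  }
  const rest = css.slice(start).trim()
  if (rest) rules.push(rest)
  return rules
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Format rule by rule, yielding to the event loop every `chunkSize` rules.
async function formatInChunks(css, formatRule, chunkSize = 200) {
  const rules = splitTopLevelRules(css)
  const output = []
  for (let i = 0; i < rules.length; i++) {
    output.push(formatRule(rules[i]))
    if ((i + 1) % chunkSize === 0) await sleep(0)
  }
  return output.join('\n')
}
```

Yielding after every single rule adds a timer delay per rule, which adds up on files with tens of thousands of rules; batching keeps the UI responsive without paying that cost each time. The chunk size of 200 is an arbitrary starting point.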
Option 3: Lazy-load AST Library#
Use lightweight regex first, load AST library only for advanced features:
const formatCss = mode === 'simple'
  ? regexFormat
  : await import('./astFormat').then((m) => m.astFormat) // Illustrative local module that imports css-tree
The Result#
Based on these ideas, I built: CSS Formatter
Features:
- Beautify: Format CSS code with 2/4/8 space indents
- Minify: Remove comments and whitespace to reduce file size
- Real-time stats: Show character count and compression ratio
The implementation uses a hybrid approach—state machine + regex. It handles most edge cases while keeping code size small (core logic under 100 lines).
User experience details:
- Example code: Click “Example” to load demo CSS
- One-click copy: Copy formatted output to clipboard
- Real-time feedback: Show compression ratio so users see optimization results
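The compression ratio can be computed from the two strings directly. A small sketch, assuming the ratio means the percentage of characters saved (the formula is my assumption) and using a hypothetical compressionStats helper:

```javascript
// Hypothetical helper: character counts plus the percentage saved by minifying.
function compressionStats(original, minified) {
  const before = original.length
  const after = minified.length
  // Guard against dividing by zero when the input is empty
  const saved = before === 0 ? 0 : ((before - after) / before) * 100
  return { before, after, savedPercent: Math.round(saved * 10) / 10 }
}

console.log(compressionStats('a { color: red; }', 'a{color:red}'))
// prints { before: 17, after: 12, savedPercent: 29.4 }
```

Note that .length counts UTF-16 code units, not bytes. For a byte count, new TextEncoder().encode(text).length gives the UTF-8 size instead.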
Edge Cases I Hit#
1. IE Hack Syntax#
.container {
_width: 100px; /* IE6 hack */
*width: 200px; /* IE7 hack */
}
Underscore and asterisk prefixes shouldn’t be treated as errors.
2. CSS Variables#
:root {
--primary-color: #007bff;
}
CSS variable syntax must be parsed correctly; don’t strip the -- prefix.
3. calc() Function#
.width {
width: calc(100% - 20px);
}
Spaces around the + and - operators inside calc() are required, so don’t remove them.
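One way to keep a regex minifier from damaging cases like these is to stash strings before collapsing whitespace and restore them afterwards. The sketch below works under that assumption: safeMinify and its placeholder format are illustrative, not the tool's actual code. It strips whitespace only after a colon, never before, so a descendant selector like ".a :hover" keeps its space.

```javascript
// Hypothetical string-preserving minifier.
function safeMinify(css) {
  const strings = []
  // Match comments and strings in one pass, so a quote inside a comment
  // (or a /* inside a string) is not misread.
  const stashed = css.replace(
    /\/\*[\s\S]*?\*\/|"(?:\\[\s\S]|[^"\\])*"|'(?:\\[\s\S]|[^'\\])*'/g,
    (match) => {
      if (match.startsWith('/*')) return '' // Drop comments
      strings.push(match)
      return `\u0000${strings.length - 1}\u0000` // NUL-delimited placeholder
    }
  )
  const minified = stashed
    .replace(/\s+/g, ' ') // Merge whitespace
    .replace(/\s*([{};,])\s*/g, '$1') // Remove space around structural symbols
    .replace(/:\s+/g, ':') // Remove space after a colon only
    .replace(/;}/g, '}') // Remove last semicolon
    .trim()
  // Put the original strings back, byte for byte
  return minified.replace(/\u0000(\d+)\u0000/g, (_, index) => strings[Number(index)])
}

console.log(safeMinify('.w { width: calc(100% - 20px); content: "{ a  b }"; }'))
// prints .w{width:calc(100% - 20px);content:"{ a  b }"}
```

The trade-off is that a declaration written as "color : red" keeps the space before its colon, which costs a byte but never changes meaning.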
The implementation isn’t complex, but getting details right takes effort. Hope this helps.