From Regex to AST: Understanding Code Minification Principles
Recently, I was optimizing a frontend project where the bundled JS file exceeded 2MB. Even with Webpack’s production mode, the size seemed excessive. Diving into code minification principles revealed more complexity than I expected.
What Does Code Minification Actually Do?
Many think minification just “removes spaces and newlines,” but that’s only the tip of the iceberg. Complete code minification includes:
- Whitespace compression: Remove unnecessary spaces, newlines, indentation
- Comment removal: Strip all comments (optional: keep license headers)
- Variable shortening: Transform `userName` into `a`, `b`, `c`
- Dead code elimination: Remove unreachable code
- Statement merging: Combine multiple declarations into one line
- Constant folding: `var a = 1 + 2` becomes `var a = 3`
The first two can be done with regex; the rest require parsing the code into an AST (Abstract Syntax Tree).
Regex Implementation: Simple but Effective
For quick minification needs, a regex approach works well. The core idea is matching and removing:
const minifiers = {
  html: (code: string) => {
    return code
      .replace(/<!--[\s\S]*?-->/g, '') // Remove HTML comments
      .replace(/>\s+</g, '><') // Whitespace between tags
      .replace(/\s+/g, ' ') // Multiple spaces to one (also collapses <pre> content)
      // No "whitespace around symbols" pass for HTML: it would also strip
      // spaces from visible text such as "Hello, world"
      .trim()
  },
  css: (code: string) => {
    return code
      .replace(/\/\*[\s\S]*?\*\//g, '') // Remove CSS comments
      .replace(/\s+/g, ' ') // Compress whitespace
      .replace(/\s*([{}:;,])\s*/g, '$1') // Whitespace around symbols
      .replace(/;\}/g, '}') // Last semicolon optional
      .trim()
  },
  js: (code: string) => {
    return code
      .replace(/\/\*[\s\S]*?\*\//g, '') // Multi-line comments
      .replace(/\/\/.*$/gm, '') // Single-line comments
      .replace(/\s+/g, ' ') // Compress whitespace (joins lines, so the input needs explicit semicolons)
      .replace(/\s*([{}();,:])\s*/g, '$1') // Whitespace around symbols
      .trim()
  }
}
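To see what these passes do step by step, here is the `css` minifier applied to a small stylesheet. This is only a sketch: `minifyCss` is a standalone copy of the regex chain above, and the sample input is made up for illustration.

```typescript
// Standalone copy of the minifiers.css regex chain.
function minifyCss(code: string): string {
  return code
    .replace(/\/\*[\s\S]*?\*\//g, '')   // Remove CSS comments
    .replace(/\s+/g, ' ')               // Collapse whitespace runs to one space
    .replace(/\s*([{}:;,])\s*/g, '$1')  // Drop whitespace around symbols
    .replace(/;\}/g, '}')               // Drop the last semicolon in a block
    .trim()
}

// Hypothetical sample input.
const sampleCss = `
/* Button styles */
.btn {
  color: red;
  margin: 0 auto;
}
`

const minifiedCss = minifyCss(sampleCss)
console.log(minifiedCss) // .btn{color:red;margin:0 auto}
```

Note that the space inside `0 auto` survives because it does not touch any of the listed symbols.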
Regex Pitfalls
Looks simple, but there are several gotchas:
1. “Fake Comments” Inside Strings
const str = "/* This is not a comment */"
const url = "http://example.com" // The // here is not a comment either
The naive regexes will mistakenly delete both. One workaround is to swap string literals for placeholders first, strip the comments, then restore the strings:
function safeMinifyJS(code: string) {
  const strings: string[] = []
  // Replace strings with placeholders first
  let masked = code.replace(/(["'`])(?:(?!\1)[^\\]|\\.)*\1/g, (match) => {
    strings.push(match)
    return `__STRING_${strings.length - 1}__`
  })
  // Now safe to remove comments
  masked = masked
    .replace(/\/\*[\s\S]*?\*\//g, '')
    .replace(/\/\/.*$/gm, '')
  // Restore strings (the captured index arrives as a string)
  masked = masked.replace(/__STRING_(\d+)__/g, (_, i) => strings[Number(i)])
  return masked
}
2. Regex Literal Edge Cases
const regex = /\/\/ This is not a comment either \/*/
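The comment-like sequences inside a regex literal aren’t comments either. The same approach applies: protect regex literals first. Telling a regex literal apart from a division operator depends on context, though, which is hard to do reliably with regex alone.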
3. Template String Complexity
const tpl = `
multi-line
content
${/* This IS a comment */ 'value'}
`
Template strings can nest expressions, and expressions can have comments… This gets complex. Regex only handles simple cases.
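For the first pitfall, an alternative to placeholders is a single regex that matches string literals and comments in one left-to-right pass, keeping the former and dropping the latter. This also copes with a quote inside a comment (`// don't`), which placeholder extraction can mistake for the start of a string. The sketch below uses a hypothetical `stripComments` helper and makes no attempt to handle regex literals or comments inside `${}` expressions, so pitfalls 2 and 3 still apply.

```typescript
// One regex, scanned left to right: whichever construct starts first wins.
// Group 1 captures string literals; the other alternatives match comments.
const STRING_OR_COMMENT =
  /("(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'|`(?:[^`\\]|\\[\s\S])*`)|\/\*[\s\S]*?\*\/|\/\/[^\n]*/g

function stripComments(code: string): string {
  // If group 1 is defined, the match is a string literal: keep it unchanged.
  // Otherwise the match is a comment: remove it.
  return code.replace(STRING_OR_COMMENT, (match: string, str?: string) =>
    str !== undefined ? match : ''
  )
}

// Hypothetical sample: comment markers inside strings, a quote inside a comment.
const sample = `const url = "http://example.com" // don't touch the URL
const s = '/* not a comment */' /* real comment */`

console.log(stripComments(sample))
```

Because the string alternatives are tried at every position before the comment alternatives, a `//` that sits inside a string is consumed as part of the string and never seen as a comment start.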
AST Approach: The Professional Choice
Professional minifiers (Terser, UglifyJS, esbuild) all operate on an AST. The core pipeline:
Source Code → Lexical Analysis → Token Stream → Syntax Analysis → AST → Transform → Minified AST → Code Generation
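To make the pipeline concrete, here is a toy version that handles arithmetic expressions only: a tokenizer, a recursive-descent parser, a constant-folding transform, and a code generator. It is a sketch under heavy simplifying assumptions, and every name in it (`tokenize`, `parseExpr`, `fold`, `emit`, `minifyExpr`) is illustrative rather than taken from any real minifier.

```typescript
type Token = { kind: 'num' | 'id' | 'op'; text: string }
type Node =
  | { type: 'Num'; value: number }
  | { type: 'Id'; name: string }
  | { type: 'Bin'; op: string; left: Node; right: Node }

// Lexical analysis: source text -> token stream (unknown characters are skipped)
function tokenize(src: string): Token[] {
  const parts = src.match(/\d+(?:\.\d+)?|[A-Za-z_$][\w$]*|[-+*\/()]/g) ?? []
  return parts.map((text): Token => ({
    kind: /^\d/.test(text) ? 'num' : /^[A-Za-z_$]/.test(text) ? 'id' : 'op',
    text,
  }))
}

// Syntax analysis: token stream -> AST (* and / bind tighter than + and -)
function parseExpr(tokens: Token[]): Node {
  let pos = 0
  function primary(): Node {
    const t = tokens[pos++]
    if (!t) throw new Error('Unexpected end of input')
    if (t.kind === 'num') return { type: 'Num', value: Number(t.text) }
    if (t.kind === 'id') return { type: 'Id', name: t.text }
    if (t.text === '(') {
      const inner = additive()
      pos++ // skip the closing ')'
      return inner
    }
    throw new Error(`Unexpected token ${t.text}`)
  }
  function binary(next: () => Node, ops: string[]): Node {
    let left = next()
    while (true) {
      const t = tokens[pos]
      if (!t || t.kind !== 'op' || !ops.includes(t.text)) break
      pos++
      left = { type: 'Bin', op: t.text, left, right: next() }
    }
    return left
  }
  function multiplicative(): Node { return binary(primary, ['*', '/']) }
  function additive(): Node { return binary(multiplicative, ['+', '-']) }
  return additive()
}

// Transform: replace a Bin node with a Num when both operands are literals
function fold(node: Node): Node {
  if (node.type !== 'Bin') return node
  const left = fold(node.left)
  const right = fold(node.right)
  if (left.type === 'Num' && right.type === 'Num') {
    const a = left.value
    const b = right.value
    const value =
      node.op === '+' ? a + b : node.op === '-' ? a - b : node.op === '*' ? a * b : a / b
    return { type: 'Num', value }
  }
  return { type: 'Bin', op: node.op, left, right }
}

// Code generation: AST -> compact source. Nested operations and negative
// numbers are parenthesized, which is safe but not always the shortest form.
function emit(node: Node): string {
  if (node.type === 'Num') return String(node.value)
  if (node.type === 'Id') return node.name
  const wrap = (n: Node): string =>
    n.type === 'Bin' || (n.type === 'Num' && n.value < 0) ? `(${emit(n)})` : emit(n)
  return `${wrap(node.left)}${node.op}${wrap(node.right)}`
}

const minifyExpr = (src: string): string => emit(fold(parseExpr(tokenize(src))))

console.log(minifyExpr('1 + 2'))                   // 3
console.log(minifyExpr('price * (1 + 2) + 2 * 3')) // (price*3)+6
```

A production code generator would consult operator precedence to drop redundant parentheses; the conservative wrapping here keeps the sketch short.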
Variable Shortening Implementation
// Original code
function calculateTotal(price, quantity) {
  const tax = 0.1
  const subtotal = price * quantity
  const total = subtotal * (1 + tax)
  return total
}
// Minified (whitespace kept for readability): names shortened, `total` inlined
function calculateTotal(a, b) {
  const c = 0.1
  const d = a * b
  return d * (1 + c)
}
Implementation approach:
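import { parse } from '@babel/parser'
import traverse from '@babel/traverse'
import generate from '@babel/generator'
function minifyWithAST(code: string) {
  const ast = parse(code)
  // Map each original name to its short replacement
  const bindings = new Map<string, string>()
  let counter = 0
  // A few reserved words short enough to be generated (not exhaustive)
  const reserved = new Set(['do', 'if', 'in', 'for', 'let', 'new', 'try', 'var'])
  // base54 encoding: the first character comes from the 54 characters that may
  // start an identifier (a-z, A-Z, $, _); later characters may also be digits
  function base54(num: number): string {
    const chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_0123456789'
    let result = ''
    let base = 54
    num++
    do {
      num--
      result += chars[num % base]
      num = Math.floor(num / base)
      base = 64
    } while (num > 0)
    return result
  }
  // Generate the next short name, skipping reserved words
  const getShortName = () => {
    let name = base54(counter++)
    while (reserved.has(name)) name = base54(counter++)
    return name
  }
  // Traverse AST, rename declared variables
  // (function parameters would need the same treatment in their own visitor)
  traverse(ast, {
    VariableDeclarator(path) {
      const id = path.node.id
      if (id.type !== 'Identifier') return // Skip destructuring patterns
      const oldName = id.name
      const taken = (name: string) =>
        path.scope.hasBinding(name) || path.scope.hasGlobal(name)
      let shortName = bindings.get(oldName)
      if (!shortName || taken(shortName)) {
        // Never pick a name that is already visible from this scope
        do {
          shortName = getShortName()
        } while (taken(shortName))
        bindings.set(oldName, shortName)
      }
      path.scope.rename(oldName, shortName)
    }
  })
  return generate(ast, { compact: true }).code
}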
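One part of this that can be checked without Babel is the name generator: a generated name must never start with a digit and must never be a reserved word. Below is a minimal standalone sketch, assuming a scheme where the first character is drawn from the 54 identifier-start characters and later characters from all 64. The names `shortName`, `createNameSource`, and the `RESERVED` list are illustrative, and the list is deliberately incomplete.

```typescript
// 54 characters that may start an identifier, followed by the 10 digits.
const NAME_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_0123456789'
// A few reserved words short enough to be generated; not an exhaustive list.
const RESERVED = new Set(['do', 'if', 'in', 'for', 'let', 'new', 'try', 'var'])

// Map 0, 1, 2, ... to a, b, ..., _, aa, ba, ... with a digit-free first character.
function shortName(index: number): string {
  let name = ''
  let base = 54 // first character: letters, $ and _ only
  let n = index + 1
  do {
    n--
    name += NAME_CHARS[n % base]
    n = Math.floor(n / base)
    base = 64 // later characters may also be digits
  } while (n > 0)
  return name
}

// Returns a function that hands out the next usable name on each call.
function createNameSource(): () => string {
  let i = 0
  return () => {
    let name = shortName(i++)
    while (RESERVED.has(name)) name = shortName(i++)
    return name
  }
}

const nextName = createNameSource()
console.log(nextName(), nextName(), nextName()) // a b c
```

With this encoding, index 813 would produce `do`, which is why the reserved-word check matters even for a toy renamer.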
Dead Code Elimination
// Original code
if (false) {
  console.log('Never executes')
}
const unused = 'This variable is never used'
// Minified
// Nothing left
Implementation approach:
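traverse(ast, {
  IfStatement(path) {
    const { test, alternate } = path.node
    // Only handle a literal `false` condition
    if (test.type !== 'BooleanLiteral' || test.value) return
    if (alternate) {
      // The else branch always runs, so keep it in place of the if
      path.replaceWith(alternate)
    } else {
      path.remove()
    }
  },
  VariableDeclarator(path) {
    const id = path.node.id
    if (id.type !== 'Identifier') return // Skip destructuring patterns
    // Check if variable is referenced
    const binding = path.scope.getBinding(id.name)
    const init = path.node.init
    // Keep initializers that may have side effects, e.g. `const x = track()`
    const removable = !init || path.scope.isPure(init, true)
    if (binding && !binding.referenced && removable) {
      path.remove()
    }
  }
})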
Performance Comparison: Regex vs AST
I ran a simple test, minifying a 100KB JS file:
| Approach | Time | Compression Rate |
|---|---|---|
| Regex | 5ms | 35% |
| Terser | 150ms | 65% |
| esbuild | 8ms | 60% |
Regex is the fastest but compresses the least. Terser compresses the most but is the slowest. esbuild, an AST-based minifier written in Go, gets close to Terser's compression at close to regex speed.
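Numbers like these depend heavily on the input file and the machine, so it is worth measuring on your own code. The following is a rough sketch of how time and size reduction could be measured for any minifier that takes and returns a string. The `measure` helper, its result fields, and the stand-in `collapseWhitespace` minifier are all hypothetical, and this is not the harness used for the table above.

```typescript
type Minifier = (code: string) => string

interface Measurement {
  millis: number       // average wall-clock time per run
  reductionPct: number // percentage of characters removed
}

function measure(minify: Minifier, code: string, runs = 20): Measurement {
  let output = ''
  const start = performance.now()
  for (let i = 0; i < runs; i++) output = minify(code)
  const millis = (performance.now() - start) / runs
  const reductionPct =
    code.length === 0 ? 0 : (1 - output.length / code.length) * 100
  return { millis, reductionPct }
}

// Stand-in minifier for the demo: collapse whitespace runs only.
const collapseWhitespace: Minifier = (code) => code.replace(/\s+/g, ' ').trim()

const result = measure(collapseWhitespace, 'let  a   =  1;\n\n\nlet  b   =  2;\n')
console.log(`${result.millis.toFixed(3)} ms, ${result.reductionPct.toFixed(1)}% smaller`)
```

Averaging over several runs smooths out JIT warm-up, which otherwise dominates timings in the single-digit millisecond range.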
Choosing in Practice
Development: Speed First
Development builds run frequently, so speed matters most: use regex or esbuild.
// vite.config.js
export default {
  build: {
    minify: 'esbuild' // Default, fast
  }
}
Production: Compression First
Production builds can afford Terser, trading longer build time for smaller output:
// vite.config.js
export default {
  build: {
    minify: 'terser', // Requires terser to be installed as a dev dependency
    terserOptions: {
      compress: {
        drop_console: true, // Remove console
        drop_debugger: true // Remove debugger
      }
    }
  }
}
Online Tools: Regex Approach
Online minification tools need to respond instantly, and regex is fast enough for that:
// Debounced minification: runs 300 ms after the last input event
// (debounce, setOutput and setStats come from the surrounding UI code)
const debouncedMinify = debounce((code: string) => {
  const minified = minifiers.js(code)
  setOutput(minified)
  setStats({
    original: code.length,
    minified: minified.length,
    saved: code.length - minified.length
  })
}, 300)
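One caveat on the stats: `code.length` counts UTF-16 code units, not bytes, so the numbers drift from the real file size once the code contains non-ASCII characters. If byte-accurate sizes matter, a small helper along these lines could be used instead. `byteStats` is a hypothetical name, and the sketch assumes the file is served as UTF-8.

```typescript
const encoder = new TextEncoder()

// Size comparison in UTF-8 bytes rather than string length.
function byteStats(original: string, minified: string) {
  const originalBytes = encoder.encode(original).length
  const minifiedBytes = encoder.encode(minified).length
  return {
    original: originalBytes,
    minified: minifiedBytes,
    saved: originalBytes - minifiedBytes,
  }
}

// "é" is one UTF-16 code unit but two UTF-8 bytes.
console.log(byteStats('const s = "é" ;', 'const s="é";'))
```

This measures the uncompressed size only; what travels over the network is usually gzip or Brotli output, which is smaller still.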
A Complete Online Minification Tool
Based on these principles, I built an online code minifier supporting HTML/CSS/JS: Code Minifier
Key features:
- Support for HTML, CSS, JavaScript
- Real-time size comparison before/after minification
- One-click copy or download minified result
- Pure frontend implementation, code never leaves your browser
The core code is the regex implementation above, plus some edge case handling. While compression rate isn’t as good as professional tools, it’s sufficient for quick minification needs.
Summary
Code minification has three levels of complexity:
- Regex approach: Remove whitespace and comments, simple implementation, fast, 30-40% compression
- Lightweight AST: Add variable shortening and dead code elimination, 50-60% compression
- Professional tools: Terser/esbuild, 60-70% compression, advanced features
Choose based on your scenario: development prioritizes speed (regex or esbuild), production prioritizes size (Terser), and online tools prioritize instant response (regex).