From Regex to AST: Understanding Code Minification Principles#

Recently, I was optimizing a frontend project where the bundled JS file exceeded 2MB. Even with Webpack’s production mode, the size seemed excessive. Diving into code minification principles revealed more complexity than I expected.

What Does Code Minification Actually Do?#

Many think minification just “removes spaces and newlines,” but that’s only the tip of the iceberg. Complete code minification includes:

  1. Whitespace compression: Remove unnecessary spaces, newlines, indentation
  2. Comment removal: Strip all comments (optional: keep license headers)
  3. Variable shortening: Rename identifiers such as userName to short names like a, b, c
  4. Dead code elimination: Remove unreachable code and unused declarations
  5. Statement merging: Combine multiple declarations into a single statement
  6. Constant folding: var a = 1 + 2 becomes var a = 3

The first two can be approximated with regular expressions; the rest require parsing the code into an AST (Abstract Syntax Tree).

Regex Implementation: Simple but Effective#

For quick minification needs, a regex approach works well. The core idea is matching and removing:

const minifiers = {
  html: (code: string) => {
    return code
      .replace(/<!--[\s\S]*?-->/g, '')        // Remove HTML comments
      .replace(/>\s+</g, '><')                // Whitespace between tags
      .replace(/\s+/g, ' ')                   // Multiple spaces to one (also affects <pre> content)
      // Whitespace around punctuation is left alone for HTML: stripping it
      // would change visible text ("Hello, world" -> "Hello,world")
      .trim()
  },
  
  css: (code: string) => {
    return code
      .replace(/\/\*[\s\S]*?\*\//g, '')       // Remove CSS comments
      .replace(/\s+/g, ' ')                   // Compress whitespace
      .replace(/\s*([{}:;,])\s*/g, '$1')      // Whitespace around symbols
      .replace(/;\}/g, '}')                   // Last semicolon optional
      .trim()
  },
  
  js: (code: string) => {
    return code
      .replace(/\/\*[\s\S]*?\*\//g, '')       // Multi-line comments
      .replace(/\/\/.*$/gm, '')               // Single-line comments (also hits "//" inside strings)
      .replace(/\s+/g, ' ')                   // Compress whitespace (input must use explicit semicolons)
      .replace(/\s*([{}();,:])\s*/g, '$1')    // Whitespace around symbols
      .trim()
  }
}
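
To see what the chain does step by step, here is a small self-contained run of the CSS rules. The function name minifyCss and the sample stylesheet are mine, chosen only for illustration; the replacements are copied from the css minifier above.

```typescript
// Same replacement chain as the css minifier above, copied so the example runs on its own.
function minifyCss(code: string): string {
  return code
    .replace(/\/\*[\s\S]*?\*\//g, '')   // Remove comments
    .replace(/\s+/g, ' ')               // Collapse whitespace runs
    .replace(/\s*([{}:;,])\s*/g, '$1')  // Drop whitespace around symbols
    .replace(/;\}/g, '}')               // Trailing semicolon is optional
    .trim()
}

const cssInput = `
/* Primary button */
.btn {
  color: red;
  margin: 0 auto;
}
`

const cssOutput = minifyCss(cssInput)
console.log(cssOutput) // .btn{color:red;margin:0 auto}
```

Note that the space inside "0 auto" survives because it does not touch any of the listed symbols. The same rule would also strip spaces around a colon inside a quoted string or a selector, which is one reason this stays a quick-and-dirty approach.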

Regex Pitfalls#

Looks simple, but there are several gotchas:

1. “Fake Comments” Inside Strings

const str = "/* This is not a comment */"
const url = "http://example.com"  // The // here is not a comment either

A naive regex treats these as comments and deletes them, corrupting the strings. The solution is to extract strings first, process the rest, then restore them:

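function safeMinifyJS(code: string) {
  const strings: string[] = []

  // Replace string literals with placeholders first.
  // ("protected" is a reserved word in strict mode, so the working copy is named "masked".)
  // Caveat: a quote inside a comment (e.g. // don't) is still read as the start of a string.
  let masked = code.replace(/(["'`])(?:(?!\1)[^\\]|\\[\s\S])*\1/g, (match) => {
    strings.push(match)
    return `__STRING_${strings.length - 1}__`
  })

  // Now safe to remove comments
  masked = masked
    .replace(/\/\*[\s\S]*?\*\//g, '')
    .replace(/\/\/.*$/gm, '')

  // Restore strings (the captured index arrives as text, so convert it)
  masked = masked.replace(/__STRING_(\d+)__/g, (_, i) => strings[Number(i)])

  return masked
}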
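
One alternative worth sketching is a single left-to-right pass that matches strings and comments in the same regex, so whichever starts first wins. This is not the approach used above, just a variant under stated assumptions: the names STRING_OR_COMMENT and stripComments are hypothetical, and template literals with nested expressions and regex literals are still not handled.

```typescript
// One alternation, scanned left to right: whichever token starts first wins,
// so a quote inside a comment or a "//" inside a string is never misread.
const STRING_OR_COMMENT =
  /("(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'|`(?:[^`\\]|\\[\s\S])*`)|\/\*[\s\S]*?\*\/|\/\/[^\n]*/g

function stripComments(code: string): string {
  // Group 1 is set only for string literals: keep those, drop comments.
  return code.replace(STRING_OR_COMMENT, (_match, str?: string) => str ?? '')
}

const sample = [
  'const a = "/* not a comment */" // it\'s a real comment',
  'const url = "http://example.com" /* gone */',
  "const b = 'x'",
].join('\n')

console.log(stripComments(sample))
```

The apostrophe in "it's" is the interesting case: because the comment is consumed as a whole before the scanner ever looks at that quote, it cannot pair up with the quote around 'x' on the last line.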

2. Regex Literal Edge Cases

const regex = /\/\/ This is not a comment either \/*/

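Sequences that look like // or /* inside a regex literal aren’t comments either. The same approach applies: protect regex literals first. The catch is that telling a regex literal apart from a division operator already requires some knowledge of the surrounding syntax.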

3. Template String Complexity

const tpl = `
  multi-line
  content
  ${/* This IS a comment */ 'value'}
`

Template strings can nest expressions, and expressions can have comments… This gets complex. Regex only handles simple cases.

AST Approach: The Professional Choice#

Professional minifiers (Terser, UglifyJS, esbuild) all use AST. The core pipeline:

Source Code → Lexical Analysis → Token Stream → Syntax Analysis → AST → Transform → Minified AST → Code Generation
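
To make the stages concrete, here is a toy version of that pipeline for arithmetic expressions only (numbers, identifiers, +, * and parentheses). It is a sketch for illustration, not how Terser or esbuild are implemented; every name in it (tokenize, parseExpr, fold, emit, minifyExpr) is hypothetical. The transform step performs constant folding.

```typescript
type ExprToken = { kind: 'num' | 'id' | 'op'; text: string }
type Expr =
  | { type: 'Num'; value: number }
  | { type: 'Id'; name: string }
  | { type: 'Bin'; op: '+' | '*'; left: Expr; right: Expr }

// 1. Lexical analysis: characters -> tokens (whitespace is simply skipped).
function tokenize(src: string): ExprToken[] {
  const tokens: ExprToken[] = []
  const re = /(\d+)|([A-Za-z_$][\w$]*)|([+*()])/g
  let m: RegExpExecArray | null
  while ((m = re.exec(src)) !== null) {
    if (m[1] !== undefined) tokens.push({ kind: 'num', text: m[1] })
    else if (m[2] !== undefined) tokens.push({ kind: 'id', text: m[2] })
    else tokens.push({ kind: 'op', text: m[3] })
  }
  return tokens
}

// 2. Syntax analysis: tokens -> AST (recursive descent, "*" binds tighter than "+").
function parseExpr(tokens: ExprToken[]): Expr {
  let pos = 0
  function factor(): Expr {
    const t = tokens[pos++]
    if (t === undefined) throw new Error('Unexpected end of input')
    if (t.kind === 'num') return { type: 'Num', value: Number(t.text) }
    if (t.kind === 'id') return { type: 'Id', name: t.text }
    if (t.text === '(') {
      const inner = sum()
      if (tokens[pos++]?.text !== ')') throw new Error('Expected ")"')
      return inner
    }
    throw new Error(`Unexpected token ${t.text}`)
  }
  function product(): Expr {
    let left = factor()
    while (tokens[pos]?.text === '*') {
      pos++
      left = { type: 'Bin', op: '*', left, right: factor() }
    }
    return left
  }
  function sum(): Expr {
    let left = product()
    while (tokens[pos]?.text === '+') {
      pos++
      left = { type: 'Bin', op: '+', left, right: product() }
    }
    return left
  }
  return sum()
}

// 3. Transform: fold subtrees whose operands are both numeric literals.
function fold(node: Expr): Expr {
  if (node.type !== 'Bin') return node
  const left = fold(node.left)
  const right = fold(node.right)
  if (left.type === 'Num' && right.type === 'Num') {
    const value = node.op === '+' ? left.value + right.value : left.value * right.value
    return { type: 'Num', value }
  }
  return { ...node, left, right }
}

// 4. Code generation: AST -> compact source; a sum is parenthesized only inside a product.
function emit(node: Expr, parentOp?: string): string {
  if (node.type === 'Num') return String(node.value)
  if (node.type === 'Id') return node.name
  const text = emit(node.left, node.op) + node.op + emit(node.right, node.op)
  return node.op === '+' && parentOp === '*' ? `(${text})` : text
}

const minifyExpr = (src: string) => emit(fold(parseExpr(tokenize(src))))

console.log(minifyExpr('price * ( 1 + 2 ) + 3 * 4')) // price*3+12
```

Whitespace removal falls out for free here: the tokenizer never records it, and the generator only emits what the tree needs.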

Variable Shortening Implementation#

// Original code
function calculateTotal(price, quantity) {
  const tax = 0.1
  const subtotal = price * quantity
  const total = subtotal * (1 + tax)
  return total
}

// Minified (parameters and locals renamed; total inlined into the return)
function calculateTotal(a, b) {
  const c = 0.1
  const d = a * b
  return d * (1 + c)
}

Implementation approach:

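import { parse } from '@babel/parser'
import traverse from '@babel/traverse'
import generate from '@babel/generator'

function minifyWithAST(code: string) {
  const ast = parse(code)

  // Map each original name to its short replacement
  const bindings = new Map<string, string>()
  let counter = 0

  // Collect every identifier already in use so a short name never collides with one
  const used = new Set<string>()
  traverse(ast, {
    Identifier(path) {
      used.add(path.node.name)
    }
  })

  // base54 encoding: the first character comes from 54 identifier-start
  // characters (a-z, A-Z, $, _); later characters may also be digits
  function base54(num: number): string {
    const leading = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_'
    const chars = leading + '0123456789'
    let result = leading[num % 54]
    num = Math.floor(num / 54)
    while (num > 0) {
      num--
      result += chars[num % 64]
      num = Math.floor(num / 64)
    }
    return result
  }

  // Generate the next short name, skipping reserved words and names already taken
  const reserved = new Set(['do', 'if', 'in', 'for', 'let', 'new', 'try', 'var'])
  const getShortName = () => {
    let name: string
    do {
      name = base54(counter++)
    } while (reserved.has(name) || used.has(name))
    return name
  }

  // Traverse AST, rename variables.
  // Only plain variable declarations are handled in this sketch; function
  // parameters and destructuring patterns need the same scope-aware renaming.
  traverse(ast, {
    VariableDeclarator(path) {
      const id = path.node.id
      if (id.type !== 'Identifier') return
      const oldName = id.name
      if (!bindings.has(oldName)) {
        bindings.set(oldName, getShortName())
      }
      path.scope.rename(oldName, bindings.get(oldName))
    }
  })

  return generate(ast, { compact: true }).code
}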

Dead Code Elimination#

// Original code
if (false) {
  console.log('Never executes')
}
const unused = 'This variable is never used'

// Minified
// Nothing left

Implementation approach:

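traverse(ast, {
  IfStatement(path) {
    // If the condition is the literal false, only the else branch (if any) can run
    const { test, alternate } = path.node
    if (test.type === 'BooleanLiteral' && !test.value) {
      if (alternate) path.replaceWith(alternate)
      else path.remove()
    }
  },

  VariableDeclarator(path) {
    const { id, init } = path.node
    if (id.type !== 'Identifier') return
    // Remove the declaration only if nothing references it and dropping the
    // initializer cannot skip a side effect (e.g. const x = sendRequest())
    const binding = path.scope.getBinding(id.name)
    if (binding && !binding.referenced && (!init || path.scope.isPure(init))) {
      path.remove()
    }
  }
})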

Performance Comparison: Regex vs AST#

I ran a simple test, minifying a 100KB JS file:

Approach | Time   | Compression Rate
Regex    | 5ms    | 35%
Terser   | 150ms  | 65%
esbuild  | 8ms    | 60%

Regex is the fastest but compresses the least. Terser compresses the most but is the slowest. esbuild, which is written in Go, also works on an AST and strikes a balance between speed and compression.
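
For anyone who wants to repeat this kind of comparison, the sketch below shows one way to take the measurement, assuming "compression rate" means the percentage of characters removed. The helper measure and the stand-in minifier collapse are hypothetical names, and string length counts UTF-16 code units rather than bytes on disk.

```typescript
interface MinifyStats {
  originalSize: number
  minifiedSize: number
  savedPercent: number
  elapsedMs: number
}

// Runs a minifier once and reports how much it removed and how long it took.
function measure(minify: (code: string) => string, code: string): MinifyStats {
  const start = Date.now()
  const minified = minify(code)
  const elapsedMs = Date.now() - start

  const originalSize = code.length
  const minifiedSize = minified.length
  const savedPercent =
    originalSize === 0 ? 0 : Math.round((1 - minifiedSize / originalSize) * 100)

  return { originalSize, minifiedSize, savedPercent, elapsedMs }
}

// Stand-in minifier for the demo: collapses whitespace runs only.
const collapse = (code: string) => code.replace(/\s+/g, ' ').trim()

const stats = measure(collapse, 'let  a  =  1;\n\n\nlet  b  =  2;')
console.log(stats.originalSize, stats.minifiedSize, stats.savedPercent)
```

A single run on a small input is noisy, so for real numbers it helps to repeat the call many times on a realistic file and average the timings.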

Choosing in Practice#

Development: Speed First#

Development builds run frequently, so speed matters most. Use regex or esbuild:

// vite.config.js
export default {
  build: {
    minify: 'esbuild'  // Default, fast
  }
}

Production: Compression First#

Production builds can use Terser, trading longer build times for smaller output (in recent Vite versions, terser has to be installed separately as a dev dependency):

// vite.config.js
export default {
  build: {
    minify: 'terser',
    terserOptions: {
      compress: {
        drop_console: true,  // Remove console
        drop_debugger: true  // Remove debugger
      }
    }
  }
}

Online Tools: Regex Approach#

Online minification tools need to respond instantly, so regex suffices:

// Real-time minification: runs 300 ms after the user stops typing
const debouncedMinify = debounce((code: string) => {
  const minified = minifiers.js(code)
  setOutput(minified)
  setStats({
    original: code.length,
    minified: minified.length,
    saved: code.length - minified.length
  })
}, 300)
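
The snippet above relies on a debounce helper that is not shown. Utility libraries such as lodash ship one; if you would rather not add a dependency, a minimal version could look like the sketch below. This is an assumption about its shape, not the code behind the tool, and the counter runs exists only to make the behavior visible.

```typescript
// Minimal trailing-edge debounce: each call cancels the pending one,
// so fn only runs once the calls have stopped for waitMs milliseconds.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined
  return (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer)
    timer = setTimeout(() => fn(...args), waitMs)
  }
}

let runs = 0
let lastSeen = ''
const onInput = debounce((code: string) => {
  runs++
  lastSeen = code
}, 300)

// Three quick keystrokes: only the last one is processed, about 300 ms later.
onInput('a')
onInput('ab')
onInput('abc')
```

A 300 ms wait keeps the tool from re-minifying on every keystroke while still feeling immediate once typing pauses.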

A Complete Online Minification Tool#

Based on these principles, I built an online code minifier supporting HTML/CSS/JS: Code Minifier

Key features:

  • Support for HTML, CSS, JavaScript
  • Real-time size comparison before/after minification
  • One-click copy or download minified result
  • Pure frontend implementation, code never leaves your browser

The core code is the regex implementation above, plus some edge case handling. While compression rate isn’t as good as professional tools, it’s sufficient for quick minification needs.

Summary#

Code minification has three levels of complexity:

  1. Regex approach: Remove whitespace and comments, simple implementation, fast, 30-40% compression
  2. Lightweight AST: Add variable shortening and dead code elimination, 50-60% compression
  3. Professional tools: Terser/esbuild, 60-70% compression, advanced features

Choose based on your scenario: development prioritizes speed (regex or esbuild), production prioritizes size (Terser), and online tools prioritize instant response (regex).

