SVG Optimizer Implementation: From Regex to Performance#

Working on an icon library project, I noticed exported SVG files were often 50KB+. After optimization, they shrank by 60%. But manually uploading and downloading files every time was tedious. So I built my own SVG optimizer. Here’s how it works.

Why Are SVG Files So Large?#

A typical unoptimized SVG:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="24" height="24">
  <!-- User icon -->
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="">
        <dc:title>User Icon</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <title>User Icon</title>
  <desc>A simple user icon</desc>
  <g fill="none" stroke="currentColor" stroke-width="2">
    <path d="M20 21v-2a4 4 0 0 0-4-4H8a4 4 0 0 0-4 4v2"></path>
    <circle cx="12" cy="7" r="4"></circle>
  </g>
</svg>

This file is about 700 bytes, but only the <svg> root, the <g> wrapper, and the path and circle inside it affect rendering. The bloat includes:

  1. XML declaration and DOCTYPE - Browsers don’t need them
  2. metadata - Editor garbage
  3. title and desc - No effect on the drawn shapes (browsers and screen readers do use them for tooltips and accessible names)
  4. Comments - Leftover from development
  5. Whitespace - Newlines and indentation

After optimization, it shrinks to about 250 bytes - roughly a 65% reduction.

Regex Implementation#

1. Remove DOCTYPE#

function removeDoctype(svg: string): string {
  return svg.replace(/<!DOCTYPE[^>]*>/gi, '')
}

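[^>]* matches everything up to the closing >. The g flag replaces every occurrence and the i flag makes the match case-insensitive. The pattern assumes the DOCTYPE has no internal subset ([ ... ]), because a > inside one would end the match early.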
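The bloat list earlier also names the XML declaration, which removeDoctype leaves in place. Below is a minimal sketch of a companion step. The name removeXmlProlog is hypothetical, and the DOCTYPE pattern is an assumed extension that tolerates an internal subset. Dropping a DOCTYPE that defines entities will break any references to those entities, so this is only safe for typical editor exports.

```typescript
// Hypothetical helper, not part of the original optimizer.
function removeXmlProlog(svg: string): string {
  return svg
    // <?xml ... ?> declaration at the start of the file (a leading BOM is dropped with it)
    .replace(/^\uFEFF?\s*<\?xml\s[\s\S]*?\?>/, '')
    // DOCTYPE, optionally with an internal subset in [ ... ] that may contain '>'
    .replace(/<!DOCTYPE[^>\[]*(?:\[[\s\S]*?\]\s*)?>/gi, '')
}

const withProlog =
  '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE svg [ <!ENTITY a "b>c"> ]>\n<svg/>'
console.log(removeXmlProlog(withProlog).trim()) // <svg/>
```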

2. Remove Comments#

function removeComments(svg: string): string {
  return svg.replace(/<!--[\s\S]*?-->/g, '')
}

The key is [\s\S]*?:

  • [\s\S] matches all characters including newlines
  • *? is non-greedy to avoid matching across comments
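A quick way to see the difference is to run both variants on a string with two comments. This is only an illustration with a made-up input:

```typescript
const input = '<!-- a --><path d="M0 0"/><!-- b -->'

// Greedy: one match from the first <!-- to the last -->, swallowing the path
const greedy = input.replace(/<!--[\s\S]*-->/g, '')

// Non-greedy: each comment is matched on its own, so the path survives
const lazy = input.replace(/<!--[\s\S]*?-->/g, '')

console.log(JSON.stringify(greedy)) // ""
console.log(JSON.stringify(lazy))   // "<path d=\"M0 0\"/>"
```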

3. Remove Metadata#

function removeMetadata(svg: string): string {
  svg = svg.replace(/<metadata[\s\S]*?<\/metadata>/gi, '')
  svg = svg.replace(/<title[\s\S]*?<\/title>/gi, '')
  svg = svg.replace(/<desc[\s\S]*?<\/desc>/gi, '')
  return svg
}

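Gotcha: SVG is XML, so tag names are case-sensitive and standalone files use lowercase. The i flag only guards against SVG pasted from HTML, where the parser accepts any case. Also, <title> is optional in SVG but required in HTML - don’t confuse them. Removing <title> and <desc> drops the icon’s accessible name, so keep them for icons that convey meaning.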
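Since <title> and <desc> are what screen readers announce, one option is to make their removal opt-in. The sketch below assumes a hypothetical keepAccessibleText flag that is not part of the optimizer's options; \b is added so that only the exact tag names match.

```typescript
// Hypothetical variant: keepAccessibleText is an assumption, not an existing option.
function removeMetadataSafe(svg: string, keepAccessibleText = true): string {
  // <metadata> never affects rendering or accessibility, so it always goes
  let result = svg.replace(/<metadata\b[\s\S]*?<\/metadata>/gi, '')
  if (!keepAccessibleText) {
    result = result
      .replace(/<title\b[\s\S]*?<\/title>/gi, '')
      .replace(/<desc\b[\s\S]*?<\/desc>/gi, '')
  }
  return result
}

const icon = '<svg><metadata>x</metadata><title>User</title><path/></svg>'
console.log(removeMetadataSafe(icon))        // <svg><title>User</title><path/></svg>
console.log(removeMetadataSafe(icon, false)) // <svg><path/></svg>
```

Decorative icons that are hidden from assistive technology can pass false; icons that stand alone as buttons or links are usually better off keeping the title.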

4. Remove Empty Attributes#

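function removeEmptyAttrs(svg: string): string {
  // Match the attribute name too, e.g. ` class=""` or ` id=''`
  return svg.replace(/\s+[\w:.-]+\s*=\s*(?:""|'')/g, '')
}

Matches both quote styles: ="" and =''. The pattern has to include the attribute name ([\w:.-]+); without it, ` class=""` never matches because the name sits between the whitespace and the =. The leading \s+ anchors the match at an attribute boundary. It can still hit text that merely looks like an empty attribute inside a quoted value or a text node, so treat it as a heuristic.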

5. Collapse Whitespace#

function collapseWhitespace(svg: string): string {
  // Multiple spaces to one
  svg = svg.replace(/\s+/g, ' ')
  // Remove whitespace between tags
  svg = svg.replace(/>\s+</g, '><')
  // Remove whitespace before >
  svg = svg.replace(/\s+>/g, '>')
  return svg.trim()
}

Order matters:

  1. Compress all consecutive whitespace first
  2. Remove whitespace between tags (e.g., </path> <circle> → </path><circle>)
  3. Remove whitespace before > (e.g., <path d="M10 20" > → <path d="M10 20">)
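The three steps can be traced on a small made-up fragment. The intermediate values are shown as comments:

```typescript
const input = '<g>\n  <path d="M10 20"\n        fill="none" ></path>\n  <circle r="4"></circle>\n</g>\n'

// Step 1: every whitespace run (including newlines) becomes one space
const step1 = input.replace(/\s+/g, ' ')
// '<g> <path d="M10 20" fill="none" ></path> <circle r="4"></circle> </g> '

// Step 2: the single spaces left between tags are removed
const step2 = step1.replace(/>\s+</g, '><')
// '<g><path d="M10 20" fill="none" ></path><circle r="4"></circle></g> '

// Step 3: whitespace before > goes, then the trailing space is trimmed
const step3 = step2.replace(/\s+>/g, '>').trim()
// '<g><path d="M10 20" fill="none"></path><circle r="4"></circle></g>'

// Caveat: whitespace inside text content is collapsed too
const text = '<text xml:space="preserve">a   b</text>'.replace(/\s+/g, ' ')
// '<text xml:space="preserve">a b</text>'
```

The last line is worth noting: the regex cannot tell markup from text, so SVGs that rely on xml:space="preserve" or on spacing inside <text> will render differently after this step.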

Complete Implementation#

interface OptimizationOptions {
  removeDoctype: boolean
  removeComments: boolean
  removeMetadata: boolean
  removeEmptyAttrs: boolean
  collapseWhitespace: boolean
}

function optimizeSvg(svg: string, options: OptimizationOptions): string {
  let result = svg

  if (options.removeDoctype) {
    result = result.replace(/<!DOCTYPE[^>]*>/gi, '')
  }
  if (options.removeComments) {
    result = result.replace(/<!--[\s\S]*?-->/g, '')
  }
  if (options.removeMetadata) {
    result = result.replace(/<metadata[\s\S]*?<\/metadata>/gi, '')
    result = result.replace(/<title[\s\S]*?<\/title>/gi, '')
    result = result.replace(/<desc[\s\S]*?<\/desc>/gi, '')
  }
  if (options.removeEmptyAttrs) {
    result = result.replace(/\s+[\w:.-]+\s*=\s*(?:""|'')/g, '')
  }
  if (options.collapseWhitespace) {
    result = result.replace(/\s+/g, ' ')
    result = result.replace(/>\s+</g, '><')
    result = result.replace(/\s+>/g, '>')
  }
  return result.trim()
}

Performance Optimization#

1. Avoid Repeated Regex Compilation#

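A regex literal inside a function is evaluated again on every call. Engines generally cache the compiled pattern, so the speedup from hoisting is usually small, but defining the patterns once keeps them in one place and avoids any per-call setup: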

const REGEX = {
  doctype: /<!DOCTYPE[^>]*>/gi,
  comment: /<!--[\s\S]*?-->/g,
  metadata: /<metadata[\s\S]*?<\/metadata>/gi,
  title: /<title[\s\S]*?<\/title>/gi,
  desc: /<desc[\s\S]*?<\/desc>/gi,
  emptyAttr: /\s+[\w:.-]+\s*=\s*(?:""|'')/g,
  whitespace: /\s+/g,
  betweenTags: />\s+</g,
  beforeClose: /\s+>/g
}

function optimizeSvg(svg: string, options: OptimizationOptions): string {
  let result = svg
  if (options.removeDoctype) result = result.replace(REGEX.doctype, '')
  // ... other replacements
  return result.trim()
}
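One thing to keep in mind once the patterns are shared constants: a regex with the g flag carries state in lastIndex. replace() resets it, so the optimizer above is safe, but reusing the same constant with test() or exec() is not. A small illustration, using a standalone COMMENT constant rather than the REGEX object:

```typescript
const COMMENT = /<!--[\s\S]*?-->/g
const sample = '<svg><!-- note --></svg>'

// replace() starts from index 0 every time, so repeated calls agree
const a = sample.replace(COMMENT, '')
const b = sample.replace(COMMENT, '')

// test() resumes from lastIndex, so the second call misses the comment
const first = COMMENT.test(sample)  // true, lastIndex now points past the comment
const second = COMMENT.test(sample) // false, and lastIndex is reset to 0

console.log(a === b, first, second) // true true false
```

If a pattern is needed for both replacing and testing, a separate non-global copy for the test avoids the surprise.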

2. Large File Handling#

For large SVGs (maps, charts), direct processing blocks the UI:

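// Use a Web Worker so optimization runs off the main thread
const worker = new Worker('svg-optimizer-worker.js')
let nextId = 0

function optimizeAsync(svg: string, options: OptimizationOptions): Promise<string> {
  return new Promise((resolve) => {
    const id = nextId++
    // Match replies by id so concurrent calls don't resolve each other
    const onMessage = (e: MessageEvent) => {
      if (e.data.id !== id) return
      worker.removeEventListener('message', onMessage)
      resolve(e.data.result)
    }
    worker.addEventListener('message', onMessage)
    worker.postMessage({ id, svg, options })
  })
}

// svg-optimizer-worker.js (optimizeSvg must be defined or imported in this file)
self.onmessage = (e) => {
  const { id, svg, options } = e.data
  self.postMessage({ id, result: optimizeSvg(svg, options) })
}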

3. Streaming Processing#

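For huge files (>10MB), process in chunks. Cutting at a fixed offset can split a tag, a comment, or a <metadata> block in half, so the regexes would miss it. The version below only cuts after a >, and it assumes multi-line blocks (comments, metadata, CDATA) were already removed or protected on the full string:

// optimizeChunk is a per-chunk pass, e.g. whitespace collapsing without the final trim()
async function optimizeLargeSvg(svg: string): Promise<string> {
  const CHUNK_SIZE = 1024 * 1024 // ~1M characters (UTF-16 code units), not bytes
  const chunks: string[] = []
  let start = 0

  while (start < svg.length) {
    let end = Math.min(start + CHUNK_SIZE, svg.length)
    if (end < svg.length) {
      // Extend to the next '>' so a tag is not split across chunks
      const close = svg.indexOf('>', end)
      end = close === -1 ? svg.length : close + 1
    }
    chunks.push(optimizeChunk(svg.slice(start, end)))
    start = end
    // Yield to the main thread
    await new Promise(resolve => setTimeout(resolve, 0))
  }

  return chunks.join('')
}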

Edge Cases#

1. CDATA Sections#

SVG may contain CDATA blocks with <!-- strings:

<script><![CDATA[
  // This <!-- is not a comment
  var x = "<!-- not a comment -->";
]]></script>

Simple regex would delete them incorrectly:

// Wrong: deletes CDATA content too
svg.replace(/<!--[\s\S]*?-->/g, '')

Correct approach: extract CDATA first, optimize, then restore:

function preserveCdata(svg: string): { svg: string, cdatas: string[] } {
  const cdatas: string[] = []
  const result = svg.replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, (match) => {
    cdatas.push(match)
    return `__CDATA_${cdatas.length - 1}__`
  })
  return { svg: result, cdatas }
}

function restoreCdata(svg: string, cdatas: string[]): string {
  return cdatas.reduce((result, cdata, i) => {
    // Use a replacer function so "$&", "$1", etc. inside the CDATA are inserted literally
    return result.replace(`__CDATA_${i}__`, () => cdata)
  }, svg)
}
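Put together, the extract/optimize/restore cycle might look like the sketch below. withCdataProtected is a hypothetical wrapper name, and the sketch assumes the input does not already contain text of the form __CDATA_n__. The restore step uses a replacer function so that sequences such as $& inside a script are inserted literally.

```typescript
// Hypothetical wrapper: runs any optimize step with CDATA sections masked out.
function withCdataProtected(svg: string, optimize: (s: string) => string): string {
  const cdatas: string[] = []
  const masked = svg.replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, (match) => {
    cdatas.push(match)
    return `__CDATA_${cdatas.length - 1}__`
  })
  // Restore by index; unknown placeholders are left untouched
  return optimize(masked).replace(
    /__CDATA_(\d+)__/g,
    (placeholder: string, index: string) => cdatas[Number(index)] ?? placeholder
  )
}

const source =
  '<svg><!-- drop me --><script><![CDATA[ var x = "<!-- keep -->"; var y = "$&"; ]]></script></svg>'
const cleaned = withCdataProtected(source, (s) => s.replace(/<!--[\s\S]*?-->/g, ''))
console.log(cleaned)
// <svg><script><![CDATA[ var x = "<!-- keep -->"; var y = "$&"; ]]></script></svg>
```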

2. Inline Styles#

CSS inside <style> tags may contain special characters:

<style>
  .icon { fill: red; }
  /* comment */
</style>

Preserve <style> content, only remove comments:

function optimizeStyles(svg: string): string {
  // Capture the opening tag's attributes (e.g. type="text/css") so they are kept
  return svg.replace(/<style\b([^>]*)>([\s\S]*?)<\/style>/gi, (_match, attrs, css) => {
    const optimized = css.replace(/\/\*[\s\S]*?\*\//g, '')
    return `<style${attrs}>${optimized}</style>`
  })
}

3. XML Entities#

SVG may contain XML entities like &lt; &gt; &amp;:

<text>&lt;script&gt;alert('XSS')&lt;/script&gt;</text>

Don’t decode them during optimization:

// Wrong: turns &lt; into <
result.replace(/&lt;/g, '<')  // Don't do this!
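To make the risk concrete, the sketch below counts element start tags before and after decoding. countStartTags is a hypothetical helper used only for this illustration:

```typescript
// Hypothetical helper: counts "<" followed by a letter, i.e. element start tags.
const countStartTags = (s: string): number => (s.match(/<[a-zA-Z]/g) ?? []).length

const escaped = "<text>&lt;script&gt;alert('XSS')&lt;/script&gt;</text>"
const decoded = escaped.replace(/&lt;/g, '<').replace(/&gt;/g, '>')

console.log(countStartTags(escaped)) // 1 (just <text>)
console.log(countStartTags(decoded)) // 2 (<text> and a real <script>)
```

After decoding, what used to be visible text is now a live <script> element, which is exactly what the escaping was preventing.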

Real Results#

Based on this implementation, I built: SVG Optimizer

Test results:

SVG Type        Original   Optimized   Reduction
Simple icon     1.2 KB     0.4 KB      66%
Complex chart   15 KB      8 KB        47%
Map vector      120 KB     85 KB       29%

Simple icons benefit most. Complex SVGs have limited optimization potential since path data dominates.
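For anyone reproducing numbers like these, the reduction can be computed from UTF-8 byte counts rather than string length, since non-ASCII characters take more than one byte. A minimal sketch with a hypothetical reduction helper:

```typescript
// Hypothetical helper for reporting savings; sizes are UTF-8 bytes.
function reduction(original: string, optimized: string): { before: number; after: number; percent: number } {
  const encoder = new TextEncoder()
  const before = encoder.encode(original).length
  const after = encoder.encode(optimized).length
  // Guard against division by zero for empty input
  const percent = before === 0 ? 0 : Math.round((1 - after / before) * 100)
  return { before, after, percent }
}

console.log(reduction('a'.repeat(100), 'a'.repeat(40)))
// { before: 100, after: 40, percent: 60 }
```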

Advanced Optimization#

Regex only handles surface optimization. Deeper optimization requires parsing the SVG structure:

  1. Path simplification: M10 20 L30 40 → M10 20 30 40 (after M, further coordinate pairs are read as implicit L commands)
  2. Merge paths: Adjacent paths can be combined
  3. Remove hidden elements: Elements with display="none" can be deleted, provided nothing references them (for example via <use>) or toggles them from CSS or script
  4. Simplify transforms: <g transform="translate(10, 20)"> can merge into children

These require dedicated tools like SVGO or custom XML tree parsing.
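The first item is small enough to sketch without a full XML parser, as long as the path data is tokenized rather than pattern-matched blindly. The helper below is hypothetical and only covers one rule of the path grammar: a command letter can be dropped when it repeats the previous command, or when it is the lineto that a moveto implies. It does not touch number formatting or relative/absolute conversion.

```typescript
const COMMAND_SEGMENT = /[MmZzLlHhVvCcSsQqTtAa][^MmZzLlHhVvCcSsQqTtAa]*/g

// Hypothetical helper: drops command letters that the path grammar makes implicit.
function dropImplicitCommands(d: string): string {
  const segments = d.match(COMMAND_SEGMENT) ?? []
  let out = ''
  let implicit = '' // command assumed when the next letter is omitted
  for (const segment of segments) {
    const cmd = segment.charAt(0)
    const args = segment.slice(1).trim()
    if (cmd === implicit && args !== '') {
      // Same as the implicit command: keep only the arguments
      out += ' ' + args
    } else {
      out += cmd + args
    }
    // After M/m, extra pairs mean L/l; a command without arguments (Z) implies nothing
    implicit = args === '' ? '' : cmd === 'M' ? 'L' : cmd === 'm' ? 'l' : cmd
  }
  return out
}

console.log(dropImplicitCommands('M10 20 L30 40'))    // M10 20 30 40
console.log(dropImplicitCommands('M0 0 L1 1 L2 2 Z')) // M0 0 1 1 2 2Z
console.log(dropImplicitCommands('M0 0 l1 1'))        // M0 0l1 1 (relative l is not implied by M)
```

Applying it to real files still means locating every d attribute reliably, which is where a proper parser or SVGO earns its keep.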

