Building a GIF Maker from Scratch: LZW Compression and Median Cut Color Quantization#

Written: 2026-04-30 12:39

Recently I added a GIF maker to JsonKit. I initially thought I’d just use an existing library, but implementing it from scratch turned out to be more interesting. This article explores the core technologies behind the GIF format: LZW compression and Median Cut color quantization.

Why is GIF So Tricky?#

The GIF format was born in 1987, and its design philosophy is completely different from modern formats. The biggest pitfall: GIF only supports 256 colors. Modern images have millions of colors, and squeezing them into a 256-color palette while maintaining quality is the challenge of color quantization.

Another issue is LZW compression. GIF’s LZW algorithm was patent-protected (expired in 2003), but more importantly, the implementation details are easy to get wrong. What happens when the code table overflows? When should you send a Clear Code? These details are vaguely documented and require experimentation to figure out.

Median Cut: Reducing Millions of Colors to 256#

Median Cut is a classic color quantization algorithm. The core idea is simple: divide the color space into 256 blocks, and use the center point of each block as the representative color.

Algorithm Steps#

function medianCut(pixels, maxColors) {
  // 1. Count all colors and their frequencies
  const colorMap = new Map()
  for (let i = 0; i < pixels.length; i += 4) {
    const key = `${pixels[i]},${pixels[i+1]},${pixels[i+2]}`
    colorMap.set(key, (colorMap.get(key) || 0) + 1)
  }
  
  // 2. Initialize: put all colors in one bucket
  const buckets = [{
    entries: Array.from(colorMap.entries()),
    rMin: 0, rMax: 255,
    gMin: 0, gMax: 255,
    bMin: 0, bMax: 255
  }]
  
  // 3. Keep splitting until we have maxColors buckets
  while (buckets.length < maxColors) {
    // Find the bucket with the widest range
    const bucket = findWidestBucket(buckets)
    if (!bucket) break
    
    // Split along the longest dimension
    const [left, right] = splitBucket(bucket)
    buckets.splice(buckets.indexOf(bucket), 1, left, right)
  }
  
  // 4. Take weighted average color from each bucket as palette
  return buckets.map(b => computeAverage(b))
}

Key Details#

Why weight by frequency? Colors that appear more often have a greater visual impact. If a pixel only appears once, it doesn’t matter much if it’s quantized incorrectly. But if the background color is off, the whole image looks bad.

Splitting strategy: Standard Median Cut splits evenly by color count, but a better approach is to split by pixel count. This avoids stuffing millions of pixels in one bucket while another bucket only has a few pixels.

LZW Compression: The Soul of GIF#

LZW is a dictionary-based encoding algorithm. The core idea is to replace repeated strings with short codes. GIF’s LZW has several special characteristics:

1. Variable-Length Codes#

LZW code length grows dynamically. The initial code size is minCodeSize + 1, and as the dictionary grows, the code length increases to 12 bits. Once the dictionary is full (4096 entries), a Clear Code is sent to reset the dictionary.

function packBits(codes, minCodeSize) {
  const clearCode = 1 << minCodeSize
  const endCode = clearCode + 1
  
  let codeSize = minCodeSize + 1
  let nextCode = endCode + 1
  
  for (const code of codes) {
    // Pack the code into the bit buffer
    bitBuffer |= (code << bitCount)
    bitCount += codeSize
    
    // Output a byte every 8 bits
    while (bitCount >= 8) {
      bytes.push(bitBuffer & 0xFF)
      bitBuffer >>>= 8
      bitCount -= 8
    }
    
    // Dictionary full, send Clear Code
    if (nextCode >= 4096) {
      codes.push(clearCode)
      nextCode = endCode + 1
      codeSize = minCodeSize + 1
    }
    // Increase code size
    else if (nextCode >= (1 << codeSize) && codeSize < 12) {
      codeSize++
    }
  }
}

2. Dictionary Construction#

The LZW dictionary is built during compression. Each time a code is output, “current code + next pixel” is added to the dictionary.

function lzwEncode(indexedPixels, minCodeSize) {
  const dict = new Map()
  let w = indexedPixels[0]
  
  for (let i = 1; i < indexedPixels.length; i++) {
    const k = indexedPixels[i]
    
    // w+k is in dictionary, keep extending
    if (dict.get(w)?.has(k)) {
      w = dict.get(w).get(k)
    }
    // w+k not in dictionary, output w, add w+k to dictionary
    else {
      codes.push(w)
      if (!dict.has(w)) dict.set(w, new Map())
      dict.get(w).set(k, nextCode++)
      w = k
    }
  }
  
  codes.push(w)
  return codes
}

3. Performance Optimization#

The original LZW implementation uses w+k as dictionary keys, and string concatenation is slow. The optimization is to use nested Maps: the first-level Map’s key is the prefix code, and the second-level Map’s key is the suffix pixel value. This reduces lookup from O(n) to O(1).

GIF File Structure#

GIF files are organized into blocks, mainly containing:

GIF89a Header
Logical Screen Descriptor
Netscape Application Extension (looping)
┌─ Graphics Control Extension (delay, transparency)
│  Image Descriptor
│  Local Color Table
│  LZW Image Data
└─ (repeat for each frame)
GIF Trailer

The Delay Time Pitfall#

GIF delay units are 10 milliseconds, not 1 millisecond. Also, many players enforce a minimum delay of 20ms (2 units), so if you set 10ms, it might actually play at 20ms.

Looping#

The GIF89a standard itself doesn’t support looping - it was added by Netscape as an extension. A loop count of 0 means infinite loop, 1 means play once (total of 2 plays). This pitfall took me a while to figure out.

Practical Experience#

When building JsonKit’s GIF maker, I encountered several performance issues:

Color quantization is slow: The original implementation iterates through all pixels to find the nearest color, O(width × height × paletteSize). Optimization: use KD-Tree or pre-build a color lookup table.

LZW compression is slow: Dictionary lookup is the bottleneck. Using nested Maps instead of string concatenation improved speed by 10x.

Memory explosion: When processing large images, Canvas’s getImageData returns huge arrays. Solution: process in chunks, or use Web Workers to avoid blocking the main thread.

Conclusion#

Although the GIF format is ancient, the technology behind it is fascinating. LZW compression is a classic early lossless compression algorithm, and Median Cut is the foundation of color quantization. Understanding these principles not only helps you write better GIF tools, but also aids in understanding modern image formats (WebP, AVIF).

The complete code is in JsonKit’s GIF Maker tool - feel free to try it out.