Browser-Side OCR with Tesseract.js: A WebAssembly Text Recognition Journey
I recently needed to add image-to-text functionality to a project. Instead of calling a backend OCR API, I decided to go with a pure frontend solution for privacy and offline support. After diving into Tesseract.js, here’s what I learned.
How OCR Actually Works
OCR (Optical Character Recognition) transforms image pixels into text. The traditional pipeline:
- Preprocessing: Grayscale, binarization, noise reduction, skew correction
- Text detection: Find text regions in the image
- Character segmentation: Split continuous text into individual characters
- Feature extraction: Extract feature vectors for each character
- Character recognition: Match against trained models
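To make the preprocessing stage concrete, here is a minimal sketch of the noise-reduction step: a 3x3 median filter over a flat, row-major grayscale array. The function name and layout are my own for illustration, not part of Tesseract:

```typescript
// 3x3 median filter: replaces each pixel with the median of its
// neighborhood, which removes isolated "salt" noise while keeping edges.
function medianFilter3x3(gray: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray {
  const out = new Uint8ClampedArray(gray.length)
  const clamp = (v: number, max: number) => Math.max(0, Math.min(max, v))
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const neighborhood: number[] = []
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          // Clamp coordinates at the borders instead of skipping pixels
          const yy = clamp(y + dy, height - 1)
          const xx = clamp(x + dx, width - 1)
          neighborhood.push(gray[yy * width + xx])
        }
      }
      neighborhood.sort((a, b) => a - b)
      out[y * width + x] = neighborhood[4] // median of 9 samples
    }
  }
  return out
}
```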
Tesseract is an open-source OCR engine written in C++, originally developed at HP and later maintained by Google. Tesseract.js compiles it to WebAssembly, letting it run directly in browsers.
Getting Started with Tesseract.js
The core is just a few lines:
```typescript
import Tesseract from 'tesseract.js'

async function recognizeText(image: string, language: string) {
  // Create a worker and load the language pack
  const worker = await Tesseract.createWorker(language, 1, {
    logger: (m) => {
      if (m.status === 'recognizing text') {
        console.log(`Progress: ${Math.round(m.progress * 100)}%`)
      }
    },
  })

  // Run recognition
  const result = await worker.recognize(image)

  // Always clean up
  await worker.terminate()

  return result.data.text
}
```
The second argument, `1`, is the OEM (OCR Engine Mode); `1` selects the LSTM neural-network engine, which is more accurate than the legacy engine.
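For reference, here is a sketch of the engine modes by numeric value. The names mirror Tesseract's OEM constants; check the `OEM` enum exported by your tesseract.js version for the exact identifiers:

```typescript
// Tesseract's OCR Engine Modes (OEM), by numeric value.
enum OcrEngineMode {
  TESSERACT_ONLY = 0,          // legacy engine only
  LSTM_ONLY = 1,               // LSTM neural network only (used above)
  TESSERACT_LSTM_COMBINED = 2, // run both and combine results
  DEFAULT = 3,                 // whatever the traineddata supports
}
```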
Multi-Language Support and Language Pack Loading
Tesseract supports 100+ languages, each with its own trained data. The first time you use a language, it downloads the pack:
```typescript
const LANGUAGES = [
  { value: 'eng', label: 'English', size: '~13MB' },
  { value: 'chi_sim', label: 'Simplified Chinese', size: '~20MB' },
  { value: 'jpn', label: 'Japanese', size: '~14MB' },
]
```
Chinese language packs are ~20MB, so the first load can be slow. Two things help: a fast CDN for the pack, and a visible download progress indicator:
```typescript
const worker = await Tesseract.createWorker('chi_sim', 1, {
  // Custom CDN for language packs
  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
  // Progress callback
  logger: (m) => {
    if (m.status === 'loading language traineddata') {
      setProgress(Math.round(m.progress * 100))
      setStatus(`Downloading Chinese language pack... ${Math.round(m.progress * 100)}%`)
    }
  },
})
```
Progress Callback Deep Dive
Tesseract.js exposes full processing status through the logger callback:
```typescript
logger: (m) => {
  console.log(m)
  // Possible m.status values:
  // - 'loading tesseract core'         - loading the WASM core
  // - 'initializing tesseract'         - engine initialization
  // - 'loading language traineddata'   - downloading the language pack
  // - 'initializing api'               - API setup
  // - 'recognizing text'               - recognition in progress
}
```
This design is great for user feedback:
```typescript
const worker = await Tesseract.createWorker(language, 1, {
  logger: (m) => {
    switch (m.status) {
      case 'loading language traineddata':
        setStatus(`Downloading language pack...`)
        setProgress(m.progress * 30) // first 30% is the download
        break
      case 'recognizing text':
        setStatus(`Recognizing...`)
        setProgress(30 + m.progress * 70) // last 70% is recognition
        break
    }
  },
})
```
WebAssembly Performance Tips
Tesseract.js runs a WebAssembly-compiled Tesseract engine. WASM executes at near-native speed, far faster than a pure-JavaScript port would, but there is still room for optimization:
1. Worker Reuse
Creating a worker is slow. Don’t recreate it for each recognition:
```typescript
// Wrong: create a new worker for every image
async function recognizeEachTime(images: string[]) {
  const results = []
  for (const img of images) {
    const worker = await Tesseract.createWorker('eng', 1)
    const result = await worker.recognize(img)
    results.push(result.data.text)
    await worker.terminate() // slow to terminate and recreate
  }
  return results
}

// Right: reuse one worker
let workerPromise: Promise<Tesseract.Worker> | null = null

async function getWorker() {
  if (!workerPromise) {
    workerPromise = Tesseract.createWorker('eng', 1)
  }
  return workerPromise
}

async function recognizeBatch(images: string[]) {
  const worker = await getWorker()
  const results = []
  for (const img of images) {
    const result = await worker.recognize(img)
    results.push(result.data.text)
  }
  return results
}
```
2. Image Size Optimization
Large images slow recognition down; downscale them before handing them to the worker:
```typescript
async function compressImage(file: File, maxWidth = 1920): Promise<string> {
  return new Promise((resolve, reject) => {
    const img = new Image()
    img.onload = () => {
      const canvas = document.createElement('canvas')
      const ratio = Math.min(maxWidth / img.width, 1) // never upscale
      canvas.width = img.width * ratio
      canvas.height = img.height * ratio
      const ctx = canvas.getContext('2d')!
      ctx.drawImage(img, 0, 0, canvas.width, canvas.height)
      URL.revokeObjectURL(img.src) // free the blob URL once drawn
      resolve(canvas.toDataURL('image/jpeg', 0.8))
    }
    img.onerror = reject
    img.src = URL.createObjectURL(file)
  })
}
```
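The resize math is easy to unit-test if pulled out into a pure helper; `fitWidth` is a hypothetical name of my own, not part of the post's code:

```typescript
// Scale (width, height) so width never exceeds maxWidth, preserving
// aspect ratio and never upscaling.
function fitWidth(width: number, height: number, maxWidth = 1920): { width: number; height: number } {
  const ratio = Math.min(maxWidth / width, 1)
  return { width: Math.round(width * ratio), height: Math.round(height * ratio) }
}
```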
Accuracy Optimization Tips
Several factors affect recognition accuracy:
1. Image Preprocessing
```typescript
// Binarization for better contrast
function binarize(canvas: HTMLCanvasElement) {
  const ctx = canvas.getContext('2d')!
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)
  const data = imageData.data
  for (let i = 0; i < data.length; i += 4) {
    const gray = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]
    const threshold = gray > 128 ? 255 : 0
    data[i] = data[i + 1] = data[i + 2] = threshold
  }
  ctx.putImageData(imageData, 0, 0)
}
```
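A fixed threshold of 128 fails on unevenly lit photos. A common upgrade is Otsu's method, which picks the threshold that best separates the grayscale histogram into two classes. A sketch, where `otsuThreshold` is my own helper operating on an already-grayscaled array:

```typescript
// Otsu's method: choose the threshold that maximizes between-class
// variance of the grayscale histogram.
function otsuThreshold(gray: Uint8ClampedArray): number {
  const hist = new Array<number>(256).fill(0)
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++

  const total = gray.length
  let sumAll = 0
  for (let t = 0; t < 256; t++) sumAll += t * hist[t]

  let sumB = 0    // weighted sum of the background class
  let weightB = 0 // pixel count of the background class
  let best = 0
  let bestVar = -1
  for (let t = 0; t < 256; t++) {
    weightB += hist[t]
    if (weightB === 0) continue
    const weightF = total - weightB
    if (weightF === 0) break
    sumB += t * hist[t]
    const meanB = sumB / weightB
    const meanF = (sumAll - sumB) / weightF
    const betweenVar = weightB * weightF * (meanB - meanF) ** 2
    if (betweenVar > bestVar) {
      bestVar = betweenVar
      best = t
    }
  }
  return best
}
```

Pixels above the returned threshold go to white, the rest to black, exactly as in `binarize` above.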
2. Language Selection
For mixed Chinese-English content:

```typescript
// Option 1: multi-language recognition
const worker = await Tesseract.createWorker('chi_sim+eng', 1)

// Option 2: detect the language first and retry if needed
```
3. PSM Mode Selection
Tesseract has multiple Page Segmentation Modes (PSM):

```typescript
await worker.setParameters({
  tessedit_pageseg_mode: Tesseract.PSM.AUTO, // auto-detect
  // PSM.SINGLE_BLOCK - single uniform text block
  // PSM.SINGLE_LINE  - single text line
  // PSM.SINGLE_WORD  - single word
  // PSM.SINGLE_CHAR  - single character
})
```
If you know the image contains a single line, PSM.SINGLE_LINE is faster and more accurate.
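For reference, a sketch of the common modes by numeric value; the names mirror Tesseract's PSM numbering, so verify against the `Tesseract.PSM` enum shipped with your version:

```typescript
// Common Page Segmentation Modes (PSM), by Tesseract's numeric value.
enum PageSegMode {
  AUTO = 3,         // fully automatic page segmentation
  SINGLE_BLOCK = 6, // one uniform block of text
  SINGLE_LINE = 7,  // one text line
  SINGLE_WORD = 8,  // one word
  SINGLE_CHAR = 10, // one character
}
```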
Edge Cases to Handle
1. Memory Leaks
Tesseract workers consume significant memory. Always clean up:

```typescript
useEffect(() => {
  let worker: Tesseract.Worker | null = null
  // ...assign `worker` when recognition starts...
  return () => {
    // Clean up on unmount
    worker?.terminate()
  }
}, [])
```
2. Cancellation
Long recognition tasks need cancellation support:

```typescript
const abortRef = useRef(false)

async function recognize(image: string) {
  abortRef.current = false
  const worker = await Tesseract.createWorker('chi_sim', 1, {
    logger: (m) => {
      // Check the flag each time progress is reported
      if (abortRef.current) {
        throw new Error('Recognition cancelled')
      }
    },
  })
  // ...
}

function cancel() {
  abortRef.current = true
}
```
3. Mobile Performance
Low-end phones may struggle with WASM:

```typescript
const isLowEnd = navigator.hardwareConcurrency < 4
if (isLowEnd) {
  // Warn the user about longer processing time,
  // or limit image dimensions
}
```
The Result
Based on these implementations, I built: OCR Text Recognition
Features:
- 9 languages, including Chinese, English, Japanese, Korean, French, German, Russian, and Spanish
- Real-time progress display
- Editable results with copy/download
- Automatic character and word count
Frontend OCR may not match commercial APIs like Google Vision for accuracy, but for simple use cases it’s perfectly adequate—and data never leaves the browser, which is great for privacy.
Related: Image Compressor | Image Cropper