Browser-Side OCR with Tesseract.js: A WebAssembly Text Recognition Journey
I recently needed to add image-to-text functionality to a project. Instead of calling a backend OCR API, I decided to go with a pure frontend solution for privacy and offline support. After diving into Tesseract.js, here’s what I learned.
How OCR Actually Works
OCR (Optical Character Recognition) transforms image pixels into text. The traditional pipeline:
- Preprocessing: Grayscale, binarization, noise reduction, skew correction
- Text detection: Find text regions in the image
- Character segmentation: Split continuous text into individual characters
- Feature extraction: Extract feature vectors for each character
- Character recognition: Match against trained models
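To make the preprocessing stage concrete, here is a minimal sketch of the noise-reduction step: a 3x3 median filter over a flat, row-major grayscale array. The function name and layout are my own for illustration, not part of Tesseract:

```typescript
// 3x3 median filter: replaces each pixel with the median of its
// neighborhood, which removes isolated "salt" noise while keeping edges.
function medianFilter3x3(gray: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray {
  const out = new Uint8ClampedArray(gray.length)
  const clamp = (v: number, max: number) => Math.max(0, Math.min(max, v))
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const neighborhood: number[] = []
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          // Clamp coordinates at the borders instead of skipping pixels
          const yy = clamp(y + dy, height - 1)
          const xx = clamp(x + dx, width - 1)
          neighborhood.push(gray[yy * width + xx])
        }
      }
      neighborhood.sort((a, b) => a - b)
      out[y * width + x] = neighborhood[4] // median of 9 samples
    }
  }
  return out
}
```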
Tesseract is an open-source OCR engine written in C++, originally developed at HP and later maintained by Google. Tesseract.js compiles it to WebAssembly, letting it run directly in browsers.
Getting Started with Tesseract.js
The core is just a few lines:
```typescript
import Tesseract from 'tesseract.js'

async function recognizeText(image: string, language: string) {
  // Create a worker and load the language pack
  const worker = await Tesseract.createWorker(language, 1, {
    logger: (m) => {
      if (m.status === 'recognizing text') {
        console.log(`Progress: ${Math.round(m.progress * 100)}%`)
      }
    },
  })

  // Run recognition
  const result = await worker.recognize(image)

  // Always clean up
  await worker.terminate()

  return result.data.text
}
```
The second argument, `1`, is the OEM (OCR Engine Mode); `1` selects the LSTM neural-network engine, which is more accurate than the legacy engine.
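For reference, here is a sketch of the engine modes by numeric value. The names mirror Tesseract's OEM constants; check the `OEM` enum exported by your tesseract.js version for the exact identifiers:

```typescript
// Tesseract's OCR Engine Modes (OEM), by numeric value.
enum OcrEngineMode {
  TESSERACT_ONLY = 0,          // legacy engine only
  LSTM_ONLY = 1,               // LSTM neural network only (used above)
  TESSERACT_LSTM_COMBINED = 2, // run both and combine results
  DEFAULT = 3,                 // whatever the traineddata supports
}
```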
Multi-Language Support and Language Pack Loading
Tesseract supports 100+ languages, each with its own trained data. The first time you use a language, it downloads the pack:
```typescript
const LANGUAGES = [
  { value: 'eng', label: 'English', size: '~13MB' },
  { value: 'chi_sim', label: 'Simplified Chinese', size: '~20MB' },
  { value: 'jpn', label: 'Japanese', size: '~14MB' },
]
```
Chinese language packs are ~20MB, so the first load can be slow. Two things help: a fast CDN for the pack, and a visible download progress indicator:
```typescript
const worker = await Tesseract.createWorker('chi_sim', 1, {
  // Custom CDN for language packs
  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
  // Progress callback
  logger: (m) => {
    if (m.status === 'loading language traineddata') {
      setProgress(Math.round(m.progress * 100))
      setStatus(`Downloading Chinese language pack... ${Math.round(m.progress * 100)}%`)
    }
  },
})
```
Progress Callback Deep Dive
Tesseract.js exposes full processing status through the logger callback:
```typescript
logger: (m) => {
  console.log(m)
  // Possible m.status values:
  // - 'loading tesseract core'         - loading the WASM core
  // - 'initializing tesseract'         - engine initialization
  // - 'loading language traineddata'   - downloading the language pack
  // - 'initializing api'               - API setup
  // - 'recognizing text'               - recognition in progress
}
```
This design is great for user feedback:
```typescript
const worker = await Tesseract.createWorker(language, 1, {
  logger: (m) => {
    switch (m.status) {
      case 'loading language traineddata':
        setStatus(`Downloading language pack...`)
        setProgress(m.progress * 30) // first 30% is the download
        break
      case 'recognizing text':
        setStatus(`Recognizing...`)
        setProgress(30 + m.progress * 70) // last 70% is recognition
        break
    }
  },
})
```
WebAssembly Performance Tips
Tesseract.js runs a WebAssembly-compiled Tesseract engine. WASM executes at near-native speed, far faster than a pure-JavaScript port would, but there is still room for optimization:
1. Worker Reuse
Creating a worker is slow. Don’t recreate it for each recognition:
```typescript
// Wrong: create a new worker for every image
async function recognizeEachTime(images: string[]) {
  const results = []
  for (const img of images) {
    const worker = await Tesseract.createWorker('eng', 1)
    const result = await worker.recognize(img)
    results.push(result.data.text)
    await worker.terminate() // slow to terminate and recreate
  }
  return results
}

// Right: reuse one worker
let workerPromise: Promise<Tesseract.Worker> | null = null

async function getWorker() {
  if (!workerPromise) {
    workerPromise = Tesseract.createWorker('eng', 1)
  }
  return workerPromise
}

async function recognizeBatch(images: string[]) {
  const worker = await getWorker()
  const results = []
  for (const img of images) {
    const result = await worker.recognize(img)
    results.push(result.data.text)
  }
  return results
}
```
2. Image Size Optimization
Large images slow recognition down; downscale them before handing them to the worker:
```typescript
async function compressImage(file: File, maxWidth = 1920): Promise<string> {
  return new Promise((resolve, reject) => {
    const img = new Image()
    img.onload = () => {
      const canvas = document.createElement('canvas')
      const ratio = Math.min(maxWidth / img.width, 1) // never upscale
      canvas.width = img.width * ratio
      canvas.height = img.height * ratio
      const ctx = canvas.getContext('2d')!
      ctx.drawImage(img, 0, 0, canvas.width, canvas.height)
      URL.revokeObjectURL(img.src) // free the blob URL once drawn
      resolve(canvas.toDataURL('image/jpeg', 0.8))
    }
    img.onerror = reject
    img.src = URL.createObjectURL(file)
  })
}
```
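The resize math is easy to unit-test if pulled out into a pure helper; `fitWidth` is a hypothetical name of my own, not part of the post's code:

```typescript
// Scale (width, height) so width never exceeds maxWidth, preserving
// aspect ratio and never upscaling.
function fitWidth(width: number, height: number, maxWidth = 1920): { width: number; height: number } {
  const ratio = Math.min(maxWidth / width, 1)
  return { width: Math.round(width * ratio), height: Math.round(height * ratio) }
}
```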
Accuracy Optimization Tips
Several factors affect recognition accuracy:
1. Image Preprocessing
```typescript
// Binarization for better contrast
function binarize(canvas: HTMLCanvasElement) {
  const ctx = canvas.getContext('2d')!
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)
  const data = imageData.data
  for (let i = 0; i < data.length; i += 4) {
    const gray = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]
    const threshold = gray > 128 ? 255 : 0
    data[i] = data[i + 1] = data[i + 2] = threshold
  }
  ctx.putImageData(imageData, 0, 0)
}
```
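A fixed threshold of 128 fails on unevenly lit photos. A common upgrade is Otsu's method, which picks the threshold that best separates the grayscale histogram into two classes. A sketch, where `otsuThreshold` is my own helper operating on an already-grayscaled array:

```typescript
// Otsu's method: choose the threshold that maximizes between-class
// variance of the grayscale histogram.
function otsuThreshold(gray: Uint8ClampedArray): number {
  const hist = new Array<number>(256).fill(0)
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++

  const total = gray.length
  let sumAll = 0
  for (let t = 0; t < 256; t++) sumAll += t * hist[t]

  let sumB = 0    // weighted sum of the background class
  let weightB = 0 // pixel count of the background class
  let best = 0
  let bestVar = -1
  for (let t = 0; t < 256; t++) {
    weightB += hist[t]
    if (weightB === 0) continue
    const weightF = total - weightB
    if (weightF === 0) break
    sumB += t * hist[t]
    const meanB = sumB / weightB
    const meanF = (sumAll - sumB) / weightF
    const betweenVar = weightB * weightF * (meanB - meanF) ** 2
    if (betweenVar > bestVar) {
      bestVar = betweenVar
      best = t
    }
  }
  return best
}
```

Pixels above the returned threshold go to white, the rest to black, exactly as in `binarize` above.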
2. Language Selection
For mixed Chinese-English content:

```typescript
// Option 1: multi-language recognition
const worker = await Tesseract.createWorker('chi_sim+eng', 1)

// Option 2: detect the language first and retry if needed
```
3. PSM Mode Selection
Tesseract has multiple Page Segmentation Modes (PSM):

```typescript
await worker.setParameters({
  tessedit_pageseg_mode: Tesseract.PSM.AUTO, // auto-detect
  // PSM.SINGLE_BLOCK - single uniform text block
  // PSM.SINGLE_LINE  - single text line
  // PSM.SINGLE_WORD  - single word
  // PSM.SINGLE_CHAR  - single character
})
```
If you know the image contains a single line, PSM.SINGLE_LINE is faster and more accurate.
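For reference, a sketch of the common modes by numeric value; the names mirror Tesseract's PSM numbering, so verify against the `Tesseract.PSM` enum shipped with your version:

```typescript
// Common Page Segmentation Modes (PSM), by Tesseract's numeric value.
enum PageSegMode {
  AUTO = 3,         // fully automatic page segmentation
  SINGLE_BLOCK = 6, // one uniform block of text
  SINGLE_LINE = 7,  // one text line
  SINGLE_WORD = 8,  // one word
  SINGLE_CHAR = 10, // one character
}
```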
Edge Cases to Handle
1. Memory Leaks
Tesseract workers consume significant memory. Always clean up:

```typescript
useEffect(() => {
  let worker: Tesseract.Worker | null = null
  // ...assign `worker` when recognition starts...
  return () => {
    // Clean up on unmount
    worker?.terminate()
  }
}, [])
```
2. Cancellation
Long recognition tasks need cancellation support:

```typescript
const abortRef = useRef(false)

async function recognize(image: string) {
  abortRef.current = false
  const worker = await Tesseract.createWorker('chi_sim', 1, {
    logger: (m) => {
      // Check the flag each time progress is reported
      if (abortRef.current) {
        throw new Error('Recognition cancelled')
      }
    },
  })
  // ...
}

function cancel() {
  abortRef.current = true
}
```
3. Mobile Performance
Low-end phones may struggle with WASM:

```typescript
const isLowEnd = navigator.hardwareConcurrency < 4
if (isLowEnd) {
  // Warn the user about longer processing time,
  // or limit image dimensions
}
```
The Result
Based on these implementations, I built: OCR Text Recognition
Features:
- 9 languages, including Chinese, English, Japanese, Korean, French, German, Russian, and Spanish
- Real-time progress display
- Editable results with copy/download
- Automatic character and word count
Frontend OCR may not match commercial APIs like Google Vision for accuracy, but for simple use cases it’s perfectly adequate—and data never leaves the browser, which is great for privacy.
Related: Image Compressor | Image Cropper