Base64 Encoding/Decoding: From btoa to Full Unicode Support#

I recently worked on a project requiring safe text data transmission, and Base64 encoding was unavoidable. While I’ve used it countless times, implementing a reliable tool revealed some interesting details worth sharing.

What is Base64?#

Simply put, Base64 is a method to represent binary data using 64 printable characters:

  • Uppercase letters A-Z (26 characters)
  • Lowercase letters a-z (26 characters)
  • Digits 0-9 (10 characters)
  • + and / (2 characters)

Plus = for padding at the end.

Why use Base64? Many protocols (like SMTP for email) can only transmit ASCII characters. Binary data sent directly would be corrupted or lost. Base64 converts binary to ASCII, ensuring safe transmission.

Encoding Principle: 3 Bytes to 4 Characters#

The core algorithm splits every 3 bytes (24 bits) of data into 4 groups of 6 bits. Each 6-bit value maps to a Base64 character.

Original: M        a        n
ASCII:    77       97       110
Binary:   01001101 01100001 01101110

Groups:   010011 010110 000101 101110
Decimal:   19      22      5      46
Char:      T       W       F       u

So “Man” becomes “TWFu”.

If data isn’t a multiple of 3, pad with zeros and add =:

  • 1 byte remaining: add 2 =
  • 2 bytes remaining: add 1 =

JavaScript Implementation: The btoa Trap#

Browsers provide native btoa and atob functions:

const encoded = btoa('Hello')  // "SGVsbG8="
const decoded = atob('SGVsbG8=')  // "Hello"

But btoa has a major pitfall: it only supports Latin1 characters (single-byte). Chinese characters throw an error:

btoa('你好')  // Uncaught DOMException: Failed to execute 'btoa'

The Unicode Solution#

The standard approach is to first convert UTF-8 characters to percent-encoding using encodeURIComponent:

function base64Encode(str: string): string {
  return btoa(unescape(encodeURIComponent(str)))
}

function base64Decode(base64: string): string {
  return decodeURIComponent(escape(atob(base64)))
}

How does this work? encodeURIComponent converts each UTF-8 byte to %XX format, making the entire string ASCII-safe:

encodeURIComponent('你好')
// "%E4%BD%A0%E5%A5%BD"

// Each %XX is a Latin1 character
unescape('%E4%BD%A0%E5%A5%BD')
// "你好" (4 Latin1 characters, safe for btoa)

Building the Tool#

Based on these principles, here’s a complete Base64 encoder/decoder:

import { useState, useCallback } from 'react'

export default function Base64Tool() {
  const [input, setInput] = useState('')
  const [output, setOutput] = useState('')
  const [mode, setMode] = useState<'encode' | 'decode'>('encode')
  const [error, setError] = useState('')

  const handleEncode = useCallback(() => {
    if (!input.trim()) {
      setError('Please enter text to encode')
      return
    }

    try {
      // UTF-8 safe encoding
      const encoded = btoa(unescape(encodeURIComponent(input)))
      setOutput(encoded)
      setError('')
    } catch (e) {
      setError('Encoding failed: invalid characters')
    }
  }, [input])

  const handleDecode = useCallback(() => {
    if (!input.trim()) {
      setError('Please enter Base64 text to decode')
      return
    }

    try {
      // UTF-8 safe decoding
      const decoded = decodeURIComponent(escape(atob(input)))
      setOutput(decoded)
      setError('')
    } catch (e) {
      setError('Decoding failed: invalid Base64 string')
    }
  }, [input])

  const handleConvert = useCallback(() => {
    mode === 'encode' ? handleEncode() : handleDecode()
  }, [mode, handleEncode, handleDecode])

  return (
    <div className="space-y-4">
      {/* Input */}
      <textarea
        value={input}
        onChange={(e) => setInput(e.target.value)}
        placeholder={mode === 'encode' ? 'Enter text to encode...' : 'Enter Base64 to decode...'}
        className="w-full h-64 p-4 bg-gray-800 rounded-lg"
      />

      {/* Mode toggle */}
      <div className="flex gap-3">
        <button onClick={() => setMode('encode')} className={mode === 'encode' ? 'active' : ''}>
          Encode
        </button>
        <button onClick={() => setMode('decode')} className={mode === 'decode' ? 'active' : ''}>
          Decode
        </button>
      </div>

      {/* Action */}
      <button onClick={handleConvert}>
        {mode === 'encode' ? 'Encode' : 'Decode'}
      </button>

      {/* Error */}
      {error && <div className="text-red-400">{error}</div>}

      {/* Output */}
      <textarea value={output} readOnly className="w-full h-64 p-4 bg-gray-800 rounded-lg" />
    </div>
  )
}

Performance: Handling Large Files#

For large text, synchronous btoa blocks the UI. Use a Web Worker:

// worker.ts
self.onmessage = (e) => {
  const { text, mode } = e.data
  try {
    const result = mode === 'encode'
      ? btoa(unescape(encodeURIComponent(text)))
      : decodeURIComponent(escape(atob(text)))
    self.postMessage({ success: true, result })
  } catch (e) {
    self.postMessage({ success: false, error: e.message })
  }
}

Or use useMemo with debounce in React:

import { useMemo } from 'react'
import { debounce } from 'lodash-es'

const debouncedEncode = useMemo(
  () => debounce((text: string) => {
    try {
      const result = btoa(unescape(encodeURIComponent(text)))
      setOutput(result)
    } catch (e) {
      setError(e.message)
    }
  }, 300),
  []
)

Edge Cases#

1. Invalid Base64 Input#

Base64 has strict format requirements:

  • Length must be a multiple of 4
  • Only Base64 characters and = allowed
  • = can only appear at the end
function isValidBase64(str: string): boolean {
  // Check length
  if (str.length % 4 !== 0) return false

  // Check character set
  const base64Regex = /^[A-Za-z0-9+/]+={0,2}$/
  if (!base64Regex.test(str)) return false

  // Check = position
  const paddingIndex = str.indexOf('=')
  if (paddingIndex !== -1 && paddingIndex < str.length - 2) return false

  return true
}

2. URL-Safe Base64#

Standard Base64 contains + and /, which need escaping in URLs. There’s a variant called “URL Safe Base64” that replaces them:

function toUrlSafe(base64: string): string {
  return base64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '')
}

function fromUrlSafe(urlSafe: string): string {
  let base64 = urlSafe.replace(/-/g, '+').replace(/_/g, '/')
  // Add padding
  while (base64.length % 4 !== 0) {
    base64 += '='
  }
  return base64
}

Real-World Use Cases#

  1. Data URLs: Embed small images data:image/png;base64,...
  2. JWT Tokens: Header and Payload are Base64-encoded
  3. Email Attachments: MIME uses Base64 for binary data
  4. API Keys: Basic Auth uses base64(user:pass) format

The Result#

Based on this implementation, I built: Base64 Encoder/Decoder

Features:

  • Full UTF-8 support (Chinese, English, emojis)
  • Bidirectional encoding/decoding
  • File upload support
  • Friendly error messages

The code isn’t complex, but Unicode handling is the key. Hope this helps.


Related: URL Encoder/Decoder | JWT Debugger