How LLM Token Counting Works: Building a Client-Side Token Estimator#
Every time you call a GPT or Claude API, you’re paying by the token. But what exactly is a token, and how do you estimate token counts without loading a 500MB tokenizer model into your browser?
Let’s break down the algorithm behind the Token Counter tool, and why a simple heuristic can get you surprisingly close to the real count.
What Are Tokens, Really?#
Tokens are how LLMs slice text into digestible pieces. They’re not characters and not full words — something in between.
GPT models use Byte-Pair Encoding (BPE). “Hello world” becomes ["Hello", " world"] — two tokens. But the Chinese phrase “你好世界” typically splits into 3-5 tokens depending on the model: each character takes three bytes in UTF-8 and appears less often in training data, so the BPE vocabulary covers it with fewer, shorter merges.
Different models use different tokenizers. GPT-4o, Claude 3.5, and Llama 3 all have slightly different tokenization. This means the same text produces different token counts across models.
The Estimation Algorithm#
Loading a real BPE tokenizer in the browser isn’t practical — tiktoken’s Rust core has to be compiled to WASM, and the build plus its BPE merge tables runs to several megabytes. Instead, we use a character-level heuristic:
function estimateTokens(text: string): number {
  if (!text) return 0
  let tokens = 0
  for (const char of text) {
    // codePointAt handles characters outside the BMP (emoji, rare ideographs)
    // correctly, where charCodeAt would only see the high surrogate
    const code = char.codePointAt(0) ?? 0
    if (code >= 0x4e00 && code <= 0x9fff) {
      // CJK Unified Ideographs: roughly 2 tokens per character
      tokens += 2
    } else {
      // everything else: roughly 4 characters per token
      tokens += 0.25
    }
  }
  return Math.ceil(tokens)
}
The logic: walk through every character in the string. CJK Unified Ideographs (U+4E00–U+9FFF, the block of Han characters shared by Chinese, Japanese, and Korean) count as 2 tokens each. Everything else counts as 0.25 tokens per character.
Where do these numbers come from?
Chinese characters in GPT’s BPE tokenizer typically consume 1.5 to 2.5 tokens each, so 2 is a reasonable average. For English and other Latin-script text, the empirical rule of thumb is roughly 4 characters per token (a token corresponds to about three quarters of an average English word), hence 0.25 tokens per character.
Is it precise? No. Is it useful? Absolutely. In practice, this estimation stays within 10% of the real token count for most API payloads — good enough for cost estimation and context window planning.
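A few quick sanity checks of the heuristic (these are the estimator’s own outputs, not real tokenizer counts):

// Heuristic estimates only; a real BPE tokenizer will differ slightly
estimateTokens("Hello world")   // 11 chars x 0.25 = 2.75, ceil = 3
estimateTokens("你好世界")       // 4 CJK chars x 2 = 8
estimateTokens("你好 world")     // 2 x 2 + 6 x 0.25 = 5.5, ceil = 6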
Word Counting Across Scripts#
“Word count” is surprisingly tricky when your text mixes Chinese and English. Simple split(' ') doesn’t work for CJK because there are no spaces between characters.
// CJK ideographs, Hiragana, Katakana, and Hangul: scripts written without spaces
const CJK_REGEX = /[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af]/g

function countWords(text: string): number {
  if (!text.trim()) return 0
  // each CJK character counts as one word
  const cjkChars = text.match(CJK_REGEX)
  // replace CJK characters with spaces, then count the remaining space-delimited words
  const withoutCjk = text.replace(CJK_REGEX, ' ')
  const latinWords = withoutCjk.trim().split(/\s+/).filter(w => w.length > 0)
  return (cjkChars?.length ?? 0) + latinWords.length
}
This function extracts all CJK characters (including Hiragana, Katakana, and Hangul), replaces them with spaces in the original text, then splits by whitespace for the remaining Latin words. The total is CJK characters counted individually plus space-delimited words. This correctly handles mixed-script inputs like “你好 world” as 3 words.
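A couple of quick checks of the mixed-script behavior:

countWords("Hello world")   // 2
countWords("你好世界")       // 4: each CJK character counts as one word
countWords("你好 world")     // 3: 你好 (2 characters) + world (1 word)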
From Tokens to Dollars#
Once you have an estimated token count, calculating cost is straightforward:
const cost = (tokens / 1_000_000) * model.pricePerMillionInput
But the real value comes from comparing across models. GPT-4o costs $2.50 per million input tokens with a 128K context window. Gemini 1.5 Flash costs $0.075 per million with a 1M-token window — roughly 33x cheaper with about 8x more room.
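Here’s a minimal sketch of that comparison, using the prices quoted above (the ModelPricing shape and the figures are illustrative; vendor pricing changes often):

interface ModelPricing {
  name: string
  pricePerMillionInput: number  // USD per 1M input tokens
  contextWindow: number         // in tokens
}

// Figures as quoted in this article; check current vendor pricing before relying on them
const models: ModelPricing[] = [
  { name: 'GPT-4o', pricePerMillionInput: 2.5, contextWindow: 128_000 },
  { name: 'Gemini 1.5 Flash', pricePerMillionInput: 0.075, contextWindow: 1_000_000 },
]

function compareCosts(tokens: number): void {
  for (const model of models) {
    const cost = (tokens / 1_000_000) * model.pricePerMillionInput
    const usage = (tokens / model.contextWindow) * 100
    console.log(`${model.name}: $${cost.toFixed(4)} input (${usage.toFixed(1)}% of context)`)
  }
}

compareCosts(2_000)  // e.g. a 2,000-token prompt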
The tool visualizes this with a progress bar that turns yellow at 70% context usage and red at 90%. That’s your signal to either trim the prompt or switch to a model with a larger window.
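The threshold logic is simple enough to sketch (the actual tool may implement it differently):

// Context-usage indicator mirroring the thresholds described above
function usageColor(usedTokens: number, contextWindow: number): 'green' | 'yellow' | 'red' {
  const ratio = usedTokens / contextWindow
  if (ratio >= 0.9) return 'red'
  if (ratio >= 0.7) return 'yellow'
  return 'green'
}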
Where It Matters Most#
Cost awareness during development. A good system prompt plus 5-6 few-shot examples can easily run 2,000+ tokens. At GPT-4 Turbo pricing ($10/million input), that’s $0.02 per call before you’ve even sent the actual query. Iterate 50 times during debugging and you’ve burned a dollar on prompts alone.
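The math, spelled out:

// Prompt-overhead arithmetic from the paragraph above
const promptTokens = 2_000
const pricePerMillionInput = 10   // GPT-4 Turbo input pricing quoted above
const costPerCall = (promptTokens / 1_000_000) * pricePerMillionInput  // $0.02
const debugRuns = 50
console.log((costPerCall * debugRuns).toFixed(2))  // "1.00", a dollar on prompts alone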
Context window budgeting. The prompt_tokens + max_tokens sum must stay under the model’s context_window. Go over and you get either a hard API error or silent truncation, depending on the client — either way, the carefully crafted instructions at the end of your prompt can simply disappear. The token estimator helps you size prompts before hitting the API.
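A pre-flight check along those lines might look like this (the 4,096 max_tokens and 128K window are just example values):

// Verify the request fits before calling the API
function fitsInContext(promptTokens: number, maxTokens: number, contextWindow: number): boolean {
  return promptTokens + maxTokens <= contextWindow
}

const prompt = 'You are a helpful assistant...'  // the full system + user prompt
if (!fitsInContext(estimateTokens(prompt), 4_096, 128_000)) {
  // trim the prompt or switch to a larger-context model
}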
Document chunking for RAG. When splitting documents for retrieval-augmented generation, you want chunks sized by token count, not character count. A 2,000-token chunk of Chinese text is much shorter in characters than 2,000 tokens of English, because each Chinese token carries more information.
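A naive token-budgeted splitter built on the estimator above might look like this. It is a sketch only; production chunkers usually also respect sentence or section boundaries and add overlap between chunks:

// Greedily pack paragraphs into chunks of at most maxTokensPerChunk estimated tokens
// (a single paragraph larger than the budget still becomes one oversized chunk)
function chunkByTokens(text: string, maxTokensPerChunk = 2_000): string[] {
  const chunks: string[] = []
  let current = ''
  for (const paragraph of text.split('\n\n')) {
    const candidate = current ? current + '\n\n' + paragraph : paragraph
    if (current && estimateTokens(candidate) > maxTokensPerChunk) {
      chunks.push(current)
      current = paragraph
    } else {
      current = candidate
    }
  }
  if (current) chunks.push(current)
  return chunks
}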
These algorithms aren’t complex, but they save real time and money during LLM development. Try the online tool at Token Counter — it supports 14 models with live cost estimation and context window tracking.
Related tools: JSON Formatter | JWT Decoder | Regex Tester