The Algorithm Behind Lorem Ipsum Generators: From Latin to Chinese Random Text
When mocking up designs, we often need placeholder text. Photoshop has Lorem Ipsum built in, and frontend developers need dummy text for layout testing. Today let’s explore how Lorem Ipsum generators work under the hood, and dive into the details of random algorithms.
The Origin of Lorem Ipsum
Here’s a fun fact: Lorem Ipsum isn’t meaningless gibberish. It comes from Cicero’s “De Finibus Bonorum et Malorum” written in 45 BC. The original text reads:
“Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit…”
Which translates to: “There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain.”
In the 1500s, an unknown printer scrambled this passage to make a type specimen book. Lorem Ipsum has been the printing industry’s standard placeholder text ever since.
Three Approaches to Random Text Generation
Approach 1: Bag of Words Concatenation
The simplest implementation—prepare a word pool, randomly pick and concatenate:
const LOREM_WORDS = [
  'lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur',
  'adipiscing', 'elit', 'sed', 'do', 'eiusmod', 'tempor'
]

function generateSentence(wordCount: number): string {
  const words: string[] = []
  for (let i = 0; i < wordCount; i++) {
    const randomIndex = Math.floor(Math.random() * LOREM_WORDS.length)
    words.push(LOREM_WORDS[randomIndex])
  }
  // Capitalize first letter
  if (words.length > 0) {
    words[0] = words[0].charAt(0).toUpperCase() + words[0].slice(1)
  }
  return words.join(' ') + '.'
}
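This approach is straightforward, but it has a flaw: every pick is independent, so in a short passage some words show up several times (sometimes back to back) while others never appear. Math.random() itself is uniform; the unevenness comes from the small sample size.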
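To see the unevenness concretely, the sketch below tallies how often each word turns up in a short run of independent draws. The `mulberry32` seeded generator and the `tallyDraws` helper are illustrative additions, not part of the generator above; the seed only makes the run repeatable.

```typescript
// Small seeded PRNG (mulberry32) so the run is repeatable. This is an
// assumption of the sketch; the article's code calls Math.random() directly.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Count how often each pool word is chosen in `draws` independent picks.
function tallyDraws(pool: string[], draws: number, rng: () => number): Map<string, number> {
  const counts = new Map<string, number>()
  for (const word of pool) counts.set(word, 0)
  for (let i = 0; i < draws; i++) {
    const word = pool[Math.floor(rng() * pool.length)]!
    counts.set(word, (counts.get(word) ?? 0) + 1)
  }
  return counts
}

const pool = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur']
const counts = tallyDraws(pool, 12, mulberry32(42))

// With 12 draws over 6 words, a perfectly even result would be 2 each;
// independent draws rarely land exactly there.
counts.forEach((n, word) => console.log(word, n))
```

Raising the number of draws evens the counts out, which is why the problem is mostly visible in short placeholder snippets.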
Approach 2: Fisher-Yates Shuffle + Sliding Window
A Fisher-Yates shuffle gives a more even distribution: shuffle the pool once, walk through it with a cursor, and reshuffle when it runs out:
function shuffleWords(words: string[]): string[] {
  const arr = [...words]
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[arr[i], arr[j]] = [arr[j], arr[i]]
  }
  return arr
}

class LoremGenerator {
  private shuffledWords: string[] = []
  private cursor = 0

  constructor(private wordPool: string[]) {
    this.reshuffle()
  }

  private reshuffle() {
    this.shuffledWords = shuffleWords(this.wordPool)
    this.cursor = 0
  }

  nextWord(): string {
    if (this.cursor >= this.shuffledWords.length) {
      this.reshuffle()
    }
    return this.shuffledWords[this.cursor++]
  }
}
This guarantees each word appears exactly once per cycle, producing more evenly distributed text.
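One gap in the cycle guarantee is the boundary between cycles: the last word of one shuffle can equal the first word of the next. The following is a minimal sketch of the same shuffle-and-cursor idea as a closure, with a guard for that case. `makeWordBag` and the swap-on-collision guard are hypothetical additions, and the guard assumes the pool holds distinct words.

```typescript
// Returns a function that deals words from a shuffled copy of `pool`,
// reshuffling whenever the copy is used up.
function makeWordBag(pool: string[], rng: () => number = Math.random): () => string {
  if (pool.length === 0) throw new Error('word pool must not be empty')
  let bag: string[] = []
  let cursor = 0
  let last: string | undefined

  const reshuffle = (): void => {
    bag = pool.slice()
    // Fisher-Yates shuffle
    for (let i = bag.length - 1; i > 0; i--) {
      const j = Math.floor(rng() * (i + 1))
      const tmp = bag[i]!
      bag[i] = bag[j]!
      bag[j] = tmp
    }
    // If the new cycle would start with the word just emitted, swap the
    // first and last slots (assumes distinct words in the pool).
    if (bag.length > 1 && bag[0] === last) {
      const end = bag.length - 1
      const tmp = bag[0]!
      bag[0] = bag[end]!
      bag[end] = tmp
    }
    cursor = 0
  }

  reshuffle()
  return () => {
    if (cursor >= bag.length) reshuffle()
    const word = bag[cursor++]!
    last = word
    return word
  }
}

const next = makeWordBag(['lorem', 'ipsum', 'dolor', 'sit'])
const firstCycle = [next(), next(), next(), next()]
// firstCycle holds each of the four words exactly once, in random order.
```

The swap slightly biases the first position of a cycle, which is usually an acceptable trade for placeholder text.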
Approach 3: Markov Chains
For more “natural” looking text, use Markov chains. The principle is to analyze word transitions in real text, then predict the next word based on the previous one.
type Chain = Map<string, string[]>

// Map each word to the list of words that follow it in the source text.
function buildChain(text: string): Chain {
  // trim + filter so leading or trailing whitespace never yields an empty "word"
  const words = text.trim().split(/\s+/).filter(Boolean)
  const chain: Chain = new Map()
  for (let i = 0; i < words.length - 1; i++) {
    const current = words[i]
    const next = words[i + 1]
    if (!chain.has(current)) {
      chain.set(current, [])
    }
    chain.get(current)!.push(next)
  }
  return chain
}

// Walk the chain from startWord, stopping early if a word has no followers.
function generateMarkov(chain: Chain, startWord: string, length: number): string {
  const result: string[] = [startWord]
  let current = startWord
  for (let i = 1; i < length; i++) {
    const nextWords = chain.get(current)
    if (!nextWords || nextWords.length === 0) break
    const next = nextWords[Math.floor(Math.random() * nextWords.length)]
    result.push(next)
    current = next
  }
  return result.join(' ')
}
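Markov chains produce text that looks more like real sentences because they respect word transition patterns. Followers are stored with repeats, so a transition that is common in the source is picked proportionally more often.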
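A small end-to-end run makes the transition table easier to picture. This sketch restates the idea in compact form on a toy sample; the sample string and the names `buildTransitions` and `walk` are made up for illustration.

```typescript
type Transitions = Map<string, string[]>

// Record, for every word in the sample, the words that follow it.
function buildTransitions(sample: string): Transitions {
  const words = sample.trim().split(/\s+/)
  const table: Transitions = new Map()
  for (let i = 0; i < words.length - 1; i++) {
    const followers = table.get(words[i]!) ?? []
    followers.push(words[i + 1]!)
    table.set(words[i]!, followers)
  }
  return table
}

// Follow random transitions for up to `length` words, stopping at a dead end.
function walk(
  table: Transitions,
  start: string,
  length: number,
  rng: () => number = Math.random
): string[] {
  const out = [start]
  let current = start
  while (out.length < length) {
    const options = table.get(current)
    if (!options || options.length === 0) break
    current = options[Math.floor(rng() * options.length)]!
    out.push(current)
  }
  return out
}

// Toy sample with repeated words so that most words have several followers.
const sample =
  'lorem ipsum dolor sit amet lorem dolor ipsum sit dolor amet ipsum lorem sit'
const table = buildTransitions(sample)
// table.get('lorem') is ['ipsum', 'dolor', 'sit']
const sentenceWords = walk(table, 'lorem', 10)
console.log(sentenceWords.join(' '))
```

Every adjacent pair in the output also occurs somewhere in the sample, which is what gives Markov text its familiar rhythm. A real generator would train on a much larger corpus.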
Implementing Chinese Placeholder Text
Chinese differs from Latin scripts—there are no spaces between words. Randomly concatenating characters produces unreadable gibberish.
Two approaches:
1. Random Character Pool Combination
const CHINESE_CHARS = [
  '的', '一', '是', '在', '不', '了', '有', '和', '人', '这'
]

function generateChinese(length: number): string {
  const chars: string[] = []
  for (let i = 0; i < length; i++) {
    const idx = Math.floor(Math.random() * CHINESE_CHARS.length)
    chars.push(CHINESE_CHARS[idx])
  }
  return chars.join('') + '。'
}
This produces “pseudo-Chinese” that looks like Chinese visually but carries no actual meaning.
2. Weighted Random Based on Word Frequency
For more realistic Chinese text, weight by frequency:
const CHINESE_WORDS = [
  { word: '的', weight: 100 },
  { word: '是', weight: 50 },
  { word: '在', weight: 40 },
  { word: '我们', weight: 30 },
  // ...
]

function weightedRandom(items: { word: string; weight: number }[]): string {
  const totalWeight = items.reduce((sum, item) => sum + item.weight, 0)
  let random = Math.random() * totalWeight
  for (const item of items) {
    random -= item.weight
    if (random <= 0) return item.word
  }
  return items[items.length - 1].word
}
Paragraph Structure and Sentence Length Variation
Real text doesn’t have fixed sentence lengths. A normal distribution around a typical length is a simple approximation:
function generateSentenceLength(baseLength: number, stdDev: number): number {
  // Box-Muller transform: turns two uniform samples into one standard normal sample
  const u1 = 1 - Math.random() // in (0, 1], so Math.log(u1) is never -Infinity
  const u2 = Math.random()
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
  // Clamp so an unlucky sample never yields a zero or negative word count
  return Math.max(1, Math.round(baseLength + z * stdDev))
}

// When generating sentences
const sentenceLength = generateSentenceLength(12, 3) // mean 12, std dev 3
Separate paragraphs with \n\n for natural visual rhythm.
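Putting the pieces together, a paragraph builder might look like the sketch below: sample a length per sentence, join sentences with a space, and join paragraphs with a blank line. The helper names, the 8-word pool, and the 3-word minimum are assumptions made for this example.

```typescript
const WORDS = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur', 'adipiscing', 'elit']

// Box-Muller sample, rounded and clamped so a sentence has at least `min` words.
function sampleLength(mean: number, stdDev: number, min = 3): number {
  const u1 = 1 - Math.random() // in (0, 1], avoids Math.log(0)
  const u2 = Math.random()
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
  return Math.max(min, Math.round(mean + z * stdDev))
}

// Build one capitalized sentence of `wordCount` random words.
function makeSentence(wordCount: number): string {
  const words: string[] = []
  for (let i = 0; i < wordCount; i++) {
    words.push(WORDS[Math.floor(Math.random() * WORDS.length)]!)
  }
  const text = words.join(' ')
  return text.charAt(0).toUpperCase() + text.slice(1) + '.'
}

// Sentences are separated by a space, paragraphs by a blank line.
function makeParagraphs(paragraphs: number, sentencesPerParagraph: number): string {
  const blocks: string[] = []
  for (let p = 0; p < paragraphs; p++) {
    const sentences: string[] = []
    for (let s = 0; s < sentencesPerParagraph; s++) {
      sentences.push(makeSentence(sampleLength(12, 3)))
    }
    blocks.push(sentences.join(' '))
  }
  return blocks.join('\n\n')
}

console.log(makeParagraphs(2, 3))
```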
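Performance Optimization: Large Text Batch Generation
When generating 10,000 words or more, how the string is assembled can matter. Two common ways to build it:
const words = LOREM_WORDS // the word pool defined earlier

// Option A: string concatenation (also leaves a trailing space)
let concatenated = ''
for (let i = 0; i < 10000; i++) {
  concatenated += words[Math.floor(Math.random() * words.length)] + ' '
}

// Option B: collect the parts, then join once
const parts: string[] = []
for (let i = 0; i < 10000; i++) {
  parts.push(words[Math.floor(Math.random() * words.length)])
}
const joined = parts.join(' ')
The usual argument for join is that JavaScript strings are immutable, so each += conceptually creates a new string. In practice, modern engines optimize repeated concatenation heavily (V8, for example, represents the intermediate results as ropes), so the gap varies by runtime and text size and can even go the other way. Measure on your target before committing to either; join has the side benefit of not leaving a trailing separator.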
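Performance claims like this are easy to check on your own runtime. The sketch below is a rough timing harness, not a rigorous benchmark. It has no warm-up and makes a single run per variant. The word indices are pre-generated so both variants build the same string. `viaConcat`, `viaJoin`, and `timeMs` are hypothetical helper names.

```typescript
const POOL = ['lorem', 'ipsum', 'dolor', 'sit', 'amet']
const N = 200000

// Pre-generate the random indices so both variants do identical work.
const indices: number[] = []
for (let i = 0; i < N; i++) {
  indices.push(Math.floor(Math.random() * POOL.length))
}

// Variant 1: build the text with repeated +=
function viaConcat(): string {
  let text = ''
  for (let i = 0; i < indices.length; i++) {
    if (i > 0) text += ' '
    text += POOL[indices[i]!]!
  }
  return text
}

// Variant 2: collect the words and join once at the end
function viaJoin(): string {
  const parts: string[] = []
  for (let i = 0; i < indices.length; i++) {
    parts.push(POOL[indices[i]!]!)
  }
  return parts.join(' ')
}

// Wall-clock time of one call, in milliseconds (coarse, but dependency-free).
function timeMs(fn: () => string): number {
  const start = Date.now()
  fn()
  return Date.now() - start
}

console.log('concat ms:', timeMs(viaConcat))
console.log('join ms:', timeMs(viaJoin))
```

Results depend on the engine, the text size, and whether the string is later flattened, so it is worth running each variant several times before drawing conclusions.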
The Practical Tool
Based on these principles, I built a Lorem Ipsum Generator that supports:
- Latin / Chinese dual language
- Configurable paragraphs, sentences per paragraph, words per sentence
- Option to “Start with Lorem Ipsum” for tradition
- Real-time character count, word count, paragraph count
The core code is under 100 lines, but getting random algorithms, text structure, and performance optimization right makes it genuinely useful.
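The live counters in the feature list can be computed with a small pure function. This is a sketch of one way to do it, not the tool's actual code. `countStats` and `TextStats` are hypothetical names. It treats words as whitespace-separated tokens, which suits Latin text but undercounts Chinese, where words are not space-delimited.

```typescript
interface TextStats {
  characters: number
  words: number
  paragraphs: number
}

// Character, word, and paragraph counts for generated text.
// Characters are UTF-16 code units, including spaces and newlines.
function countStats(text: string): TextStats {
  const trimmed = text.trim()
  if (trimmed === '') return { characters: 0, words: 0, paragraphs: 0 }
  return {
    characters: text.length,
    words: trimmed.split(/\s+/).length,
    paragraphs: trimmed.split(/\n\s*\n/).length,
  }
}

const stats = countStats('Lorem ipsum dolor.\n\nSit amet.')
// stats is { characters: 29, words: 5, paragraphs: 2 }
```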