Random Data Generator: Mock Data Best Practices and Implementation Details
Creating test data manually is fine for a few records, but generating 100 test entries? That’s torture. During a recent project, I hit a wall: the backend API wasn’t ready, but the frontend needed data for debugging. QA needed edge cases. Demos required realistic user data. So I built a random data generator and dove into mock data best practices.
Why You Need a Random Data Generator#
A few real scenarios:
- Parallel Frontend Development: Mock data lets frontend work while backend APIs are in progress
- Test Coverage: Unit tests and integration tests need diverse edge cases
- Demo Environments: Can’t expose real user data in client demos
- Performance Testing: Generate thousands of records for load testing
Real data? Privacy issues. Static fixtures? Can’t cover edge cases. Random data hits the sweet spot.
Core Implementation: Data Generation Algorithms#
1. Name Generation#
Different strategies for Chinese vs English:
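// Assumed type: only the 'chinese' value appears in the code below
type Locale = 'chinese' | 'english'

// English: first name + last name
const EN_FIRST_NAMES = ['James', 'Mary', 'John', 'Patricia', /* ... */]
const EN_LAST_NAMES = ['Smith', 'Johnson', 'Williams', /* ... */]
// Chinese: surname + given name (sample entries, extend as needed)
const ZH_LAST_NAMES = ['张', '王', '李', /* ... */]
const ZH_FIRST_NAMES = ['伟', '芳', '娜', /* ... */]

function generateName(locale: Locale): string {
  if (locale === 'chinese') {
    return pick(ZH_LAST_NAMES) + pick(ZH_FIRST_NAMES) // 张伟
  }
  return `${pick(EN_FIRST_NAMES)} ${pick(EN_LAST_NAMES)}` // James Smith
}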
Key points:
- Chinese: surname first, given name second. English: opposite
- A short surname list goes a long way: the 100 most common Chinese surnames cover roughly 85% of the population
- pick() is a utility function that returns a random element of an array
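The snippets in this article call pick() and randInt() without showing them. Here is a minimal sketch of both, assuming randInt is inclusive on both ends (which is what calls like randInt(3, 9) for "second digit 3-9" suggest). The bodies are my guess at the helpers, not the original source.

```typescript
// Hypothetical helpers: the article uses pick() and randInt() without defining them.

/** Returns a random integer in the inclusive range [min, max]. */
function randInt(min: number, max: number): number {
  return Math.floor(Math.random() * (max - min + 1)) + min
}

/** Returns a uniformly chosen element; throws on an empty array. */
function pick<T>(items: readonly T[]): T {
  if (items.length === 0) {
    throw new Error('pick() requires a non-empty array')
  }
  return items[randInt(0, items.length - 1)] as T
}
```

Throwing on an empty array is a design choice: returning undefined would quietly leak "undefined" into generated names and emails.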
2. Email Generation#
Emails must follow format rules and should rarely collide:
const DOMAINS = ['gmail.com', 'yahoo.com', 'outlook.com', 'protonmail.com', /* ... */]
function generateEmail(locale: Locale): string {
  if (locale === 'chinese') {
    const name = pick(ZH_LAST_NAMES) + pick(ZH_FIRST_NAMES)
    // encodeURIComponent percent-encodes the UTF-8 bytes of the name
    return `${encodeURIComponent(name)}${randInt(1, 999)}@${pick(DOMAINS)}`
    // Result: %E5%BC%A0%E4%BC%9F123@gmail.com (张伟, percent-encoded)
  }
  return `${pick(EN_FIRST_NAMES).toLowerCase()}${pick(EN_LAST_NAMES).toLowerCase()}${randInt(1, 999)}@${pick(DOMAINS)}`
  // Result: jamesjohnson456@gmail.com
}
Two critical details:
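- Chinese names go through encodeURIComponent, which percent-encodes them (张伟 becomes %E5%BC%A0%E4%BC%9F) so the local part is plain ASCII; many mail systems reject raw non-ASCII local parts
- The random numeric suffix makes duplicates less likely, but it does not rule them out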
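To make the encoding point concrete, here is a small sketch of what encodeURIComponent actually returns for a Chinese name, plus a check that the resulting address is ASCII-only. isAsciiOnly is a hypothetical helper I am adding for illustration; it is not a full email validator.

```typescript
// Percent-encode a non-ASCII name so the local part contains only ASCII.
const encoded = encodeURIComponent('张伟') // "%E5%BC%A0%E4%BC%9F"

// Hypothetical helper: true when every character is printable, non-space ASCII.
function isAsciiOnly(value: string): boolean {
  return /^[\x21-\x7e]+$/.test(value)
}

const encodedOk = isAsciiOnly(`${encoded}123@gmail.com`) // true
const rawOk = isAsciiOnly('张伟123@gmail.com')            // false
```

The encoded form is safe but not pretty. If a demo needs readable addresses, an ASCII transliteration of the name is an alternative worth considering.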
3. Phone Numbers#
Phone formats vary wildly by region:
function generatePhone(locale: Locale): string {
  if (locale === 'chinese') {
    // Chinese mobile: starts with 1, second digit 3-9, 11 digits total.
    // Build the last 9 digits one at a time: slicing String(Math.random())
    // can return fewer than 9 digits (0.5 stringifies to "0.5").
    const tail = Array.from({ length: 9 }, () => randInt(0, 9)).join('')
    return `1${randInt(3, 9)}${tail}`
    // Result: 13812345678
  }
  // US phone: (area code) prefix-suffix
  return `(${randInt(200, 999)}) ${randInt(200, 999)}-${randInt(1000, 9999)}`
  // Result: (415) 555-1234
}
Chinese mobile number rules:
- First digit is always 1
- Second digit is 3-9 (carrier number ranges)
- Remaining 9 digits are random
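These rules translate directly into regular expressions, which is handy for asserting generator output in unit tests. The patterns below are a sketch derived only from the rules stated here; real numbering plans are stricter, and isValidPhone is a hypothetical name.

```typescript
// Assumed patterns based on the rules in this section, not on official numbering plans.
const CN_MOBILE = /^1[3-9]\d{9}$/           // 1, then 3-9, then 9 more digits
const US_PHONE = /^\(\d{3}\) \d{3}-\d{4}$/  // (415) 555-1234

function isValidPhone(phone: string, locale: 'chinese' | 'english'): boolean {
  return (locale === 'chinese' ? CN_MOBILE : US_PHONE).test(phone)
}

const cnOk = isValidPhone('13812345678', 'chinese')    // true
const cnShort = isValidPhone('1381234567', 'chinese')  // false: only 10 digits
const usOk = isValidPhone('(415) 555-1234', 'english') // true
```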
4. UUID Generation#
UUID v4 format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
function generateUUID(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0
    const v = c === 'x' ? r : (r & 0x3) | 0x8
    return v.toString(16)
  })
}
Key points:
- The 13th hex digit is always 4, indicating UUID v4
- The 17th hex digit (y) can only be 8, 9, a, or b, which sets the variant bits the spec requires
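Both constraints are easy to verify with a regular expression. This is a minimal sketch of a validator (isUUIDv4 is a hypothetical name) that could back a unit test for the generator.

```typescript
// Version digit must be 4; variant digit must be 8, 9, a, or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/

function isUUIDv4(value: string): boolean {
  return UUID_V4.test(value.toLowerCase())
}

const good = isUUIDv4('3f2b8c1e-9d4a-4f6b-8a7c-1e2d3c4b5a69')       // true
const wrongVersion = isUUIDv4('3f2b8c1e-9d4a-1f6b-8a7c-1e2d3c4b5a69') // false: version digit is 1
const wrongVariant = isUUIDv4('3f2b8c1e-9d4a-4f6b-ca7c-1e2d3c4b5a69') // false: variant digit is c
```

Since Math.random() is not a cryptographic source, IDs from a generator like this are best kept to test data. Where the platform provides crypto.randomUUID(), that is likely the better choice for anything else.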
Multi-Format Output#
Generated data needs export options for immediate use:
JSON Format#
function formatJSON(records: Record<string, unknown>[]): string {
  return JSON.stringify(records, null, 2)
}
CSV Format#
CSV’s gotcha is escaping:
function formatCSV(records: Record<string, unknown>[]): string {
  if (records.length === 0) return ''
  // Quote a field if it contains a comma, quote, or line break (\r or \n)
  const escape = (val: string): string =>
    /[",\r\n]/.test(val)
      ? `"${val.replace(/"/g, '""')}"` // Double the quotes
      : val
  const headers = Object.keys(records[0])
  // Header names go through the same escaping as values
  const lines = [headers.map(escape).join(',')]
  for (const record of records) {
    const values = headers.map(h => escape(String(record[h] ?? '')))
    lines.push(values.join(','))
  }
  return lines.join('\n')
}
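To see the escaping on concrete values, here is a small standalone sketch. escapeCsvField is a hypothetical helper that isolates the per-field logic from formatCSV; the inputs are made up for illustration.

```typescript
// Hypothetical helper: quote a field only when it contains a special character.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value)
    ? `"${value.replace(/"/g, '""')}"`
    : value
}

const plain = escapeCsvField('James')           // James
const withComma = escapeCsvField('Smith, John') // "Smith, John"
const withQuote = escapeCsvField('say "hi"')    // "say ""hi"""
```

Quoting only when needed keeps the output readable while still round-tripping through a standard CSV parser.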
CSV escaping rules:
- When a field contains a comma, quote, or newline, wrap it in double quotes
- Double quotes inside the field become ""
SQL INSERT#
function formatSQL(records: Record<string, unknown>[]): string {
  if (records.length === 0) return ''
  const tableName = 'random_data'
  const headers = Object.keys(records[0])
  const lines: string[] = []
  for (const record of records) {
    const values = headers.map(h => {
      const val = record[h]
      // Missing values become NULL instead of the string 'undefined'
      if (val === null || val === undefined) return 'NULL'
      // Numbers don't need quotes; strings escape single quotes
      return typeof val === 'number' ? String(val) : `'${String(val).replace(/'/g, "''")}'`
    })
    lines.push(`INSERT INTO ${tableName} (${headers.join(', ')}) VALUES (${values.join(', ')});`)
  }
  return lines.join('\n')
}
SQL escaping rules:
- Numbers don’t get quotes
- Strings are wrapped in single quotes; internal single quotes become ''
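Here is a short sketch of those rules applied to one row. toSqlLiteral is a hypothetical helper, and mapping null to NULL is my own assumption rather than something stated in the rules above.

```typescript
// Hypothetical helper: turn one JS value into a SQL literal for a seed script.
function toSqlLiteral(value: unknown): string {
  if (value === null || value === undefined) return 'NULL' // assumption: missing means NULL
  if (typeof value === 'number') return String(value)
  return `'${String(value).replace(/'/g, "''")}'`
}

const row = { name: "O'Brien", age: 28, nickname: null }
const literals = [row.name, row.age, row.nickname].map(toSqlLiteral).join(', ')
// literals is: 'O''Brien', 28, NULL
```

This kind of string escaping is fine for generating seed files from data you created yourself. For anything that touches user input, parameterized queries are the safer route.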
TypeScript Type Inference#
Auto-generate type definitions:
function formatTypeScript(records: Record<string, unknown>[]): string {
  if (records.length === 0) return ''
  const headers = Object.keys(records[0])
  const typeMap: Record<string, string> = {}
  for (const h of headers) {
    // Infer each field's type from the first record only
    const sample = records[0][h]
    typeMap[h] = typeof sample === 'number' ? 'number' : 'string'
  }
  const interfaceFields = headers.map(h => `  ${h}: ${typeMap[h]}`).join('\n')
  return `interface RandomData {
${interfaceFields}
}
const data: RandomData[] = ${JSON.stringify(records, null, 2)}`
}
Output (abbreviated; JSON.stringify actually prints quoted keys, one field per line):
interface RandomData {
  name: string
  email: string
  phone: string
  age: number
}
const data: RandomData[] = [
  { name: "James Smith", email: "jamessmith123@gmail.com", phone: "(415) 555-1234", age: 28 },
  // ...
]
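Inferring from the first record alone breaks down once a field is sometimes missing or mixes types. One possible extension, sketched here under my own assumptions (inferFieldType is a hypothetical helper, not part of the tool), is to scan every record and emit a union.

```typescript
type Row = Record<string, unknown>

// Hypothetical helper: collect the type of one field across all records.
function inferFieldType(records: Row[], field: string): string {
  const seen = new Set<string>()
  for (const record of records) {
    const value = record[field]
    if (value === null || value === undefined) {
      seen.add('null')
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      seen.add(typeof value)
    } else {
      seen.add('string') // same fallback as the original: everything else is a string
    }
  }
  // Sort so the union is stable regardless of record order
  return seen.size === 0 ? 'unknown' : Array.from(seen).sort().join(' | ')
}

const ageType = inferFieldType([{ age: 28 }, { age: null }], 'age')         // "null | number"
const nameType = inferFieldType([{ name: 'Ann' }, { name: 'Bob' }], 'name') // "string"
```

The cost is one extra pass per field, which is negligible at the record counts discussed here.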
Performance Considerations#
Generating 100 records is instant, but what about thousands?
Batch Generation Optimization#
function generateBatch(count: number, types: DataType[], locale: Locale) {
  const records: Record<string, unknown>[] = []
  for (let i = 0; i < count; i++) {
    const record: Record<string, unknown> = {}
    for (const type of types) {
      record[type] = generateValue(type, locale)
    }
    records.push(record)
  }
  return records
}
The time complexity is O(n × m), where n is the record count and m is the field count. Generating 10,000 records takes roughly 50ms, though the exact figure depends on the machine and the fields chosen.
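Timing numbers like this are easy to check on your own machine. The sketch below uses a stand-in generator (the fields list and generateRecords are hypothetical, not the article's generateBatch) and Date.now() for a coarse measurement.

```typescript
// Hypothetical stand-in for generateValue(): one generator function per field.
const fields: Array<[string, () => unknown]> = [
  ['id', () => Math.floor(Math.random() * 1000000)],
  ['score', () => Math.random()],
]

// Same O(n × m) loop shape: n records, m fields each.
function generateRecords(count: number): Record<string, unknown>[] {
  const records: Record<string, unknown>[] = []
  for (let i = 0; i < count; i++) {
    const record: Record<string, unknown> = {}
    for (const [name, generate] of fields) {
      record[name] = generate()
    }
    records.push(record)
  }
  return records
}

// Coarse wall-clock timing; run it several times and compare.
function timeGeneration(count: number): { records: number; ms: number } {
  const start = Date.now()
  const records = generateRecords(count)
  return { records: records.length, ms: Date.now() - start }
}

const timing = timeGeneration(10000)
```

Date.now() only has millisecond resolution, so small batches are better measured in a loop with the total averaged.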
Avoiding Duplicates#
Some scenarios require uniqueness:
function generateUniqueUUIDs(count: number): string[] {
  const set = new Set<string>()
  while (set.size < count) {
    set.add(generateUUID())
  }
  return Array.from(set)
}
UUID collision probability is extremely low (2^122 possibilities), but emails and phones need Set-based deduplication.
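For emails and phones the value space is much smaller, so a plain while loop can spin forever if you ask for more unique values than the generator can produce. Here is a sketch of a generic version with an attempt cap; generateUnique and the default cap of 20 attempts per requested value are my own assumptions.

```typescript
// Hypothetical helper: collect `count` distinct values from any generator function.
function generateUnique<T>(generate: () => T, count: number, maxAttempts = count * 20): T[] {
  const seen = new Set<T>()
  let attempts = 0
  while (seen.size < count) {
    // Bail out instead of looping forever when the value space is too small
    if (attempts++ >= maxAttempts) {
      throw new Error(`Only ${seen.size} unique values after ${maxAttempts} attempts`)
    }
    seen.add(generate())
  }
  return Array.from(seen)
}

// Usage with a deterministic generator that cycles through 0..4
let counter = 0
const firstFive = generateUnique(() => counter++ % 5, 5) // [0, 1, 2, 3, 4]
```

The Set keeps insertion order, so the result preserves the order in which values were first generated.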
Real-World Application#
Based on these principles, I built: Random Data Generator
Features:
- 12 data types (name, email, phone, address, company, URL, IP, date, number, UUID, color, country)
- Chinese and English locale support
- 4 export formats (JSON, CSV, SQL, TypeScript)
- Generate up to 100 records
Perfect for frontend development, QA testing, and product demos.
Summary#
Random data generation looks simple, but making it useful requires attention to detail:
- Regional format differences (phone, address, name)
- Multi-format escaping rules (CSV, SQL)
- Uniqueness guarantees (UUID, email deduplication)
- Performance optimization (batch generation)
Hope this helps. Next time you need test data, use a generator instead of manual entry.