Random Data Generator: Mock Data Best Practices and Implementation Details

Creating test data manually is fine for a few records, but generating 100 test entries? That’s torture. During a recent project, I hit a wall: the backend API wasn’t ready, but the frontend needed data for debugging. QA needed edge cases. Demos required realistic user data. So I built a random data generator and dove into mock data best practices.

Why You Need a Random Data Generator

A few real scenarios:

  1. Parallel Frontend Development: Mock data lets frontend work while backend APIs are in progress
  2. Test Coverage: Unit tests and integration tests need diverse edge cases
  3. Demo Environments: Can’t expose real user data in client demos
  4. Performance Testing: Generate thousands of records for load testing

Real data? Privacy issues. Static fixtures? Can’t cover edge cases. Random data hits the sweet spot.

Core Implementation: Data Generation Algorithms

1. Name Generation

Different strategies for Chinese vs English:

// English: First Name + Last Name
const EN_FIRST_NAMES = ['James', 'Mary', 'John', 'Patricia', /* ... */]
const EN_LAST_NAMES = ['Smith', 'Johnson', 'Williams', /* ... */]

function generateName(locale: Locale): string {
  if (locale === 'chinese') {
    return pick(ZH_LAST_NAMES) + pick(ZH_FIRST_NAMES)  // 张伟
  }
  return `${pick(EN_FIRST_NAMES)} ${pick(EN_LAST_NAMES)}`  // James Smith
}

Key points:

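  • Chinese: surname first, given name second. English: given name first, surname second
  • A short surname list goes a long way: the 100 most common Chinese surnames cover roughly 85% of the population, so a small array already produces realistic names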
  • pick() is a utility function that randomly selects from an array
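
The snippets in this article call pick() and randInt() without showing them. Here is a minimal sketch of what they might look like, assuming randInt is inclusive at both ends (which the ranges 3-9 and 1-999 used elsewhere suggest). The optional rng parameter is an addition for testability, not something the article describes:

```typescript
// Hypothetical helper signatures; the article only names pick() and randInt().
type Rng = () => number

// Returns an integer in [min, max], both ends inclusive (assumed behavior).
function randInt(min: number, max: number, rng: Rng = Math.random): number {
  const lo = Math.ceil(min)
  const hi = Math.floor(max)
  return Math.floor(rng() * (hi - lo + 1)) + lo
}

// Returns one element chosen uniformly from a non-empty array.
function pick<T>(items: readonly T[], rng: Rng = Math.random): T {
  if (items.length === 0) throw new Error('pick() needs a non-empty array')
  return items[randInt(0, items.length - 1, rng)] as T
}
```

Passing a fixed rng such as () => 0.5 makes the output deterministic, which is handy when a unit test needs to assert on generated values.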

2. Email Generation

Emails must follow format rules and should rarely collide:

const DOMAINS = ['gmail.com', 'yahoo.com', 'outlook.com', 'protonmail.com', /* ... */]

function generateEmail(locale: Locale): string {
  if (locale === 'chinese') {
    const name = pick(ZH_LAST_NAMES) + pick(ZH_FIRST_NAMES)
    return `${encodeURIComponent(name)}${randInt(1, 999)}@${pick(DOMAINS)}`
    // Result: %E5%BC%A0%E4%BC%9F123@gmail.com (the percent-encoded form of 张伟123)
  }
  return `${pick(EN_FIRST_NAMES).toLowerCase()}${pick(EN_LAST_NAMES).toLowerCase()}${randInt(1, 999)}@${pick(DOMAINS)}`
  // Result: jamesjohnson456@gmail.com
}

Two critical details:

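  • Chinese names contain non-ASCII characters, which many email validators and mail servers reject in the local part. encodeURIComponent percent-encodes them, so the address stays ASCII-only, at the cost of readability
  • The random numeric suffix makes duplicates less likely, but it does not rule them out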
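
To make the first point concrete, here is a small sketch comparing the percent-encoded local part with a romanized one. The PINYIN table and both function names are hypothetical and cover only a few sample characters; a real generator would need a complete romanization mapping:

```typescript
// Hypothetical romanization table covering a few sample characters only.
const PINYIN: Record<string, string> = { '张': 'zhang', '伟': 'wei', '王': 'wang', '芳': 'fang' }

// What the article's approach produces: an ASCII-only but unreadable local part.
function percentEncodedLocalPart(name: string, suffix: number): string {
  return `${encodeURIComponent(name)}${suffix}`
}

// Alternative: romanize each character, falling back to "user" for unknown input.
function pinyinLocalPart(name: string, suffix: number): string {
  const romanized = Array.from(name).map(ch => PINYIN[ch] ?? '').join('')
  return `${romanized || 'user'}${suffix}`
}

// True when every character is in the 7-bit ASCII range.
function isAscii(text: string): boolean {
  return /^[\x00-\x7F]*$/.test(text)
}

// percentEncodedLocalPart('张伟', 123) → "%E5%BC%A0%E4%BC%9F123"
// pinyinLocalPart('张伟', 123)         → "zhangwei123"
```

Both results are ASCII-only; the romanized form simply looks more like an address a person would actually register.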

3. Phone Numbers

Phone formats vary wildly by region:

function generatePhone(locale: Locale): string {
  if (locale === 'chinese') {
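    // Chinese mobile: starts with 1, second digit 3-9, 11 digits total
    // padStart keeps the tail at exactly 9 digits; slicing String(Math.random()) can come up short
    return `1${randInt(3, 9)}${String(randInt(0, 999999999)).padStart(9, '0')}`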
    // Result: 13812345678
  }
  // US phone: (area code) prefix-suffix
  return `(${randInt(200, 999)}) ${randInt(200, 999)}-${randInt(1000, 9999)}`
  // Result: (415) 555-1234
}

Chinese mobile number rules:

  • First digit is always 1
  • Second digit is 3-9 (carrier number ranges)
  • Remaining 9 digits are random
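
The rules above translate directly into a regular expression, which is a cheap way to sanity-check generator output. The sketch below is one possible version: the names CN_MOBILE, US_PHONE, randomDigits and chineseMobile are made up for illustration, and the US pattern only checks the "(NXX) NXX-XXXX" shape with leading digits 2-9:

```typescript
// Patterns that mirror the formats described in the text.
const CN_MOBILE = /^1[3-9]\d{9}$/
const US_PHONE = /^\([2-9]\d{2}\) [2-9]\d{2}-\d{4}$/

// Builds a string of exactly `length` decimal digits, leading zeros included.
function randomDigits(length: number, rng: () => number = Math.random): string {
  let out = ''
  for (let i = 0; i < length; i++) out += Math.floor(rng() * 10)
  return out
}

// 1, then a digit in 3..9, then nine random digits: 11 digits total.
function chineseMobile(rng: () => number = Math.random): string {
  const second = 3 + Math.floor(rng() * 7)
  return `1${second}${randomDigits(9, rng)}`
}
```

Generating digit by digit avoids any dependence on how a floating-point number happens to print, so the result always has the right length.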

4. UUID Generation

UUID v4 format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

function generateUUID(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0
    const v = c === 'x' ? r : (r & 0x3) | 0x8
    return v.toString(16)
  })
}

Key points:

  • The 13th hex digit (hyphens not counted) is always 4, indicating UUID v4
  • The 17th hex digit (y) can only be 8/9/a/b, which sets the variant bits the spec requires
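
Both constraints can be verified mechanically. The sketch below, with hypothetical names UUID_V4 and variantNibble, shows a validation pattern and confirms that the mask (r & 0x3) | 0x8 can only ever produce 8, 9, a or b. Note that Math.random() is not a cryptographically secure source; where the runtime provides it, crypto.randomUUID() is a standard alternative.

```typescript
// Matches the lowercase v4 layout: version nibble 4, variant nibble 8/9/a/b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/

// Keeps the low two bits of r and forces the top two bits to binary 10.
function variantNibble(r: number): string {
  return ((r & 0x3) | 0x8).toString(16)
}

// Every possible input nibble (0..15) collapses to one of four outputs.
const variants = Array.from(
  new Set(Array.from({ length: 16 }, (_, r) => variantNibble(r)))
).sort()
// variants → ["8", "9", "a", "b"]
```

Running UUID_V4.test() over a batch of generated IDs is a quick regression check if the generator is ever rewritten.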

Multi-Format Output

Generated data needs export options for immediate use:

JSON Format

function formatJSON(records: Record<string, unknown>[]): string {
  return JSON.stringify(records, null, 2)
}

CSV Format

CSV’s gotcha is escaping:

function formatCSV(records: Record<string, unknown>[]): string {
  if (records.length === 0) return ''
  const headers = Object.keys(records[0])
  const lines = [headers.join(',')]
  for (const record of records) {
    const values = headers.map(h => {
      const val = String(record[h] ?? '')
      // Quote the field if it contains a comma, quote, or line break (\r or \n)
      return /[",\r\n]/.test(val)
        ? `"${val.replace(/"/g, '""')}"`  // Double the quotes
        : val
    })
    lines.push(values.join(','))
  }
  return lines.join('\n')
}

CSV escaping rules:

  • When a field contains comma, quote, or newline, wrap it in double quotes
  • Double quotes inside the field become ""

SQL INSERT

function formatSQL(records: Record<string, unknown>[]): string {
  if (records.length === 0) return ''  // Object.keys(records[0]) would throw on an empty array
  const tableName = 'random_data'
  const headers = Object.keys(records[0])
  const lines: string[] = []
  for (const record of records) {
    const values = headers.map(h => {
      const val = record[h]
      // Numbers don't need quotes; null/undefined become NULL; strings escape single quotes
      if (val === null || val === undefined) return 'NULL'
      return typeof val === 'number' ? String(val) : `'${String(val).replace(/'/g, "''")}'`
    })
    lines.push(`INSERT INTO ${tableName} (${headers.join(', ')}) VALUES (${values.join(', ')});`)
  }
  return lines.join('\n')
}

SQL escaping rules:

  • Numbers don’t get quotes
  • Strings are wrapped in single quotes, internal single quotes become ''
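
As an illustration of these rules, here is a sketch of a single-value formatter. The name sqlLiteral is hypothetical, and mapping null and undefined to NULL is an assumption beyond the two rules listed:

```typescript
// Formats one JavaScript value as a SQL literal for a generated INSERT.
function sqlLiteral(value: unknown): string {
  // Assumption: missing values should become SQL NULL rather than the text 'null'.
  if (value === null || value === undefined) return 'NULL'
  // Finite numbers are emitted bare; NaN and Infinity fall through to the string branch.
  if (typeof value === 'number' && Number.isFinite(value)) return String(value)
  // Everything else is quoted, with embedded single quotes doubled.
  return `'${String(value).replace(/'/g, "''")}'`
}

// sqlLiteral("O'Brien") → 'O''Brien'
// sqlLiteral(28)        → 28
// sqlLiteral(null)      → NULL
```

This is meant for seeding test databases with generated data. For values that come from users, parameterized queries are generally the safer route, and some engines (MySQL in its default SQL mode, for example) also treat backslash as an escape character, which this sketch does not handle.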

TypeScript Type Inference

Auto-generate type definitions by sampling the first record (any field that is not a number is typed as string):

function formatTypeScript(records: Record<string, unknown>[]): string {
  if (records.length === 0) return ''  // Nothing to infer from
  const headers = Object.keys(records[0])
  const typeMap: Record<string, string> = {}
  for (const h of headers) {
    const sample = records[0][h]
    typeMap[h] = typeof sample === 'number' ? 'number' : 'string'
  }
  const interfaceFields = headers.map(h => `  ${h}: ${typeMap[h]}`).join('\n')
  return `interface RandomData {
${interfaceFields}
}

const data: RandomData[] = ${JSON.stringify(records, null, 2)}`
}

Output (abbreviated; the real JSON.stringify output quotes the keys and puts each field on its own line):

interface RandomData {
  name: string
  email: string
  phone: string
  age: number
}

const data: RandomData[] = [
  { name: "James Smith", email: "jamessmith123@gmail.com", phone: "(415) 555-1234", age: 28 },
  // ...
]
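
Because formatTypeScript looks only at the first record and knows only number and string, a field that is sometimes null or boolean would be mistyped. One possible refinement, sketched here under the hypothetical name inferFieldType, scans every value of a field and builds a union:

```typescript
// Infers a TypeScript type string from all observed values of one field.
function inferFieldType(values: unknown[]): string {
  const kinds = new Set<string>()
  for (const value of values) {
    if (value === null || value === undefined) kinds.add('null')
    else if (typeof value === 'number') kinds.add('number')
    else if (typeof value === 'boolean') kinds.add('boolean')
    else kinds.add('string') // Everything else is serialized as text
  }
  // Sort so the union is stable regardless of record order.
  return kinds.size > 0 ? Array.from(kinds).sort().join(' | ') : 'unknown'
}

// inferFieldType([28, 31])     → number
// inferFieldType(['a', null])  → null | string
// inferFieldType([])           → unknown
```

The cost is one extra pass over the data, which is negligible at the record counts discussed here.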

Performance Considerations

Generating 100 records is instant, but what about thousands?

Batch Generation Optimization

function generateBatch(count: number, types: DataType[], locale: Locale) {
  const records: Record<string, unknown>[] = []
  for (let i = 0; i < count; i++) {
    const record: Record<string, unknown> = {}
    for (const type of types) {
      record[type] = generateValue(type, locale)
    }
    records.push(record)
  }
  return records
}

Time complexity is O(n × m), where n is the record count and m is the field count. Generating 10,000 records takes about 50ms, though the exact figure depends on hardware and on which fields are selected.
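
Timing figures like this are easy to reproduce locally. The following sketch measures a stand-in batch function; timeIt and fakeBatch are hypothetical names, and fakeBatch fills fields with plain random numbers rather than calling the real generators, so its timings will differ from the tool's:

```typescript
// Runs fn once and reports wall-clock time in milliseconds.
function timeIt<T>(fn: () => T): { result: T; ms: number } {
  const start = Date.now()
  const result = fn()
  return { result, ms: Date.now() - start }
}

// Stand-in for generateBatch(): n records with m cheap random fields each.
function fakeBatch(n: number, m: number): Record<string, unknown>[] {
  const records: Record<string, unknown>[] = []
  for (let i = 0; i < n; i++) {
    const record: Record<string, unknown> = {}
    for (let j = 0; j < m; j++) record[`field${j}`] = Math.random()
    records.push(record)
  }
  return records
}

const { result, ms } = timeIt(() => fakeBatch(10000, 12))
console.log(`${result.length} records in ${ms}ms`)
```

Doubling n or m and rerunning should roughly double the time, which is what the O(n × m) estimate predicts.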

Avoiding Duplicates

Some scenarios require uniqueness:

function generateUniqueUUIDs(count: number): string[] {
  const set = new Set<string>()
  while (set.size < count) {
    set.add(generateUUID())
  }
  return Array.from(set)
}

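A v4 UUID has 122 random bits (2^122 possible values), so collisions are extremely unlikely and the Set here is mostly a safety net. Emails and phone numbers come from a much smaller space, so they genuinely need Set-based deduplication.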
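
The same Set-based loop generalizes to any field. One caveat is that a while loop like the one above never terminates if the generator cannot produce enough distinct values, so the sketch below adds an attempt limit. generateUnique and its maxAttempts default are hypothetical choices, not part of the original tool:

```typescript
// Collects `count` distinct values from `generate`, giving up after maxAttempts calls.
function generateUnique<T>(
  count: number,
  generate: () => T,
  maxAttempts: number = count * 20
): T[] {
  const seen = new Set<T>()
  let attempts = 0
  while (seen.size < count) {
    if (attempts >= maxAttempts) {
      throw new Error(`only ${seen.size} unique values after ${maxAttempts} attempts`)
    }
    attempts++
    seen.add(generate()) // Set ignores values it already holds
  }
  return Array.from(seen)
}
```

Usage would look like generateUnique(100, () => generateEmail(locale)), with generateEmail being the function shown earlier. Note that Set compares objects by reference, so this works as-is only for primitive values such as strings and numbers.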

Real-World Application

Based on these principles, I built: Random Data Generator

Features:

  • 12 data types (name, email, phone, address, company, URL, IP, date, number, UUID, color, country)
  • Chinese and English locale support
  • 4 export formats (JSON, CSV, SQL, TypeScript)
  • Generate up to 100 records

Perfect for frontend development, QA testing, and product demos.

Summary

Random data generation looks simple, but making it useful requires attention to detail:

  • Regional format differences (phone, address, name)
  • Multi-format escaping rules (CSV, SQL)
  • Uniqueness guarantees (UUID, email deduplication)
  • Performance optimization (batch generation)

Hope this helps. Next time you need test data, use a generator instead of manual entry.

