JSON to SQL Batch Insert: From Escape Traps to Performance Optimization#

Recently worked on a data migration project, importing API JSON data into MySQL. Seemed simple at first—just map JSON fields to SQL statements, right? Ended up hitting several pitfalls.

The Biggest Trap: SQL Injection#

My initial implementation looked like this:

function jsonToSql(json) {
  const data = JSON.parse(json)
  const values = Object.values(data).map(v => `'${v}'`).join(', ')
  return `INSERT INTO users VALUES (${values})`
}

Looked fine until I encountered this data:

{
  "name": "O'Brien",
  "bio": "I'm a developer",
  "comment": "'); DROP TABLE users; --"
}

The generated SQL exploded. Classic SQL injection.

Proper Escaping#

String values need single-quote escaping. MySQL’s rule: replace ' with '':

function escapeValue(val: unknown, quote: "'" | '"' = "'"): string {
  if (val === null || val === undefined) return 'NULL'
  if (typeof val === 'number') return String(val)
  if (typeof val === 'boolean') return val ? '1' : '0'
  
  // String escaping
  const str = String(val)
  const escaped = str.replace(/\\/g, '\\\\').replace(new RegExp(quote, 'g'), quote + quote)
  return `${quote}${escaped}${quote}`
}

Note: backslashes also need escaping, because MySQL treats \ as an escape character by default. Without it, a value like path\to\file gets mangled: MySQL reads \t as a tab and silently drops the other backslash.
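A quick sanity check against the earlier payloads (output shown in the comments; these sample calls are mine, not part of the converter):

console.log(escapeValue("O'Brien"))
// 'O''Brien'
console.log(escapeValue("'); DROP TABLE users; --"))
// '''); DROP TABLE users; --'   (the payload is now just literal text inside the string)
console.log(escapeValue(null))
// NULL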

Performance Optimization with Batch Inserts#

Inserting 1000 rows one by one:

INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
-- ... repeat 1000 times

This requires 1000 network round trips—terrible performance.

The Right Way: Batch Insert#

MySQL supports multi-row inserts in a single statement:

INSERT INTO users (name, email) VALUES
  ('Alice', 'alice@example.com'),
  ('Bob', 'bob@example.com'),
  ('Charlie', 'charlie@example.com');

Significant performance improvement:

function batchInsert(rows, tableName, columns, batchSize = 100) {
  const statements = []
  const columnList = columns.map(c => `\`${c}\``).join(', ')
  
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize)
    const valueRows = batch.map(row => {
      // Pull values in column order so every row lines up with the column list
      const vals = columns.map(col => escapeValue(row[col]))
      return `(${vals.join(', ')})`
    })
    
    const sql = `INSERT INTO \`${tableName}\` (${columnList}) VALUES\n${valueRows.join(',\n')}`
    statements.push(sql)
  }
  
  return statements.join(';\n\n')
}

But batch size can't be unlimited: MySQL's max_allowed_packet caps the statement size (4MB by default on MySQL 5.7 and earlier, 64MB on 8.0). Exceed it and you get an error. In practice, a batchSize of 50-100 works well.
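If row sizes vary a lot, splitting by estimated byte size instead of a fixed row count is one way to stay under the limit. A rough sketch (the 4,000,000-byte budget and the helper name are my own choices, not from the converter):

function splitByByteBudget(valueRows: string[], maxBytes = 4_000_000): string[][] {
  const encoder = new TextEncoder()
  const batches: string[][] = []
  let current: string[] = []
  let size = 0
  
  for (const row of valueRows) {
    const rowBytes = encoder.encode(row).length + 2 // +2 for the ",\n" separator
    if (current.length > 0 && size + rowBytes > maxBytes) {
      // Current batch would overflow the budget, so start a new one
      batches.push(current)
      current = []
      size = 0
    }
    current.push(row)
    size += rowBytes
  }
  if (current.length > 0) batches.push(current)
  return batches
}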

Flattening Nested JSON#

API responses often have nested structures:

{
  "user": {
    "profile": {
      "name": "Alice",
      "age": 25
    },
    "contact": {
      "email": "alice@example.com"
    }
  }
}

Database tables are flat, so we need to flatten nested fields:

function flattenObject(obj: Record<string, unknown>, prefix = ''): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  
  for (const [key, value] of Object.entries(obj)) {
    const flatKey = prefix ? `${prefix}.${key}` : key
    
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Recursively flatten nested objects
      Object.assign(result, flattenObject(value as Record<string, unknown>, flatKey))
    } else {
      result[flatKey] = value
    }
  }
  
  return result
}

After flattening:

{
  "user.profile.name": "Alice",
  "user.profile.age": 25,
  "user.contact.email": "alice@example.com"
}

Field names contain dots, so wrap them in backticks:

INSERT INTO users (`user.profile.name`, `user.profile.age`, `user.contact.email`) VALUES ...
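If a key ever contains a backtick itself, it has to be doubled inside the quoted identifier (MySQL's rule for identifiers). A small helper, with a name I made up:

function quoteIdentifier(name: string): string {
  return '`' + name.replace(/`/g, '``') + '`'
}

// quoteIdentifier('user.profile.name')  ->  `user.profile.name`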

Three Strategies for NULL Values#

JSON often has null or missing fields. How to handle them?

Strategy 1: Explicit NULL#

INSERT INTO users (name, email) VALUES ('Alice', NULL)

Database field allows NULL, insert NULL directly.

Strategy 2: Skip Field#

INSERT INTO users (name) VALUES ('Alice')

Don’t insert the email field, let the database use default value.

Strategy 3: DEFAULT Keyword#

INSERT INTO users (name, email) VALUES ('Alice', DEFAULT)

Explicitly use field default value.

Implementation can provide options:

type NullHandling = 'null' | 'skip' | 'default'

function handleNull(val: unknown, strategy: NullHandling): string {
  if (val !== null && val !== undefined) return escapeValue(val)
  
  switch (strategy) {
    case 'null': return 'NULL'
    case 'skip': return '' // Filtered out at outer layer
    case 'default': return 'DEFAULT'
  }
}
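The 'skip' strategy is the awkward one, because the column list itself changes per row. A minimal single-row sketch of that outer-layer filtering (buildSkipInsert is a hypothetical name, not part of the converter):

function buildSkipInsert(row: Record<string, unknown>, tableName: string): string {
  // Keep only the fields that actually have a value
  const entries = Object.entries(row).filter(([, v]) => v !== null && v !== undefined)
  const cols = entries.map(([k]) => `\`${k}\``).join(', ')
  const vals = entries.map(([, v]) => escapeValue(v)).join(', ')
  return `INSERT INTO \`${tableName}\` (${cols}) VALUES (${vals})`
}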

The Power of ON DUPLICATE KEY UPDATE#

Data imports often encounter duplicates. MySQL’s ON DUPLICATE KEY UPDATE handles this elegantly:

INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
ON DUPLICATE KEY UPDATE name = VALUES(name), email = VALUES(email)

If id=1 exists, update name and email; otherwise insert a new row.

Implementation:

function generateInsertWithUpsert(rows, tableName, columns) {
  // Reuse batchInsert with a single batch so it yields exactly one INSERT statement
  const insert = batchInsert(rows, tableName, columns, rows.length)
  const updates = columns.map(c => `\`${c}\` = VALUES(\`${c}\`)`).join(', ')
  return `${insert}\nON DUPLICATE KEY UPDATE ${updates}`
}

Note: this syntax is MySQL-specific. PostgreSQL uses ON CONFLICT, SQL Server uses MERGE. Also worth knowing: MySQL 8.0.20 deprecates VALUES() inside this clause in favor of row aliases, though it still works.
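For comparison, generating the PostgreSQL variant of the update clause would look roughly like this (the conflict target column is an assumption; adjust it to the table's unique key):

function generatePostgresUpsertClause(columns: string[], conflictColumn = 'id'): string {
  // PostgreSQL references the incoming row via the EXCLUDED pseudo-table
  const updates = columns.map(c => `"${c}" = EXCLUDED."${c}"`).join(', ')
  return `ON CONFLICT ("${conflictColumn}") DO UPDATE SET ${updates}`
}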

Real Case: From JSON API to MySQL#

Complete data import workflow:

async function importFromApi(apiUrl: string, tableName: string) {
  // 1. Fetch JSON data
  const response = await fetch(apiUrl)
  const data: Record<string, unknown>[] = await response.json()
  
  // 2. Flatten nested structures (explicit arrow so map doesn't pass the index as the prefix)
  const flatRows = data.map(row => flattenObject(row))
  
  // 3. Extract all column names across every row
  const columns = [...new Set(flatRows.flatMap(Object.keys))]
  
  // 4. Generate one statement per batch of 50 rows
  const statements: string[] = []
  for (let i = 0; i < flatRows.length; i += 50) {
    const batch = flatRows.slice(i, i + 50)
    // generateInsert: the combined generator (escaping + upsert + NULL handling)
    statements.push(generateInsert(batch, tableName, columns, {
      onDuplicateKey: true,
      nullHandling: 'null'
    }))
  }
  
  // 5. Execute SQL
  for (const sql of statements) {
    await db.query(sql)
  }
}
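The db handle above is left undefined; with the mysql2 driver (my assumption, the post doesn't say which client it uses), it could be set up like this:

import mysql from 'mysql2/promise'

const db = await mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: '',        // hypothetical credentials
  database: 'mydb'
})

mysql2's promise API resolves query() to [rows, fields], which the import loop above simply ignores.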

Performance Comparison#

Test data: 1000 user records, 10 fields each.

| Approach      | Execution Time | Network Round Trips |
|---------------|----------------|---------------------|
| Single insert | 12.5 s         | 1000                |
| Batch of 10   | 1.3 s          | 100                 |
| Batch of 50   | 0.28 s         | 20                  |
| Batch of 100  | 0.15 s         | 10                  |

Batching 100 rows per statement is over 80x faster than inserting them one at a time.

Online Tool#

Based on these principles, I built a JSON to SQL Converter.

Key features:

  • Auto-escape special characters, prevent SQL injection
  • Batch insert support with configurable batch size
  • Auto-flatten nested JSON
  • Three NULL handling strategies
  • INSERT IGNORE / ON DUPLICATE KEY UPDATE
  • INSERT / UPDATE modes

The implementation isn’t complex, but getting the details right requires considering many edge cases. Hope this helps.


Related: JSON Formatter | SQL Builder