JSON to SQL Batch Insert: From Escape Traps to Performance Optimization#
I recently worked on a data migration project that imported JSON from an API into MySQL. It seemed simple at first: just map JSON fields to SQL statements, right? I ended up hitting several pitfalls.
The Biggest Trap: SQL Injection#
My initial implementation looked like this:
function jsonToSql(json) {
  const data = JSON.parse(json)
  const values = Object.values(data).map(v => `'${v}'`).join(', ')
  return `INSERT INTO users VALUES (${values})`
}
It looked fine until I encountered this data:
{
  "name": "O'Brien",
  "bio": "I'm a developer",
  "comment": "'); DROP TABLE users; --"
}
The generated SQL exploded. Classic SQL injection.
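Concretely, here is what the naive converter emits for that payload (badJson stands in for the JSON above):
const badJson = `{"name":"O'Brien","bio":"I'm a developer","comment":"'); DROP TABLE users; --"}`
console.log(jsonToSql(badJson))
// INSERT INTO users VALUES ('O'Brien', 'I'm a developer', ''); DROP TABLE users; --')
// The quote in O'Brien terminates the string literal early, and the comment
// field smuggles in a second statement.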
Proper Escaping#
String values need single-quote escaping. MySQL’s rule: replace ' with '':
function escapeValue(val: unknown, quote: "'" | '"'): string {
  if (val === null || val === undefined) return 'NULL'
  if (typeof val === 'number') return String(val)
  if (typeof val === 'boolean') return val ? '1' : '0'
  // String escaping
  const str = String(val)
  const escaped = str.replace(/\\/g, '\\\\').replace(new RegExp(quote, 'g'), quote + quote)
  return `${quote}${escaped}${quote}`
}
Note: backslashes also need escaping. Without it, MySQL treats the \t in 'path\to\file' as an escape sequence and consumes the backslash, storing a tab character instead.
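Running the earlier trap values through the helper shows the fix:
escapeValue("O'Brien", "'")                   // 'O''Brien'
escapeValue("'); DROP TABLE users; --", "'")  // '''); DROP TABLE users; --'
escapeValue(null, "'")                        // NULL
escapeValue(42, "'")                          // 42
The doubled leading quote keeps the injection attempt inside the string literal.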
Performance Optimization with Batch Inserts#
Inserting 1000 rows one by one:
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
-- ... repeat 1000 times
This requires 1000 network round trips—terrible performance.
The Right Way: Batch Insert#
MySQL supports multi-row inserts in a single statement:
INSERT INTO users (name, email) VALUES
('Alice', 'alice@example.com'),
('Bob', 'bob@example.com'),
('Charlie', 'charlie@example.com');
Significant performance improvement:
function batchInsert(rows, tableName, batchSize = 100) {
  // Derive the column list from the first row and backtick-quote the names
  const columns = Object.keys(rows[0])
  const columnList = columns.map(c => `\`${c}\``).join(', ')
  const statements = []
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize)
    const valueRows = batch.map(row => {
      const vals = columns.map(c => escapeValue(row[c], "'"))
      return `(${vals.join(', ')})`
    })
    const sql = `INSERT INTO ${tableName} (${columnList}) VALUES\n${valueRows.join(',\n')}`
    statements.push(sql)
  }
  return statements.join(';\n\n')
}
But batch size can't grow without limit: the whole statement has to fit in MySQL's max_allowed_packet, which defaults to 4MB in MySQL 5.7 (64MB in 8.0). Exceed it and the server rejects the statement. In practice, a batchSize of 50-100 works well.
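If row sizes vary a lot, a fixed row count can still overshoot the packet limit. One guard, a sketch of my own assuming Node.js (for Buffer), is to split on estimated byte size instead:
// Group pre-rendered "(v1, v2, ...)" value rows into batches capped by byte size.
function batchBySize(valueRows: string[], maxBytes = 1024 * 1024): string[][] {
  const batches: string[][] = []
  let current: string[] = []
  let size = 0
  for (const row of valueRows) {
    const rowBytes = Buffer.byteLength(row, 'utf8') + 2 // +2 for the ",\n" separator
    if (current.length > 0 && size + rowBytes > maxBytes) {
      batches.push(current)
      current = []
      size = 0
    }
    current.push(row)
    size += rowBytes
  }
  if (current.length > 0) batches.push(current)
  return batches
}
The 1MB default leaves comfortable headroom even under the older 4MB packet limit.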
Flattening Nested JSON#
API responses often have nested structures:
{
  "user": {
    "profile": {
      "name": "Alice",
      "age": 25
    },
    "contact": {
      "email": "alice@example.com"
    }
  }
}
Database tables are flat, so we need to flatten nested fields:
function flattenObject(obj: Record<string, unknown>, prefix = ''): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(obj)) {
    const flatKey = prefix ? `${prefix}.${key}` : key
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Recursively flatten nested objects
      Object.assign(result, flattenObject(value as Record<string, unknown>, flatKey))
    } else {
      result[flatKey] = value
    }
  }
  return result
}
After flattening:
{
  "user.profile.name": "Alice",
  "user.profile.age": 25,
  "user.contact.email": "alice@example.com"
}
Field names contain dots, so wrap them in backticks:
INSERT INTO users (`user.profile.name`, `user.profile.age`, `user.contact.email`) VALUES ...
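Backticks that appear inside an identifier must themselves be doubled, so a small quoting helper (my own addition, not from the original converter) is safer than plain concatenation:
// Quote an identifier for MySQL, doubling any embedded backticks.
const quoteIdent = (name: string) => '`' + name.replace(/`/g, '``') + '`'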
Three Strategies for NULL Values#
JSON data often has null or missing fields. How should we handle them?
Strategy 1: Explicit NULL#
INSERT INTO users (name, email) VALUES ('Alice', NULL)
If the column allows NULL, insert NULL directly.
Strategy 2: Skip Field#
INSERT INTO users (name) VALUES ('Alice')
Omit the email field entirely and let the database apply its default value.
Strategy 3: DEFAULT Keyword#
INSERT INTO users (name, email) VALUES ('Alice', DEFAULT)
Explicitly ask for the column's default value.
The implementation can expose the strategy as an option:
type NullHandling = 'null' | 'skip' | 'default'
function handleNull(val: unknown, strategy: NullHandling): string {
  if (val !== null && val !== undefined) return escapeValue(val, "'")
  switch (strategy) {
    case 'null': return 'NULL'
    case 'skip': return '' // Filtered out at the outer layer
    case 'default': return 'DEFAULT'
  }
}
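The 'skip' case needs cooperation from that outer layer, because dropping a value also drops its column. A minimal single-row sketch (insertSkippingNulls is my own illustrative helper):
// Build a single-row INSERT, omitting null/undefined fields entirely
// so the database falls back to its column defaults.
function insertSkippingNulls(row: Record<string, unknown>, tableName: string): string {
  const entries = Object.entries(row).filter(([, v]) => v !== null && v !== undefined)
  const cols = entries.map(([k]) => `\`${k}\``).join(', ')
  const vals = entries.map(([, v]) => escapeValue(v, "'")).join(', ')
  return `INSERT INTO ${tableName} (${cols}) VALUES (${vals})`
}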
The Power of ON DUPLICATE KEY UPDATE#
Data imports often encounter duplicates. MySQL’s ON DUPLICATE KEY UPDATE handles this elegantly:
INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
ON DUPLICATE KEY UPDATE name = VALUES(name), email = VALUES(email)
If id=1 exists, update name and email; otherwise insert a new row.
Implementation:
function generateInsertWithUpsert(rows, tableName, columns) {
  // Build one multi-row INSERT (batchInsert from earlier, sized to a single statement)
  const insert = batchInsert(rows, tableName, rows.length)
  const updates = columns.map(c => `\`${c}\` = VALUES(\`${c}\`)`).join(', ')
  return `${insert}\nON DUPLICATE KEY UPDATE ${updates}`
}
Note: this syntax is MySQL-specific. PostgreSQL uses ON CONFLICT, SQL Server uses MERGE.
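For reference, the rough PostgreSQL equivalent of the statement above (assuming id is the conflict target):
INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email;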
Real Case: From JSON API to MySQL#
Complete data import workflow:
async function importFromApi(apiUrl: string, tableName: string) {
  // 1. Fetch JSON data
  const response = await fetch(apiUrl)
  const data = (await response.json()) as Record<string, unknown>[]
  // 2. Flatten nested structures (wrap the call so map's index isn't passed as the prefix)
  const flatRows = data.map(row => flattenObject(row))
  // 3. Extract all column names across every row
  const columns = [...new Set(flatRows.flatMap(Object.keys))]
  // 4. Generate SQL in batches of 50
  const statements = []
  for (let i = 0; i < flatRows.length; i += 50) {
    const batch = flatRows.slice(i, i + 50)
    const sql = generateInsert(batch, tableName, columns, {
      onDuplicateKey: true,
      nullHandling: 'null'
    })
    statements.push(sql)
  }
  // 5. Execute each statement (db is the application's database client)
  for (const sql of statements) {
    await db.query(sql)
  }
}
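Usage looks like this:
// Hypothetical endpoint and table name, for illustration only.
await importFromApi('https://api.example.com/users', 'users')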
Performance Comparison#
Test data: 1000 user records, 10 fields each.
| Approach | Execution Time | Network Round Trips |
|---|---|---|
| Single insert | 12.5s | 1000 |
| Batch 10 | 1.3s | 100 |
| Batch 50 | 0.28s | 20 |
| Batch 100 | 0.15s | 10 |
At a batch size of 100, the import is roughly 80x faster than single-row inserts (0.15s vs. 12.5s).
Online Tool#
Based on these principles, I built a JSON to SQL Converter.
Key features:
- Auto-escape special characters to prevent SQL injection
- Batch insert support with configurable batch size
- Auto-flatten nested JSON
- Three NULL handling strategies
- INSERT IGNORE / ON DUPLICATE KEY UPDATE
- INSERT / UPDATE modes
The implementation isn’t complex, but getting the details right requires considering many edge cases. Hope this helps.
Related: JSON Formatter | SQL Builder