JSON/XML/YAML Format Converter: From Parsers to Data Structure Mapping
Recently I worked on a config-file task that required bidirectional conversion between JSON, XML, and YAML. I expected it to be simple, but I hit quite a few pitfalls.
Essential Differences Between the Three Formats
First, look at how the formats differ at the data-structure level:
JSON: Tree structure with six types: object, array, string, number, boolean, null.
XML: Nested tags, with a distinction between attributes and child elements, plus self-closing tags and namespaces. The core unit is the element; each element has a tagName, attributes, and children.
YAML: Indentation-based hierarchy, with anchors and aliases (references) and multi-line strings; it focuses on human readability.
The core conversion challenge: the data models don’t correspond one-to-one.
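To make the mismatch concrete, the two models can be written down as types. This is only an illustrative sketch: the names JsonValue, XmlElement, mixed, and asJson are mine, and the "$" / "_" keys are one possible convention rather than a standard.

```typescript
// JSON's data model: six value kinds, nothing else.
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue }

// XML's element model as described above: tag name, attributes, children.
// Children may mix text and elements, which JSON has no direct slot for.
interface XmlElement {
  tagName: string
  attributes: Record<string, string>
  children: Array<XmlElement | string>
}

// <item id="123">Hello <b>world</b></item>
const mixed: XmlElement = {
  tagName: 'item',
  attributes: { id: '123' },
  children: ['Hello ', { tagName: 'b', attributes: {}, children: ['world'] }],
}

// One plausible JSON rendering. The position of "Hello " relative to <b>
// is no longer recoverable from this object.
const asJson: JsonValue = {
  item: { $: { id: '123' }, _: 'Hello ', b: 'world' },
}
```

The XML side carries three things per node, while the JSON side has to encode the same facts through naming conventions on keys.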
JSON to XML: Type System Mapping
JSON has no concept of attributes, so every key-value pair becomes a child node:
function jsonToXmlString(obj: unknown, rootName: string, indent: string): string {
  if (obj === null || obj === undefined) {
    return `${indent}<${rootName}/>\n` // Empty node
  }
  if (typeof obj !== 'object') {
    // Primitive types: direct text content
    return `${indent}<${rootName}>${escapeXml(String(obj))}</${rootName}>\n`
  }
  if (Array.isArray(obj)) {
    // Array: each element generates a same-name tag
    let result = ''
    obj.forEach(item => {
      result += jsonToXmlString(item, rootName, indent)
    })
    return result
  }
  // Object: key names become tag names
  let result = `${indent}<${rootName}>\n`
  Object.entries(obj).forEach(([key, value]) => {
    if (Array.isArray(value)) {
      value.forEach(item => {
        result += jsonToXmlString(item, key, indent + ' ')
      })
    } else {
      result += jsonToXmlString(value, key, indent + ' ')
    }
  })
  result += `${indent}</${rootName}>\n`
  return result
}
Key Points:
- Array Handling: { "items": [1, 2, 3] } generates three sibling <items> tags (<items>1</items>, <items>2</items>, <items>3</items>), not one parent tag
- Null Handling: null becomes a self-closing tag <tag/>
- XML Escaping: < > & " ' must be escaped, otherwise the XML structure breaks
function escapeXml(str: string): string {
  return str
    .replace(/&/g, '&amp;') // Must be processed first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}
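Why the ampersand has to be handled first is easy to check. The sketch below compares the correct order with a deliberately wrong one; escapeAmpFirst and escapeAmpLast are hypothetical names used only for this comparison, and both cover just two characters to keep it short.

```typescript
// Correct: escape "&" before the other characters.
function escapeAmpFirst(str: string): string {
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;')
}

// Wrong on purpose: "<" becomes "&lt;", then that new "&" is escaped again.
function escapeAmpLast(str: string): string {
  return str.replace(/</g, '&lt;').replace(/&/g, '&amp;')
}

const sample = 'a < b && c'
const good = escapeAmpFirst(sample) // 'a &lt; b &amp;&amp; c'
const bad = escapeAmpLast(sample) // 'a &amp;lt; b &amp;&amp; c'
```

With the wrong order, a reader of the XML would see the literal text "&lt;" instead of "<".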
XML to JSON: Information Loss and Reconstruction
XML to JSON is harder because XML is more expressive, so a simple converter has to drop information. A simplified regex-based version:
function parseXmlSimple(xml: string): unknown {
  xml = xml.replace(/<\?xml[^>]*\?>/g, '').trim() // Remove XML declaration
  // Parses one element and returns { tagName: value }
  const parseNode = (str: string): unknown => {
    const tagMatch = str.match(/<(\w+)[^>]*>([\s\S]*)<\/\1>/)
    if (!tagMatch) {
      // Self-closing tag → null
      const selfClosingMatch = str.match(/<(\w+)[^>]*\/>/)
      if (selfClosingMatch) {
        return { [selfClosingMatch[1]]: null }
      }
      return str.trim() // Plain text
    }
    const [, tagName, content] = tagMatch
    const innerContent = content.trim()
    // Check if has child tags
    if (!/<\w+[^>]*>/.test(innerContent)) {
      return { [tagName]: innerContent } // Plain text content
    }
    // Recursively parse child tags
    const result: Record<string, unknown> = {}
    const childRegex = /<(\w+)[^>]*>([\s\S]*?)<\/\1>|<(\w+)[^>]*\/>/g
    let match: RegExpExecArray | null
    while ((match = childRegex.exec(innerContent)) !== null) {
      const childTag = match[1] || match[3]
      // parseNode returns { childTag: value }; unwrap it to avoid double nesting
      const childValue = (parseNode(match[0]) as Record<string, unknown>)[childTag]
      if (childTag in result) {
        // Same-name tags convert to array ("in" also catches null and '' values)
        if (!Array.isArray(result[childTag])) {
          result[childTag] = [result[childTag]]
        }
        (result[childTag] as unknown[]).push(childValue)
      } else {
        result[childTag] = childValue
      }
    }
    return { [tagName]: result }
  }
  return parseNode(xml)
}
Information Loss Points:
- Attribute Loss: id="123" in <item id="123"> isn’t handled; it needs a convention like { "_id": "123" } or { "$": { "id": "123" } }
- Mixed Content: <p>Hello <b>world</b></p> can’t map perfectly to JSON
- Namespaces: Should <ns:item> become "ns:item" or just "item" in JSON?
For production, I recommend the xml-js library, which handles attributes fully and exposes configuration options.
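For a sense of what the "$" convention involves, here is a minimal sketch of attribute extraction. The helpers parseAttributes and withAttributes are hypothetical, the "_" key for the element's own value is my assumption, and the regex assumes quoted values with no ">" inside them.

```typescript
// Extract attributes from an opening tag such as <item id="123" type='a'>.
function parseAttributes(openTag: string): Record<string, string> {
  const attrs: Record<string, string> = {}
  const attrRegex = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g
  let m: RegExpExecArray | null
  while ((m = attrRegex.exec(openTag)) !== null) {
    // Group 2 holds a double-quoted value, group 3 a single-quoted one
    attrs[m[1]] = m[2] !== undefined ? m[2] : m[3]
  }
  return attrs
}

// Attach attributes under "$" only when the tag actually has some,
// so attribute-free elements keep their plain value.
function withAttributes(openTag: string, value: unknown): unknown {
  const attrs = parseAttributes(openTag)
  if (Object.keys(attrs).length === 0) return value
  return { $: attrs, _: value }
}
```

For example, withAttributes('<item id="123">', 'x') yields { $: { id: '123' }, _: 'x' }, while withAttributes('<item>', 'x') stays 'x'. The cost of this choice is that consumers must check for both shapes.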
JSON and YAML Bidirectional Conversion: js-yaml Library Practice
YAML is more expressive than JSON (anchors, aliases, multi-line strings), but js-yaml does the heavy lifting, so this conversion is relatively simple:
import yaml from 'js-yaml'
// Result shape assumed by these helpers (not shown in the original snippet)
interface ConvertResult {
  data?: string
  error?: string
}
// JSON → YAML
export function jsonToYaml(input: string): ConvertResult {
  try {
    const parsed = JSON.parse(input)
    const yamlStr = yaml.dump(parsed, {
      indent: 2,
      lineWidth: -1, // No line wrapping
      noRefs: true, // Never emit &anchor / *alias for repeated objects
    })
    return { data: yamlStr }
  } catch (e) {
    return { error: `Conversion failed: ${(e as Error).message}` }
  }
}
// YAML → JSON
export function yamlToJson(input: string): ConvertResult {
  try {
    const parsed = yaml.load(input)
    const formatted = JSON.stringify(parsed, null, 2)
    return { data: formatted }
  } catch (e) {
    return { error: `Conversion failed: ${(e as Error).message}` }
  }
}
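Key Parameters:
- lineWidth: -1: js-yaml wraps lines at 80 characters by default; disabling this is recommended for config files
- noRefs: true: stops js-yaml from emitting YAML’s &anchor and *alias syntax for objects that appear more than once, so the output stays plain. It does not make circular structures safe; those need explicit detection or handling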
Pitfalls I Actually Encountered
1. Regex Backtracking
The first version of the XML parser used a regex:
const regex = /<(\w+)>(.*?)<\/\1>/g
On a 1MB XML file, regex backtracking pinned the CPU at 100% and the page froze.
Solution: Use a state machine or a dedicated library (such as the sax-js streaming parser).
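As a rough idea of the state-machine direction, the sketch below scans the input once from left to right with indexOf instead of a backreferencing regex. It is not what sax-js does internally; XmlToken and tokenizeXml are hypothetical names, and the code assumes no comments, CDATA sections, or ">" inside attribute values.

```typescript
type XmlToken =
  | { kind: 'open'; tag: string }
  | { kind: 'close'; tag: string }
  | { kind: 'selfClosing'; tag: string }
  | { kind: 'text'; value: string }

// Single forward pass: the cursor only ever moves right, so there is
// nothing to backtrack over regardless of input size.
function tokenizeXml(xml: string): XmlToken[] {
  const tokens: XmlToken[] = []
  let pos = 0
  while (pos < xml.length) {
    const lt = xml.indexOf('<', pos)
    // Everything before the next "<" is text
    const textEnd = lt === -1 ? xml.length : lt
    const text = xml.slice(pos, textEnd).trim()
    if (text) tokens.push({ kind: 'text', value: text })
    if (lt === -1) break
    const gt = xml.indexOf('>', lt)
    if (gt === -1) throw new Error(`Unclosed tag at offset ${lt}`)
    const body = xml.slice(lt + 1, gt)
    pos = gt + 1
    // Skip the XML declaration and doctype-style markup
    if (body.startsWith('?') || body.startsWith('!')) continue
    // The tag name ends at the first whitespace or "/"
    const tag = body.replace(/^\//, '').split(/[\s/]/)[0]
    if (body.startsWith('/')) tokens.push({ kind: 'close', tag })
    else if (body.endsWith('/')) tokens.push({ kind: 'selfClosing', tag })
    else tokens.push({ kind: 'open', tag })
  }
  return tokens
}
```

A tree builder can then consume these tokens with an explicit stack of open elements, which also makes mismatched closing tags easy to report.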
2. BOM Header Causing Parse Failure
A user-uploaded file started with a UTF-8 BOM (\uFEFF), and JSON.parse() threw an error.
const cleaned = input.replace(/^\uFEFF/, '') // Remove BOM
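A small reproduction makes the failure and the fix visible side by side. The helper name stripBom and the sample string are mine; the idea is the same as the one-liner above, just applied before any parser sees the text.

```typescript
// Strip a leading BOM before handing text to any parser.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text
}

const withBom = '\uFEFF{"a": 1}'

let failedWithBom = false
try {
  JSON.parse(withBom)
} catch {
  failedWithBom = true // JSON.parse rejects the BOM as an unexpected token
}

// After stripping, the same text parses normally
const parsed = JSON.parse(stripBom(withBom)) as { a: number }
```

Doing this once at the upload or paste boundary means none of the individual converters has to think about it.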
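3. Circular Reference Handling
Circular references in an in-memory object break serialization. The error below is the one JSON.stringify raises:
const obj: Record<string, unknown> = { a: 1 }
obj.self = obj
JSON.stringify(obj) // TypeError: Converting circular structure to JSON
Solution: detect circular references before converting. As far as I can tell, noRefs: true does not help here: it turns off the &anchor/*alias output that yaml.dump would otherwise use to represent a repeated object, so it is only safe on acyclic data such as the result of JSON.parse.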
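Detection itself is short. The sketch below walks the value depth-first and tracks the current chain of ancestors; hasCircularReference is a hypothetical helper, and because it recurses it assumes the data is not nested thousands of levels deep.

```typescript
// Returns true if the value contains a reference back to one of its own
// ancestors. Shared (non-circular) references are not reported.
function hasCircularReference(value: unknown, ancestors: Set<object> = new Set()): boolean {
  if (typeof value !== 'object' || value === null) return false
  if (ancestors.has(value)) return true
  ancestors.add(value)
  for (const child of Object.values(value)) {
    if (hasCircularReference(child, ancestors)) return true
  }
  // Leaving this branch: a sibling may legitimately reuse the same object
  ancestors.delete(value)
  return false
}

const tree = { a: 1, b: { c: [1, 2] } }
const shared = { x: 1 }
const dag = { left: shared, right: shared } // repeated, but not circular
const loop: Record<string, unknown> = { a: 1 }
loop.self = loop

const results = [
  hasCircularReference(tree), // false
  hasCircularReference(dag), // false
  hasCircularReference(loop), // true
]
```

Removing the object from the set on the way back up is what separates a true cycle from an object that merely appears twice.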
4. Special Character Encoding
Character references like &#x41; in XML and Unicode escapes like \u0041 in YAML both need to be decoded properly.
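On the XML side, a hand-written parser has to do this decoding itself. Below is a minimal sketch covering the five predefined XML entities plus decimal and hexadecimal character references; decodeXmlEntities and NAMED_ENTITIES are hypothetical names, and entities declared in a DTD are out of scope.

```typescript
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', "'"],
])

// Decodes in a single pass, so "&amp;lt;" becomes "&lt;" and is not
// decoded a second time. Unknown named entities are left untouched.
function decodeXmlEntities(text: string): string {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (whole: string, body: string) => {
    if (body.startsWith('#x')) return String.fromCodePoint(parseInt(body.slice(2), 16))
    if (body.startsWith('#')) return String.fromCodePoint(parseInt(body.slice(1), 10))
    const named = NAMED_ENTITIES.get(body)
    return named !== undefined ? named : whole
  })
}

const decoded = decodeXmlEntities('&#x41;&#65; &lt;a&gt; &amp;lt; &nbsp;')
// 'AA <a> &lt; &nbsp;'
```

Note that String.fromCodePoint throws a RangeError for values above 0x10FFFF, so untrusted input may warrant a try/catch around the call.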
Performance Optimization Strategies
1. Format Detection
Auto-detect the format from the input:
function detectFormat(input: string): 'json' | 'xml' | 'yaml' {
  const trimmed = input.trim()
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) return 'json'
  if (trimmed.startsWith('<')) return 'xml'
  return 'yaml' // Default to YAML (indentation format)
}
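The first-character check is a heuristic: YAML flow style such as {a: 1} also starts with a brace. One possible refinement, sketched here under the assumption that an extra JSON.parse is affordable, confirms the JSON guess by parsing. The names Format and detectFormatChecked are hypothetical.

```typescript
type Format = 'json' | 'xml' | 'yaml'

// Confirm the first-character guess by actually parsing.
// A failed JSON.parse falls through to YAML instead of being
// reported as JSON.
function detectFormatChecked(text: string): Format {
  const trimmed = text.trim()
  if (trimmed.startsWith('<')) return 'xml'
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
    try {
      JSON.parse(trimmed)
      return 'json'
    } catch {
      return 'yaml'
    }
  }
  return 'yaml'
}
```

For large inputs the parse is not free, so it may be worth keeping the parsed value around and reusing it for the conversion itself.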
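2. Web Worker Async Conversion
Move large conversions into a Web Worker so they don't block the UI:
// worker.ts
// jsonToXml is assumed to be a wrapper (not shown) that returns a
// ConvertResult, like jsonToYaml above; import it here
self.onmessage = (e: MessageEvent) => {
  const { type, data } = e.data
  if (type === 'json-to-xml') {
    const result = jsonToXml(data)
    self.postMessage(result)
  }
}
// main.tsx
// Browsers can't load .ts directly; bundlers such as Vite or webpack 5
// resolve the worker from a URL written like this
const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })
worker.onmessage = (e) => setOutput(e.data.data) // Register the handler before posting
worker.postMessage({ type: 'json-to-xml', data: largeJson })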
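3. Cache Parse Results
Avoid re-parsing the same input on every render:
import { useMemo } from 'react'
// Inside the component: re-parse only when input changes
const cache = useMemo(() => {
  try {
    return JSON.parse(input)
  } catch {
    return null // Invalid JSON
  }
}, [input])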
Final Result
Based on the approach above, I built an online format converter: JSON/XML/YAML Converter
Main features:
- JSON ↔ XML bidirectional conversion
- JSON ↔ YAML bidirectional conversion
- Auto format detection
- One-click swap input/output
- Supports files up to 10MB
Technical details aren’t complex, but handling edge cases completely requires careful thought. Hope this helps.