JSON/XML/YAML Format Converter: From Parsers to Data Structure Mapping#

Recently I worked on a config file conversion requirement that needed bidirectional conversion between JSON, XML, and YAML formats. I thought it would be simple, but I hit quite a few pitfalls.

Essential Differences Between the Three Formats#

First, look at how the three formats differ at the data-model level:

JSON: Tree structure, supports six types: object, array, string, number, boolean, null.

XML: Nested tags, with a distinction between attributes and child elements, plus self-closing tags and namespaces. The core unit is the element; each element has a tag name, attributes, and children.

YAML: Indentation-based hierarchy with anchors and aliases (references) and multi-line strings; the focus is human readability.

The core conversion challenge: the three data models don’t map one-to-one.

JSON to XML: Type System Mapping#

JSON has no concept of attributes, so every key-value pair becomes a child element:

function jsonToXmlString(obj: unknown, rootName: string, indent: string): string {
  if (obj === null || obj === undefined) {
    return `${indent}<${rootName}/>\n`  // Empty node
  }

  if (typeof obj !== 'object') {
    // Primitive types: direct text content
    return `${indent}<${rootName}>${escapeXml(String(obj))}</${rootName}>\n`
  }

  if (Array.isArray(obj)) {
    // Array: each element generates a same-name tag
    let result = ''
    obj.forEach(item => {
      result += jsonToXmlString(item, rootName, indent)
    })
    return result
  }

  // Object: key names become tag names
  let result = `${indent}<${rootName}>\n`
  Object.entries(obj).forEach(([key, value]) => {
    if (Array.isArray(value)) {
      value.forEach(item => {
        result += jsonToXmlString(item, key, indent + '  ')
      })
    } else {
      result += jsonToXmlString(value, key, indent + '  ')
    }
  })
  result += `${indent}</${rootName}>\n`
  return result
}

Key Points:

  1. Array Handling: { "items": [1, 2, 3] } generates three sibling tags (<items>1</items>, <items>2</items>, <items>3</items>), not one parent tag
  2. Null Handling: null becomes a self-closing tag <tag/>
  3. XML Escaping: < > & " ' must be escaped, otherwise XML structure breaks

function escapeXml(str: string): string {
  return str
    .replace(/&/g, '&amp;')   // Must be processed first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}
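To make the key points concrete, here is a condensed, self-contained variant of the converter. The names toXml and escapeText and the sample object are made up for illustration; treat it as a sketch of the same mapping rules, not a replacement for the full function:

```typescript
// Condensed variant of the converter above; `toXml` and `escapeText` are illustrative names.
function escapeText(s: string): string {
  return s
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

function toXml(value: unknown, tag: string, indent = ''): string {
  if (value === null || value === undefined) return `${indent}<${tag}/>\n`
  if (Array.isArray(value)) {
    // One same-name tag per array element, with no wrapper element
    return value.map(item => toXml(item, tag, indent)).join('')
  }
  if (typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const children = Object.keys(obj)
      .map(key => toXml(obj[key], key, indent + '  '))
      .join('')
    return `${indent}<${tag}>\n${children}${indent}</${tag}>\n`
  }
  return `${indent}<${tag}>${escapeText(String(value))}</${tag}>\n`
}

const sampleXml = toXml({ items: [1, 2], note: null, expr: 'a < b & c' }, 'config')
console.log(sampleXml)
// <config>
//   <items>1</items>
//   <items>2</items>
//   <note/>
//   <expr>a &lt; b &amp; c</expr>
// </config>
```

One side effect of the array rule: an empty array emits nothing at all, so { "items": [] } and a missing items key look identical after conversion.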

XML to JSON: Information Loss and Reconstruction#

XML to JSON is more complex because XML is the more expressive format:

function parseXmlSimple(xml: string): unknown {
  xml = xml.replace(/<\?xml[^>]*\?>/g, '').trim()  // Remove XML declaration

  const parseNode = (str: string): unknown => {
    const tagMatch = str.match(/<(\w+)[^>]*>([\s\S]*)<\/\1>/)

    if (!tagMatch) {
      // Self-closing tag → null
      const selfClosingMatch = str.match(/<(\w+)[^>]*\/>/)
      if (selfClosingMatch) {
        return { [selfClosingMatch[1]]: null }
      }
      return str.trim()  // Plain text
    }

    const [, tagName, content] = tagMatch
    const innerContent = content.trim()

    // Check if has child tags
    if (!innerContent.match(/<(\w+)[^>]*>/)) {
      return { [tagName]: innerContent }  // Plain text content
    }

    // Recursively parse child tags
    const result: Record<string, unknown> = {}
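    // Limitation: \w+ tag names only, and nested same-name elements are not handled
    const childRegex = /<(\w+)[^>]*>([\s\S]*?)<\/\1>|<(\w+)[^>]*\/>/g
    let match: RegExpExecArray | null

    while ((match = childRegex.exec(innerContent)) !== null) {
      const childTag = match[1] || match[3]
      // parseNode returns { [childTag]: value }; unwrap it so the tag name is not nested twice
      const childValue = (parseNode(match[0]) as Record<string, unknown>)[childTag]

      if (childTag in result) {
        // Same-name tags convert to array ("in" also catches earlier null or empty values)
        if (!Array.isArray(result[childTag])) {
          result[childTag] = [result[childTag]]
        }
        (result[childTag] as unknown[]).push(childValue)
      } else {
        result[childTag] = childValue
      }
    }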

    return { [tagName]: result }
  }

  return parseNode(xml)
}

Information Loss Points:

  1. Attribute Loss: id="123" in <item id="123"> is dropped by the parser above; keeping it needs a convention like { "_id": "123" } or { "$": { "id": "123" } }
  2. Mixed Content: <p>Hello <b>world</b></p> can’t perfectly map to JSON
  3. Namespaces: Should <ns:item> become "ns:item" or just "item" in JSON?

For production, I recommend the xml-js library, which handles attributes completely and offers configuration options.
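For the attribute point, here is a rough sketch of the "$" convention. The helper names parseAttributes and attachAttributes are hypothetical, and storing element text under "_" is an extra assumption borrowed from common XML-to-JSON conventions; entity decoding inside attribute values is left out:

```typescript
// Illustrative helper: collects the attributes of one opening tag into a plain object.
function parseAttributes(openTag: string): Record<string, string> {
  const attrs: Record<string, string> = {}
  // name="value" or name='value'; names may contain ':', '-', and '.'
  const attrRegex = /([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g
  let m: RegExpExecArray | null
  while ((m = attrRegex.exec(openTag)) !== null) {
    attrs[m[1]] = m[2] !== undefined ? m[2] : m[3]
  }
  return attrs
}

// Merges attributes into an already-parsed element value under the "$" key.
function attachAttributes(openTag: string, value: unknown): unknown {
  const attrs = parseAttributes(openTag)
  if (Object.keys(attrs).length === 0) return value
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    // Text-only (or empty) element: keep the text under "_" (assumed convention)
    return { $: attrs, _: value }
  }
  return { $: attrs, ...(value as Record<string, unknown>) }
}

console.log(attachAttributes('<item id="123">', 'hello'))
// { '$': { id: '123' }, _: 'hello' }
```

Elements without attributes pass through unchanged, so existing output only changes shape where information would otherwise be lost.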

JSON and YAML Bidirectional Conversion: js-yaml Library Practice#

YAML is more expressive than JSON (anchors, references, multi-line strings), but with js-yaml the conversion itself is relatively simple:

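import yaml from 'js-yaml'

// Assumed result shape: exactly one of data or error is set
type ConvertResult = { data?: string; error?: string }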

// JSON → YAML
export function jsonToYaml(input: string): ConvertResult {
  try {
    const parsed = JSON.parse(input)
    const yamlStr = yaml.dump(parsed, {
      indent: 2,
      lineWidth: -1,    // No line wrapping
      noRefs: true,     // Write repeated objects inline instead of as &anchor/*alias pairs
    })
    return { data: yamlStr }
  } catch (e) {
    return { error: `Conversion failed: ${(e as Error).message}` }
  }
}

// YAML → JSON
export function yamlToJson(input: string): ConvertResult {
  try {
    const parsed = yaml.load(input)
    const formatted = JSON.stringify(parsed, null, 2)
    return { data: formatted }
  } catch (e) {
    return { error: `Conversion failed: ${(e as Error).message}` }
  }
}

Key Parameters:

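  • lineWidth: -1: js-yaml’s dump() folds lines at 80 characters by default; I recommend disabling this for config files
  • noRefs: true: by default js-yaml writes an object that appears more than once as an &anchor plus *alias references; noRefs: true writes it out in full each time. It is not a guard against circular structures, which need explicit detection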

Pitfalls I Actually Encountered#

1. Regex Backtracking#

The first version of the XML parser used a regex:

const regex = /<(\w+)>(.*?)<\/\1>/g

On a 1MB XML file, regex backtracking pinned the CPU at 100% and the page froze.

Solution: use a state machine or a dedicated library (such as the sax-js streaming parser).
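As a rough idea of what the state-machine route looks like, here is a minimal single-pass tokenizer built on indexOf, so it never backtracks. The XmlToken type and tokenizeXml name are made up for this sketch and are not the sax-js API; it also ignores CDATA and any '>' inside comments or attribute values:

```typescript
// Illustrative token type; not the API of sax-js or any other library.
type XmlToken =
  | { kind: 'open'; name: string }
  | { kind: 'close'; name: string }
  | { kind: 'selfClosing'; name: string }
  | { kind: 'text'; value: string }

function tokenizeXml(xml: string): XmlToken[] {
  const tokens: XmlToken[] = []
  let pos = 0
  while (pos < xml.length) {
    const lt = xml.indexOf('<', pos)
    // Everything up to the next '<' is text
    const text = xml.slice(pos, lt === -1 ? xml.length : lt).trim()
    if (text !== '') tokens.push({ kind: 'text', value: text })
    if (lt === -1) break

    const gt = xml.indexOf('>', lt)
    if (gt === -1) throw new Error(`Unclosed tag at offset ${lt}`)
    const body = xml.slice(lt + 1, gt)
    pos = gt + 1 // the scan only ever moves forward

    // Skip declarations, processing instructions, and simple comments
    if (body.startsWith('?') || body.startsWith('!')) continue

    const name = body.replace(/^\//, '').split(/[\s/]/)[0]
    if (body.startsWith('/')) tokens.push({ kind: 'close', name })
    else if (body.endsWith('/')) tokens.push({ kind: 'selfClosing', name })
    else tokens.push({ kind: 'open', name })
  }
  return tokens
}

const demoTokens = tokenizeXml('<?xml version="1.0"?><root><item id="1">a</item><empty/></root>')
console.log(demoTokens.map(t => t.kind).join(','))
// open,open,text,close,selfClosing,close
```

A tree builder can then consume the tokens with an explicit stack, pushing on open and popping on close, which also handles nested same-name elements that the lazy regex cannot.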

2. BOM Header Causing Parse Failure#

A user-uploaded file started with a UTF-8 BOM (\uFEFF), and JSON.parse() threw an error.

const cleaned = input.replace(/^\uFEFF/, '')  // Remove BOM

3. Circular Reference Handling#

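Circular references cause errors in the YAML → JSON direction. A YAML alias can point back at one of its own ancestors, so yaml.load() can return a cyclic object, and JSON.stringify() then throws:

const obj: Record<string, unknown> = { a: 1 }
obj.self = obj
JSON.stringify(obj)  // TypeError: Converting circular structure to JSON

Solution: detect circular references before serializing. noRefs: true does not help here; it only stops yaml.dump() from writing repeated objects as anchors and aliases.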
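Detecting circular references can be done with a depth-first walk that remembers the objects on the current path. The hasCycle name is just for this sketch, and it assumes plain objects and arrays (the kind of values a JSON or YAML parser returns):

```typescript
// Returns true if `value` contains a reference back to one of its own ancestors.
function hasCycle(value: unknown, ancestors: Set<object> = new Set()): boolean {
  if (value === null || typeof value !== 'object') return false
  if (ancestors.has(value)) return true

  ancestors.add(value)
  const obj = value as Record<string, unknown>
  const children: unknown[] = Array.isArray(value)
    ? value
    : Object.keys(obj).map(key => obj[key])
  for (const child of children) {
    if (hasCycle(child, ancestors)) return true
  }
  // Leaving this branch: the same object may legally appear again elsewhere (shared, not cyclic)
  ancestors.delete(value)
  return false
}

const cyclic: Record<string, unknown> = { a: 1 }
cyclic.self = cyclic
console.log(hasCycle(cyclic)) // true

const shared = { x: 1 }
console.log(hasCycle({ first: shared, second: shared })) // false
```

Running the check before handing a value to a serializer lets the converter return a readable error instead of surfacing a raw exception.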

4. Special Character Encoding#

Character and entity references like &#65; or &amp; in XML and Unicode escapes like \u0041 in YAML need to be decoded properly.
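On the XML side, a hand-rolled parser has to do this decoding itself. Below is a small sketch covering numeric references and the five predefined entities; decodeXmlEntities is a hypothetical helper, and entities declared in a DTD are out of scope. (On the YAML side, escapes such as \u0041 in double-quoted strings are normally resolved by the YAML parser.)

```typescript
const PREDEFINED_ENTITIES = new Map<string, string>([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
])

function fromCodePointSafe(code: number, fallback: string): string {
  // Code points above U+10FFFF are invalid; keep the original text instead of throwing
  return code <= 0x10ffff ? String.fromCodePoint(code) : fallback
}

function decodeXmlEntities(text: string): string {
  // A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded a second time
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (whole: string, body: string) => {
    if (body.startsWith('#x')) return fromCodePointSafe(parseInt(body.slice(2), 16), whole)
    if (body.startsWith('#')) return fromCodePointSafe(parseInt(body.slice(1), 10), whole)
    const named = PREDEFINED_ENTITIES.get(body)
    return named !== undefined ? named : whole // unknown entity: leave untouched
  })
}

console.log(decodeXmlEntities('&#65;&#x42; &lt;tag&gt; &amp;lt; &unknown;'))
// AB <tag> &lt; &unknown;
```

Doing everything in one replace call matters: decoding &amp; in a separate earlier step would turn &amp;lt; into < by accident.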

Performance Optimization Strategies#

1. Format Detection#

Auto-detect format based on input:

function detectFormat(input: string): 'json' | 'xml' | 'yaml' {
  const trimmed = input.trim()
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) return 'json'
  if (trimmed.startsWith('<')) return 'xml'
  return 'yaml'  // Default to YAML (indentation format)
}
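One gap in the prefix check: YAML flow style such as {a: 1} also starts with a brace. A stricter variant, sketched here under the hypothetical name detectFormatChecked, confirms the JSON guess with JSON.parse and falls back to YAML when that fails. It costs one extra parse, so it is a trade-off rather than a pure speedup:

```typescript
type Format = 'json' | 'xml' | 'yaml'

function detectFormatChecked(input: string): Format {
  // Strip a leading BOM first so it cannot hide the first real character
  const trimmed = input.replace(/^\uFEFF/, '').trim()
  if (trimmed.startsWith('<')) return 'xml'
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
    try {
      JSON.parse(trimmed)
      return 'json'
    } catch {
      return 'yaml' // e.g. YAML flow style such as {a: 1}
    }
  }
  return 'yaml'
}

console.log(detectFormatChecked('{"a": 1}'))   // json
console.log(detectFormatChecked('{a: 1}'))     // yaml
console.log(detectFormatChecked('key: value')) // yaml
```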

2. Web Worker Async Conversion#

Run large file conversions in a Web Worker to avoid blocking the UI:

// worker.ts
self.onmessage = (e) => {
  const { type, data } = e.data
  if (type === 'json-to-xml') {
    const result = jsonToXml(data)
    self.postMessage(result)
  }
}

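// main.tsx
// The URL form assumes a bundler (e.g. Vite or webpack 5) that compiles worker.ts;
// browsers cannot load a .ts file directly.
const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })
worker.onmessage = (e) => setOutput(e.data.data)  // Register the handler before sending work
worker.postMessage({ type: 'json-to-xml', data: largeJson })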

3. Cache Parse Results#

Avoid repeated parsing:

const cache = useMemo(() => {
  try {
    return JSON.parse(input)
  } catch {
    return null
  }
}, [input])

Final Result#

Based on the above approach, I implemented an online format converter: JSON/XML/YAML Converter

Main features:

  • JSON ↔ XML bidirectional conversion
  • JSON ↔ YAML bidirectional conversion
  • Auto format detection
  • One-click swap input/output
  • Supports files up to 10MB

Technical details aren’t complex, but handling edge cases completely requires careful thought. Hope this helps.


Related tools: JSON Formatter | XML Formatter | JSON Diff