JSON Diff Deep Comparison: From Recursive Traversal to Path Tracking#

Recently migrated an API and needed to compare JSON response structures between old and new versions. Manual line-by-line comparison was painful, so I built a tool. Here’s the implementation approach and pitfalls I encountered.

Why Simple === Isn’t Enough#

JavaScript’s === only checks primitive type equality:

{ a: 1 } === { a: 1 }  // false
[1, 2] === [1, 2]      // false

Objects and arrays are reference types. Even with identical content, their memory addresses differ. Deep comparison requires recursive traversal of every property.

Core Algorithm: Recursive Comparison#

Basic Approach#

  1. Different types → mark as changed
  2. Primitive types → compare values
  3. Objects/arrays → recursively compare each property/element
interface DiffItem {
  path: string           // Change path, e.g., "user.address.city"
  type: 'added' | 'removed' | 'changed'
  oldValue?: unknown
  newValue?: unknown
}

function compareObjects(obj1: unknown, obj2: unknown, path: string): DiffItem[] {
  const result: DiffItem[] = []

  // 1. Type mismatch or null handling
  if (typeof obj1 !== typeof obj2 || obj1 === null || obj2 === null) {
    if (obj1 === undefined) {
      result.push({ path, type: 'added', newValue: obj2 })
    } else if (obj2 === undefined) {
      result.push({ path, type: 'removed', oldValue: obj1 })
    } else {
      result.push({ path, type: 'changed', oldValue: obj1, newValue: obj2 })
    }
    return result
  }

  // 2. Primitive types: direct comparison
  if (typeof obj1 !== 'object') {
    if (obj1 !== obj2) {
      result.push({ path, type: 'changed', oldValue: obj1, newValue: obj2 })
    }
    return result
  }

  // 3. Array vs Object
  if (Array.isArray(obj1) !== Array.isArray(obj2)) {
    result.push({ path, type: 'changed', oldValue: obj1, newValue: obj2 })
    return result
  }

  // 4. Recursively compare all properties
  const keys1 = Object.keys(obj1 as object)
  const keys2 = Object.keys(obj2 as object)
  const allKeys = new Set([...keys1, ...keys2])

  allKeys.forEach(key => {
    const newPath = path ? `${path}.${key}` : key
    const hasKey1 = keys1.includes(key)
    const hasKey2 = keys2.includes(key)

    if (!hasKey1) {
      result.push({ path: newPath, type: 'added', newValue: (obj2 as any)[key] })
    } else if (!hasKey2) {
      result.push({ path: newPath, type: 'removed', oldValue: (obj1 as any)[key] })
    } else {
      result.push(...compareObjects(
        (obj1 as any)[key],
        (obj2 as any)[key],
        newPath
      ))
    }
  })

  return result
}

Path Tracking#

The key is passing the path parameter:

const newPath = path ? `${path}.${key}` : key
  • Root node: path = ''
  • First level: path = 'name'
  • Second level: path = 'user.name'
  • Array element: path = 'items.0'

This allows the final diff list to pinpoint each change location.

Special Handling for Arrays#

Array comparison has a question: does order change count as a diff?

// Scenario 1: Different order, same content
[1, 2, 3] vs [3, 2, 1]  // Is this a diff?

// Scenario 2: Element add/remove
[1, 2, 3] vs [1, 2, 4, 5]  // How to mark?

Approach 1: Index-based Comparison (Simple)#

// Treat arrays as objects, index is the key
[1, 2, 3]  { "0": 1, "1": 2, "2": 3 }
[3, 2, 1]  { "0": 3, "1": 2, "2": 1 }

// Result:
// [0]: changed (1 → 3)
// [2]: changed (3 → 1)

Pros: Simple implementation, good performance. Cons: Order-sensitive, may produce “false diffs”.

Approach 2: Content-based Matching (Smarter)#

function compareArrays(arr1: any[], arr2: any[], path: string): DiffItem[] {
  const result: DiffItem[] = []
  const matched = new Set<number>()

  // Find matches for each element in arr1
  arr1.forEach((item1, i) => {
    const matchIndex = arr2.findIndex((item2, j) => 
      !matched.has(j) && isEqual(item1, item2)
    )
    
    if (matchIndex === -1) {
      result.push({ path: `${path}[${i}]`, type: 'removed', oldValue: item1 })
    } else {
      matched.add(matchIndex)
    }
  })

  // Unmatched items in arr2 are additions
  arr2.forEach((item2, j) => {
    if (!matched.has(j)) {
      result.push({ path: `${path}[${j}]`, type: 'added', newValue: item2 })
    }
  })

  return result
}

Pros: Ignores order, focuses on content changes. Cons: O(n²) complexity, poor performance for large arrays.

Approach 3: LCS Longest Common Subsequence (Optimal)#

function lcsDiff(arr1: any[], arr2: any[], path: string): DiffItem[] {
  // 1. Compute LCS matrix
  const dp = Array(arr1.length + 1).fill(null)
    .map(() => Array(arr2.length + 1).fill(0))

  for (let i = 1; i <= arr1.length; i++) {
    for (let j = 1; j <= arr2.length; j++) {
      if (isEqual(arr1[i-1], arr2[j-1])) {
        dp[i][j] = dp[i-1][j-1] + 1
      } else {
        dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1])
      }
    }
  }

  // 2. Backtrack to find diffs
  const result: DiffItem[] = []
  let i = arr1.length, j = arr2.length

  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && isEqual(arr1[i-1], arr2[j-1])) {
      i--; j--
    } else if (j > 0 && (i === 0 || dp[i][j-1] >= dp[i-1][j])) {
      result.unshift({ 
        path: `${path}[${j-1}]`, 
        type: 'added', 
        newValue: arr2[j-1] 
      })
      j--
    } else {
      result.unshift({ 
        path: `${path}[${i-1}]`, 
        type: 'removed', 
        oldValue: arr1[i-1] 
      })
      i--
    }
  }

  return result
}

Pros: O(n*m) complexity, most accurate results. Cons: Complex implementation, requires extra space.

Performance Optimization#

1. Circular Reference Detection#

JSON objects can have circular references:

const obj = { a: 1 }
obj.self = obj  // Circular reference

Recursive comparison would loop infinitely, causing stack overflow:

function compareObjects(obj1: unknown, obj2: unknown, path: string, seen = new WeakSet()): DiffItem[] {
  // Detect circular reference
  if (obj1 && typeof obj1 === 'object') {
    if (seen.has(obj1)) return []
    seen.add(obj1)
  }
  
  // ... rest of logic
}

2. Large Object Optimization#

When JSON has deep nesting, recursion may overflow the stack. Use iteration instead:

function compareObjectsIterative(obj1: unknown, obj2: unknown): DiffItem[] {
  const result: DiffItem[] = []
  const stack = [{ obj1, obj2, path: '' }]

  while (stack.length > 0) {
    const { obj1, obj2, path } = stack.pop()!
    
    // ... comparison logic
    
    // Push children to stack
    if (typeof obj1 === 'object' && obj1 !== null) {
      Object.keys(obj1).forEach(key => {
        stack.push({
          obj1: (obj1 as any)[key],
          obj2: (obj2 as any)?.[key],
          path: path ? `${path}.${key}` : key
        })
      })
    }
  }

  return result
}

3. Cache Comparison Results#

If comparing the same objects multiple times, cache results:

const cache = new Map<string, DiffItem[]>()

function compareWithCache(obj1: unknown, obj2: unknown): DiffItem[] {
  const key = JSON.stringify([obj1, obj2])
  
  if (cache.has(key)) {
    return cache.get(key)!
  }
  
  const result = compareObjects(obj1, obj2, '')
  cache.set(key, result)
  return result
}

Note: JSON.stringify has performance overhead for large objects, so weigh the tradeoffs.

Real-world Pitfalls#

1. NaN and Infinity#

JSON.parse doesn’t support NaN and Infinity:

JSON.parse('{"value": NaN}')  // SyntaxError

But when comparing JavaScript objects directly:

const obj1 = { value: NaN }
const obj2 = { value: NaN }

obj1.value === obj2.value  // false! NaN !== NaN

Requires special handling:

if (Number.isNaN(obj1) && Number.isNaN(obj2)) {
  return []  // Treat as equal
}

2. Date Objects#

const date1 = new Date('2024-01-01')
const date2 = new Date('2024-01-01')

date1 === date2  // false
date1.getTime() === date2.getTime()  // true

Comparing Dates requires converting to timestamps:

if (obj1 instanceof Date && obj2 instanceof Date) {
  if (obj1.getTime() !== obj2.getTime()) {
    return [{ path, type: 'changed', oldValue: obj1, newValue: obj2 }]
  }
  return []
}

3. Special Object Types#

  • RegExp: compare source and flags
  • Map/Set: compare contents
  • Buffer: compare bytes
if (obj1 instanceof RegExp && obj2 instanceof RegExp) {
  if (obj1.source !== obj2.source || obj1.flags !== obj2.flags) {
    return [{ path, type: 'changed', oldValue: obj1, newValue: obj2 }]
  }
  return []
}

Final Implementation#

Based on these ideas, I built: JSON Diff

Features:

  • Deep JSON object comparison
  • Path tracking to locate diffs
  • Circular reference detection
  • Color-coded additions/deletions/changes

Core code is under 100 lines, but getting the details right requires considering many edge cases. Hope this helps.


Related: JSON Formatter | JSON Merge