JSON Diff Deep Comparison: From Recursive Traversal to Path Tracking
JSON Diff Deep Comparison: From Recursive Traversal to Path Tracking#
Recently migrated an API and needed to compare JSON response structures between old and new versions. Manual line-by-line comparison was painful, so I built a tool. Here’s the implementation approach and pitfalls I encountered.
Why Simple === Isn’t Enough#
JavaScript’s === only checks primitive type equality:
{ a: 1 } === { a: 1 } // false
[1, 2] === [1, 2] // false
Objects and arrays are reference types. Even with identical content, their memory addresses differ. Deep comparison requires recursive traversal of every property.
Core Algorithm: Recursive Comparison#
Basic Approach#
- Different types → mark as changed
- Primitive types → compare values
- Objects/arrays → recursively compare each property/element
interface DiffItem {
path: string // Change path, e.g., "user.address.city"
type: 'added' | 'removed' | 'changed'
oldValue?: unknown
newValue?: unknown
}
function compareObjects(obj1: unknown, obj2: unknown, path: string): DiffItem[] {
const result: DiffItem[] = []
// 1. Type mismatch or null handling
if (typeof obj1 !== typeof obj2 || obj1 === null || obj2 === null) {
if (obj1 === undefined) {
result.push({ path, type: 'added', newValue: obj2 })
} else if (obj2 === undefined) {
result.push({ path, type: 'removed', oldValue: obj1 })
} else {
result.push({ path, type: 'changed', oldValue: obj1, newValue: obj2 })
}
return result
}
// 2. Primitive types: direct comparison
if (typeof obj1 !== 'object') {
if (obj1 !== obj2) {
result.push({ path, type: 'changed', oldValue: obj1, newValue: obj2 })
}
return result
}
// 3. Array vs Object
if (Array.isArray(obj1) !== Array.isArray(obj2)) {
result.push({ path, type: 'changed', oldValue: obj1, newValue: obj2 })
return result
}
// 4. Recursively compare all properties
const keys1 = Object.keys(obj1 as object)
const keys2 = Object.keys(obj2 as object)
const allKeys = new Set([...keys1, ...keys2])
allKeys.forEach(key => {
const newPath = path ? `${path}.${key}` : key
const hasKey1 = keys1.includes(key)
const hasKey2 = keys2.includes(key)
if (!hasKey1) {
result.push({ path: newPath, type: 'added', newValue: (obj2 as any)[key] })
} else if (!hasKey2) {
result.push({ path: newPath, type: 'removed', oldValue: (obj1 as any)[key] })
} else {
result.push(...compareObjects(
(obj1 as any)[key],
(obj2 as any)[key],
newPath
))
}
})
return result
}
Path Tracking#
The key is passing the path parameter:
const newPath = path ? `${path}.${key}` : key
- Root node:
path = '' - First level:
path = 'name' - Second level:
path = 'user.name' - Array element:
path = 'items.0'
This allows the final diff list to pinpoint each change location.
Special Handling for Arrays#
Array comparison has a question: does order change count as a diff?
// Scenario 1: Different order, same content
[1, 2, 3] vs [3, 2, 1] // Is this a diff?
// Scenario 2: Element add/remove
[1, 2, 3] vs [1, 2, 4, 5] // How to mark?
Approach 1: Index-based Comparison (Simple)#
// Treat arrays as objects, index is the key
[1, 2, 3] → { "0": 1, "1": 2, "2": 3 }
[3, 2, 1] → { "0": 3, "1": 2, "2": 1 }
// Result:
// [0]: changed (1 → 3)
// [2]: changed (3 → 1)
Pros: Simple implementation, good performance. Cons: Order-sensitive, may produce “false diffs”.
Approach 2: Content-based Matching (Smarter)#
function compareArrays(arr1: any[], arr2: any[], path: string): DiffItem[] {
const result: DiffItem[] = []
const matched = new Set<number>()
// Find matches for each element in arr1
arr1.forEach((item1, i) => {
const matchIndex = arr2.findIndex((item2, j) =>
!matched.has(j) && isEqual(item1, item2)
)
if (matchIndex === -1) {
result.push({ path: `${path}[${i}]`, type: 'removed', oldValue: item1 })
} else {
matched.add(matchIndex)
}
})
// Unmatched items in arr2 are additions
arr2.forEach((item2, j) => {
if (!matched.has(j)) {
result.push({ path: `${path}[${j}]`, type: 'added', newValue: item2 })
}
})
return result
}
Pros: Ignores order, focuses on content changes.
Cons: O(n²) complexity, poor performance for large arrays.
Approach 3: LCS Longest Common Subsequence (Optimal)#
function lcsDiff(arr1: any[], arr2: any[], path: string): DiffItem[] {
// 1. Compute LCS matrix
const dp = Array(arr1.length + 1).fill(null)
.map(() => Array(arr2.length + 1).fill(0))
for (let i = 1; i <= arr1.length; i++) {
for (let j = 1; j <= arr2.length; j++) {
if (isEqual(arr1[i-1], arr2[j-1])) {
dp[i][j] = dp[i-1][j-1] + 1
} else {
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1])
}
}
}
// 2. Backtrack to find diffs
const result: DiffItem[] = []
let i = arr1.length, j = arr2.length
while (i > 0 || j > 0) {
if (i > 0 && j > 0 && isEqual(arr1[i-1], arr2[j-1])) {
i--; j--
} else if (j > 0 && (i === 0 || dp[i][j-1] >= dp[i-1][j])) {
result.unshift({
path: `${path}[${j-1}]`,
type: 'added',
newValue: arr2[j-1]
})
j--
} else {
result.unshift({
path: `${path}[${i-1}]`,
type: 'removed',
oldValue: arr1[i-1]
})
i--
}
}
return result
}
Pros: O(n*m) complexity, most accurate results.
Cons: Complex implementation, requires extra space.
Performance Optimization#
1. Circular Reference Detection#
JSON objects can have circular references:
const obj = { a: 1 }
obj.self = obj // Circular reference
Recursive comparison would loop infinitely, causing stack overflow:
function compareObjects(obj1: unknown, obj2: unknown, path: string, seen = new WeakSet()): DiffItem[] {
// Detect circular reference
if (obj1 && typeof obj1 === 'object') {
if (seen.has(obj1)) return []
seen.add(obj1)
}
// ... rest of logic
}
2. Large Object Optimization#
When JSON has deep nesting, recursion may overflow the stack. Use iteration instead:
function compareObjectsIterative(obj1: unknown, obj2: unknown): DiffItem[] {
const result: DiffItem[] = []
const stack = [{ obj1, obj2, path: '' }]
while (stack.length > 0) {
const { obj1, obj2, path } = stack.pop()!
// ... comparison logic
// Push children to stack
if (typeof obj1 === 'object' && obj1 !== null) {
Object.keys(obj1).forEach(key => {
stack.push({
obj1: (obj1 as any)[key],
obj2: (obj2 as any)?.[key],
path: path ? `${path}.${key}` : key
})
})
}
}
return result
}
3. Cache Comparison Results#
If comparing the same objects multiple times, cache results:
const cache = new Map<string, DiffItem[]>()
function compareWithCache(obj1: unknown, obj2: unknown): DiffItem[] {
const key = JSON.stringify([obj1, obj2])
if (cache.has(key)) {
return cache.get(key)!
}
const result = compareObjects(obj1, obj2, '')
cache.set(key, result)
return result
}
Note: JSON.stringify has performance overhead for large objects, so weigh the tradeoffs.
Real-world Pitfalls#
1. NaN and Infinity#
JSON.parse doesn’t support NaN and Infinity:
JSON.parse('{"value": NaN}') // SyntaxError
But when comparing JavaScript objects directly:
const obj1 = { value: NaN }
const obj2 = { value: NaN }
obj1.value === obj2.value // false! NaN !== NaN
Requires special handling:
if (Number.isNaN(obj1) && Number.isNaN(obj2)) {
return [] // Treat as equal
}
2. Date Objects#
const date1 = new Date('2024-01-01')
const date2 = new Date('2024-01-01')
date1 === date2 // false
date1.getTime() === date2.getTime() // true
Comparing Dates requires converting to timestamps:
if (obj1 instanceof Date && obj2 instanceof Date) {
if (obj1.getTime() !== obj2.getTime()) {
return [{ path, type: 'changed', oldValue: obj1, newValue: obj2 }]
}
return []
}
3. Special Object Types#
RegExp: comparesourceandflagsMap/Set: compare contentsBuffer: compare bytes
if (obj1 instanceof RegExp && obj2 instanceof RegExp) {
if (obj1.source !== obj2.source || obj1.flags !== obj2.flags) {
return [{ path, type: 'changed', oldValue: obj1, newValue: obj2 }]
}
return []
}
Final Implementation#
Based on these ideas, I built: JSON Diff
Features:
- Deep JSON object comparison
- Path tracking to locate diffs
- Circular reference detection
- Color-coded additions/deletions/changes
Core code is under 100 lines, but getting the details right requires considering many edge cases. Hope this helps.
Related: JSON Formatter | JSON Merge