From LCS to Myers: The Algorithms Behind Text Diff Checkers
From LCS to Myers: The Algorithms Behind Text Diff Checkers#
Published: 2026-05-02 01:53
If you use git daily, you’ve seen red and green diff lines more times than you can count. They look simple enough — deleted lines turn red, added lines turn green. But when I built an online diff checker from scratch, I quickly discovered how much algorithmic depth sits behind this deceptively simple UI.
The Obvious First Attempt: Two Pointers#
The naive approach is to split both texts into lines, walk two pointers forward, and flag differences as they appear:
while (oi < origLines.length || mi < modLines.length) {
if (origLines[oi] === modLines[mi]) {
// Lines match — mark as equal, advance both pointers
result.push({ type: 'equal', line: origLines[oi] })
oi++; mi++
} else {
// Mismatch — figure out what happened
}
}
Straightforward, but there’s an immediate problem: when two lines don’t match, how do you know if a line was added or removed? There’s no way to tell with just local context.
Greedy Lookahead Strategy#
The common fix is to look ahead in both directions when a mismatch occurs:
// Search forward in modified text for the current original line
let foundInMod = -1
for (let k = mi + 1; k < Math.min(mi + 10, modLines.length); k++) {
if (modLines[k] === origLines[oi]) { foundInMod = k; break }
}
// Search forward in original text for the current modified line
let foundInOrig = -1
for (let k = oi + 1; k < Math.min(oi + 10, origLines.length); k++) {
if (origLines[k] === modLines[mi]) { foundInOrig = k; break }
}
The lookahead depth limit of 10 is a deliberate design choice. Without it, the algorithm degenerates to O(N×M) complexity. With it, you accept that extremely large deletion blocks (>10 lines) might not align perfectly — but for most real-world diffs, this trade-off works well.
The LCS Approach: Optimal but Expensive#
The greedy strategy can produce suboptimal alignments. Consider:
Original: A B C D E
Modified: A C B D E
A greedy approach might mark B as deleted and C as added, when the real story is just a reordering.
This brings us to the Longest Common Subsequence (LCS) problem — a classic DP staple:
function lcsLength(a: string[], b: string[]) {
const m = a.length, n = b.length
const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0))
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
dp[i][j] = a[i - 1] === b[j - 1]
? dp[i - 1][j - 1] + 1
: Math.max(dp[i - 1][j], dp[i][j - 1])
}
}
return dp
}
LCS guarantees the optimal alignment. The cost? O(M×N) time and space. For a 1000-line document, that’s 1 million cells — manageable. For a 10,000-line file, it’s 100 million cells, and your browser will start sweating.
Myers Diff: What Git Uses#
This is where Myers’ algorithm shines. Published by Eugene Myers in 1986, it frames the diff problem as a shortest path search on an edit graph. Each cell in the edit graph represents a state (position in A, position in B), and edges represent matching lines, deletions, or insertions.
The key insight is that Myers uses a greedy search with backtracking that runs in O(M+N+D²), where D is the number of differences. For typical code changes where D is small (a few lines), this is essentially linear.
When would LCS outperform Myers? The edge cases involve many identical lines — think of a 1000-line file where every line is blank, and you change a few. Myers can produce surprising alignments in these scenarios because it prioritizes minimizing the edit count over producing visually intuitive output.
Practical Trade-offs for a Browser Tool#
After studying these algorithms, here’s what I settled on for a browser-based tool:
- Small texts (<100 lines): Any algorithm works. LCS gives the nicest output.
- Medium texts (100-1000 lines): Greedy lookahead with a depth limit is the sweet spot — good enough alignment without the O(N²) cost.
- Large texts (>1000 lines): You need chunking or a Myers variant. A single-threaded LCS will freeze the UI.
The Diff Checker on JsonKit uses the greedy lookahead approach with a 10-line search limit. It also provides line numbers, change statistics (added/removed/unchanged counts), and unified diff export. Fast enough for daily use, and the output quality holds up for most real-world scenarios.
If you want to go deeper, read Myers’ original paper “An O(ND) Difference Algorithm and Its Variations” — it’s from 1986 but the ideas are still the foundation of git’s diff engine today.
Related tools: JSON Diff | Text Statistics | Code Formatter