From LCS to Myers: The Algorithms Behind Text Diff Checkers#

Published: 2026-05-02 01:53

If you use git daily, you’ve seen red and green diff lines more times than you can count. They look simple enough — deleted lines turn red, added lines turn green. But when I built an online diff checker from scratch, I quickly discovered how much algorithmic depth sits behind this deceptively simple UI.

The Obvious First Attempt: Two Pointers#

The naive approach is to split both texts into lines, walk two pointers forward, and flag differences as they appear:

while (oi < origLines.length || mi < modLines.length) {
  if (origLines[oi] === modLines[mi]) {
    // Lines match — mark as equal, advance both pointers
    result.push({ type: 'equal', line: origLines[oi] })
    oi++; mi++
  } else {
    // Mismatch — figure out what happened
  }
}

Straightforward, but there’s an immediate problem: when two lines don’t match, how do you know if a line was added or removed? There’s no way to tell with just local context.

Greedy Lookahead Strategy#

The common fix is to look ahead in both directions when a mismatch occurs:

// Search forward in modified text for the current original line
let foundInMod = -1
for (let k = mi + 1; k < Math.min(mi + 10, modLines.length); k++) {
  if (modLines[k] === origLines[oi]) { foundInMod = k; break }
}

// Search forward in original text for the current modified line  
let foundInOrig = -1
for (let k = oi + 1; k < Math.min(oi + 10, origLines.length); k++) {
  if (origLines[k] === modLines[mi]) { foundInOrig = k; break }
}

The lookahead depth limit of 10 is a deliberate design choice. Without it, the algorithm degenerates to O(N×M) complexity. With it, you accept that extremely large deletion blocks (>10 lines) might not align perfectly — but for most real-world diffs, this trade-off works well.

The LCS Approach: Optimal but Expensive#

The greedy strategy can produce suboptimal alignments. Consider:

Original: A B C D E
Modified: A C B D E

A greedy approach might mark B as deleted and C as added, when the real story is just a reordering.

This brings us to the Longest Common Subsequence (LCS) problem — a classic DP staple:

function lcsLength(a: string[], b: string[]) {
  const m = a.length, n = b.length
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0))
  
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1])
    }
  }
  return dp
}

LCS guarantees the optimal alignment. The cost? O(M×N) time and space. For a 1000-line document, that’s 1 million cells — manageable. For a 10,000-line file, it’s 100 million cells, and your browser will start sweating.

Myers Diff: What Git Uses#

This is where Myers’ algorithm shines. Published by Eugene Myers in 1986, it frames the diff problem as a shortest path search on an edit graph. Each cell in the edit graph represents a state (position in A, position in B), and edges represent matching lines, deletions, or insertions.

The key insight is that Myers uses a greedy search with backtracking that runs in O(M+N+D²), where D is the number of differences. For typical code changes where D is small (a few lines), this is essentially linear.

When would LCS outperform Myers? The edge cases involve many identical lines — think of a 1000-line file where every line is blank, and you change a few. Myers can produce surprising alignments in these scenarios because it prioritizes minimizing the edit count over producing visually intuitive output.

Practical Trade-offs for a Browser Tool#

After studying these algorithms, here’s what I settled on for a browser-based tool:

  • Small texts (<100 lines): Any algorithm works. LCS gives the nicest output.
  • Medium texts (100-1000 lines): Greedy lookahead with a depth limit is the sweet spot — good enough alignment without the O(N²) cost.
  • Large texts (>1000 lines): You need chunking or a Myers variant. A single-threaded LCS will freeze the UI.

The Diff Checker on JsonKit uses the greedy lookahead approach with a 10-line search limit. It also provides line numbers, change statistics (added/removed/unchanged counts), and unified diff export. Fast enough for daily use, and the output quality holds up for most real-world scenarios.

If you want to go deeper, read Myers’ original paper “An O(ND) Difference Algorithm and Its Variations” — it’s from 1986 but the ideas are still the foundation of git’s diff engine today.


Related tools: JSON Diff | Text Statistics | Code Formatter