Duplicate Line Finder: From Brute Force to Map Optimization#

2026-05-04 03:28

I was dealing with a log file containing tons of duplicate error messages. Manual checking was too slow, so I built a tool. Here are the different approaches I tried.

The Core Problem#

Finding duplicate lines in text means counting how many times each string appears. Simple concept, but performance varies dramatically between implementations.

Three Approaches#

1. Brute Force: Nested Loops#

function findDuplicatesBruteForce(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>()

  for (let i = 0; i < lines.length; i++) {
    // Skip lines already counted from an earlier occurrence,
    // otherwise a later pass would overwrite the count with a smaller value
    if (counts.has(lines[i])) continue

    let count = 1
    for (let j = i + 1; j < lines.length; j++) {
      if (lines[i] === lines[j]) {
        count++
      }
    }
    if (count > 1) {
      counts.set(lines[i], count)
    }
  }

  return counts
}

Time complexity is O(n²). For 10,000 lines that’s roughly 50 million comparisons, enough to visibly freeze the browser on larger inputs.

2. Sorting: Adjacent Comparison#

function findDuplicatesSorted(lines: string[]): Map<string, number> {
  const sorted = [...lines].sort()
  const counts = new Map<string, number>()

  let currentLine = sorted[0]
  let count = 1

  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === currentLine) {
      count++
    } else {
      if (count > 1) {
        counts.set(currentLine, count)
      }
      currentLine = sorted[i]
      count = 1
    }
  }

  if (count > 1) {
    counts.set(currentLine, count)
  }

  return counts
}

Time complexity is O(n log n), dominated by the sort. Much faster than brute force, but it works on a sorted copy of the data, so the results come out in sorted order rather than the order lines first appear.

3. Map: Single Pass#

function findDuplicatesMap(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>()

  for (const line of lines) {
    counts.set(line, (counts.get(line) || 0) + 1)
  }

  // Filter duplicates
  const duplicates = new Map<string, number>()
  for (const [line, count] of counts) {
    if (count > 1) {
      duplicates.set(line, count)
    }
  }

  return duplicates
}

Time complexity is O(n), a single pass; Map’s get and set are O(1) on average. Since every line has to be read at least once, this is as good as it gets.

Performance Comparison#

Testing with 10,000 lines (containing 1,000 duplicates):

Method       | Time Complexity | Execution Time
Brute Force  | O(n²)           | 1.2 seconds
Sorting      | O(n log n)      | 15 milliseconds
Map          | O(n)            | 3 milliseconds

Map is 400x faster than brute force, 5x faster than sorting.
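
For reference, here is a minimal sketch of how a comparison like this can be timed. The generateLines helper is hypothetical, standing in for whatever builds 10,000 test lines with 1,000 duplicates:

function benchmark(
  name: string,
  fn: (lines: string[]) => Map<string, number>,
  lines: string[]
): void {
  const start = performance.now()
  fn(lines)
  console.log(`${name}: ${(performance.now() - start).toFixed(1)} ms`)
}

// generateLines is a hypothetical helper that produces the test dataset
const lines = generateLines(10_000, 1_000)

benchmark('Brute force', findDuplicatesBruteForce, lines)
benchmark('Sorting', findDuplicatesSorted, lines)
benchmark('Map', findDuplicatesMap, lines)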

Case Sensitivity Handling#

Users may want to distinguish “Apple” from “apple”, or not:

function findDuplicates(lines: string[], caseSensitive: boolean): Map<string, number> {
  const counts = new Map<string, number>()     // processed key -> count
  const firstSeen = new Map<string, string>()  // processed key -> first original form

  for (const line of lines) {
    // Count by the processed key, remember the original form for display
    const key = caseSensitive ? line : line.toLowerCase()
    if (!firstSeen.has(key)) {
      firstSeen.set(key, line)
    }
    counts.set(key, (counts.get(key) || 0) + 1)
  }

  // Re-key the result by the first original form so the output shows what the user typed
  const result = new Map<string, number>()
  for (const [key, count] of counts) {
    result.set(firstSeen.get(key)!, count)
  }

  return result
}

The trick: compare using the processed key, but keep the original form for output.
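
For example, with case sensitivity turned off:

const counts = findDuplicates(['Apple', 'apple', 'Banana'], false)
// 'Apple' and 'apple' collapse into one entry, keyed by the first form seen:
// Map { 'Apple' => 2, 'Banana' => 1 }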

Deduplication: Preserving Order#

After finding duplicates, the next step is usually removing them, while preserving the original line order:

function removeDuplicates(lines: string[], caseSensitive: boolean): string[] {
  const seen = new Set<string>()
  const result: string[] = []

  for (const line of lines) {
    const key = caseSensitive ? line : line.toLowerCase()

    if (!seen.has(key)) {
      seen.add(key)
      result.push(line)  // Keep original form
    }
  }

  return result
}

Set’s has() is O(1) lookup, much faster than Array’s includes() (O(n)).
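
A quick example of the order-preserving behavior:

removeDuplicates(['Apple', 'banana', 'apple', 'Banana'], false)
// => ['Apple', 'banana']  (first occurrences survive, in their original order)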

Edge Cases#

A few details matter in practice:

1. Empty Line Filtering#

// Inside the counting loop: only count non-empty lines
if (line.trim() !== '') {
  counts.set(key, (counts.get(key) || 0) + 1)
}

Users may not care about duplicate empty lines.

2. Trimming Whitespace#

Users might consider “ apple” and “apple” as the same:

const key = trimWhitespace ? line.trim() : line
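
In the actual tool these toggles all feed into one key-normalization step before counting. A minimal sketch of how they can compose (the countLines name and the options object are mine, not the tool’s real API):

function countLines(
  lines: string[],
  opts: { caseSensitive: boolean; trimWhitespace: boolean; skipEmptyLines: boolean }
): Map<string, number> {
  const counts = new Map<string, number>()

  for (const line of lines) {
    if (opts.skipEmptyLines && line.trim() === '') continue   // edge case 1
    let key = opts.trimWhitespace ? line.trim() : line        // edge case 2
    if (!opts.caseSensitive) key = key.toLowerCase()          // case option
    counts.set(key, (counts.get(key) || 0) + 1)
  }

  return counts
}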

3. Large Files#

For files over 10MB, processing everything on the main thread freezes the UI. Solutions:

  • Use Web Worker for background processing
  • Chunk reading with streaming
  • Prompt user to reduce data

// Web Worker example (main thread)
const worker = new Worker('duplicate-worker.js')
worker.postMessage({ lines: largeArray })
worker.onmessage = (e) => {
  const duplicates = e.data
  // Update UI
}
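
The worker script itself (the duplicate-worker.js referenced above) just runs the same Map-based counting off the main thread and posts the result back. A rough sketch, assuming the message shape used above:

// duplicate-worker.js
self.onmessage = (e) => {
  const counts = new Map()
  for (const line of e.data.lines) {
    counts.set(line, (counts.get(line) || 0) + 1)
  }

  const duplicates = new Map()
  for (const [line, count] of counts) {
    if (count > 1) duplicates.set(line, count)
  }

  // Maps survive the structured clone used by postMessage
  self.postMessage(duplicates)
}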

The Complete Tool#

Based on these ideas, I built: Duplicate Line Finder

Features:

  • Find duplicates and count occurrences
  • Remove duplicates while preserving order
  • Case sensitivity option
  • Whitespace trimming option
  • Real-time stats: total lines, unique lines, duplicate lines

The code isn’t complex, but balancing performance and UX takes thought. Map is the key—it makes O(n) possible.


Related: Text Deduplicate | Line Sort