Building a Line Sorter: From localeCompare to Fisher-Yates Shuffle#

2026-05-03 19:03

Sorting, reversing, or shuffling text lines seems trivial—until you dig into the details. Let’s explore the algorithms behind a proper line sort tool.

localeCompare: The Right Way to Sort Multiple Languages#

Your first instinct might be sort() with comparison operators:

// Wrong: ASCII only
lines.sort((a, b) => a > b ? 1 : -1)

This fails hard with non-ASCII text. Chinese characters get sorted by Unicode code points, not pinyin. The solution? localeCompare:

lines.sort((a, b) => a.localeCompare(b, 'zh-CN'))

For descending order, swap the arguments:

lines.sort((a, b) => b.localeCompare(a, 'zh-CN'))

The Third Parameter#

localeCompare accepts an options object for fine-grained control:

a.localeCompare(b, 'en', {
  sensitivity: 'base',     // Ignore case and accents
  numeric: true,           // 'file10' > 'file2' (numeric comparison)
  ignorePunctuation: true  // Skip punctuation
})

The sensitivity option has four levels:

  • 'base': Ignore case and accents
  • 'accent': Ignore case, distinguish accents
  • 'case': Distinguish case, ignore accents
  • 'variant': Distinguish everything (default)

Fisher-Yates Shuffle: True Randomness#

“Random shuffle” is often implemented wrong:

// Wrong: biased distribution
lines.sort(() => Math.random() - 0.5)

The problem? sort comparison functions aren’t designed for shuffling. V8’s Timsort implementation means elements have higher probability of staying in place.

The correct approach is the Fisher-Yates shuffle:

function shuffle(array) {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[array[i], array[j]] = [array[j], array[i]]
  }
  return array
}

The idea: iterate backwards, randomly swap with any preceding position. Each element has equal probability of landing anywhere—true uniform distribution.

Why Backwards?#

Both directions work, but backwards is cleaner:

// Forward (more verbose)
for (let i = 0; i < array.length - 1; i++) {
  const j = i + Math.floor(Math.random() * (array.length - i))
  ;[array[i], array[j]] = [array[j], array[i]]
}

// Backward (cleaner)
for (let i = array.length - 1; i > 0; i--) {
  const j = Math.floor(Math.random() * (i + 1))
  ;[array[i], array[j]] = [array[j], array[i]]
}

Both are O(n) time complexity.

Set Deduplication: Preserving Order#

The cleanest way to dedupe is Set:

lines = Array.from(new Set(lines))

Does Set preserve insertion order? Yes. ES6 spec guarantees iteration order matches insertion order, so this both dedupes and maintains sequence.

Timing Matters#

Dedupe after sorting for better performance:

// Original
['banana', 'apple', 'cherry', 'apple']

// Sort first, dedupe after
['apple', 'apple', 'banana', 'cherry']  // Sorted
['apple', 'banana', 'cherry']           // Deduped ✓

With sorted data, duplicates are adjacent—Set can detect them faster.

Complete Implementation#

Putting it all together:

function sortLines(text, options = {}) {
  const { order = 'asc', dedupe = false, removeEmpty = false, trim = false } = options
  
  let lines = text.split('\n')
  
  // Preprocessing
  if (removeEmpty) lines = lines.filter(line => line.trim())
  if (trim) lines = lines.map(line => line.trim())
  
  // Sort
  if (order === 'asc') {
    lines.sort((a, b) => a.localeCompare(b, 'zh-CN'))
  } else if (order === 'desc') {
    lines.sort((a, b) => b.localeCompare(a, 'zh-CN'))
  } else if (order === 'random') {
    // Fisher-Yates shuffle
    for (let i = lines.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1))
      ;[lines[i], lines[j]] = [lines[j], lines[i]]
    }
  } else if (order === 'reverse') {
    lines.reverse()
  }
  
  // Dedupe
  if (dedupe) lines = Array.from(new Set(lines))
  
  return lines.join('\n')
}

Performance Considerations#

For massive files (100K+ lines), consider:

  1. Lazy processing: Use generators to batch processing
  2. Web Workers: Offload sorting to avoid UI blocking
  3. External sorting: Merge sort + temp files for data larger than memory

But for most use cases, localeCompare + Set is plenty fast.

Key Takeaways#

A simple line sort tool involves:

  • localeCompare for proper internationalization
  • Fisher-Yates for truly random shuffles
  • Set for order-preserving deduplication
  • Operation order affects performance

Next time you need to shuffle an array, don’t reach for sort(() => Math.random() - 0.5)!


Related Tools: