Building a Line Sorter: From localeCompare to Fisher-Yates Shuffle
Building a Line Sorter: From localeCompare to Fisher-Yates Shuffle#
2026-05-03 19:03
Sorting, reversing, or shuffling text lines seems trivial—until you dig into the details. Let’s explore the algorithms behind a proper line sort tool.
localeCompare: The Right Way to Sort Multiple Languages#
Your first instinct might be sort() with comparison operators:
// Wrong: ASCII only
lines.sort((a, b) => a > b ? 1 : -1)
This fails hard with non-ASCII text. Chinese characters get sorted by Unicode code points, not pinyin. The solution? localeCompare:
lines.sort((a, b) => a.localeCompare(b, 'zh-CN'))
For descending order, swap the arguments:
lines.sort((a, b) => b.localeCompare(a, 'zh-CN'))
The Third Parameter#
localeCompare accepts an options object for fine-grained control:
a.localeCompare(b, 'en', {
sensitivity: 'base', // Ignore case and accents
numeric: true, // 'file10' > 'file2' (numeric comparison)
ignorePunctuation: true // Skip punctuation
})
The sensitivity option has four levels:
'base': Ignore case and accents'accent': Ignore case, distinguish accents'case': Distinguish case, ignore accents'variant': Distinguish everything (default)
Fisher-Yates Shuffle: True Randomness#
“Random shuffle” is often implemented wrong:
// Wrong: biased distribution
lines.sort(() => Math.random() - 0.5)
The problem? sort comparison functions aren’t designed for shuffling. V8’s Timsort implementation means elements have higher probability of staying in place.
The correct approach is the Fisher-Yates shuffle:
function shuffle(array) {
for (let i = array.length - 1; i > 0; i--) {
const j = Math.floor(Math.random() * (i + 1))
;[array[i], array[j]] = [array[j], array[i]]
}
return array
}
The idea: iterate backwards, randomly swap with any preceding position. Each element has equal probability of landing anywhere—true uniform distribution.
Why Backwards?#
Both directions work, but backwards is cleaner:
// Forward (more verbose)
for (let i = 0; i < array.length - 1; i++) {
const j = i + Math.floor(Math.random() * (array.length - i))
;[array[i], array[j]] = [array[j], array[i]]
}
// Backward (cleaner)
for (let i = array.length - 1; i > 0; i--) {
const j = Math.floor(Math.random() * (i + 1))
;[array[i], array[j]] = [array[j], array[i]]
}
Both are O(n) time complexity.
Set Deduplication: Preserving Order#
The cleanest way to dedupe is Set:
lines = Array.from(new Set(lines))
Does Set preserve insertion order? Yes. ES6 spec guarantees iteration order matches insertion order, so this both dedupes and maintains sequence.
Timing Matters#
Dedupe after sorting for better performance:
// Original
['banana', 'apple', 'cherry', 'apple']
// Sort first, dedupe after
['apple', 'apple', 'banana', 'cherry'] // Sorted
['apple', 'banana', 'cherry'] // Deduped ✓
With sorted data, duplicates are adjacent—Set can detect them faster.
Complete Implementation#
Putting it all together:
function sortLines(text, options = {}) {
const { order = 'asc', dedupe = false, removeEmpty = false, trim = false } = options
let lines = text.split('\n')
// Preprocessing
if (removeEmpty) lines = lines.filter(line => line.trim())
if (trim) lines = lines.map(line => line.trim())
// Sort
if (order === 'asc') {
lines.sort((a, b) => a.localeCompare(b, 'zh-CN'))
} else if (order === 'desc') {
lines.sort((a, b) => b.localeCompare(a, 'zh-CN'))
} else if (order === 'random') {
// Fisher-Yates shuffle
for (let i = lines.length - 1; i > 0; i--) {
const j = Math.floor(Math.random() * (i + 1))
;[lines[i], lines[j]] = [lines[j], lines[i]]
}
} else if (order === 'reverse') {
lines.reverse()
}
// Dedupe
if (dedupe) lines = Array.from(new Set(lines))
return lines.join('\n')
}
Performance Considerations#
For massive files (100K+ lines), consider:
- Lazy processing: Use generators to batch processing
- Web Workers: Offload sorting to avoid UI blocking
- External sorting: Merge sort + temp files for data larger than memory
But for most use cases, localeCompare + Set is plenty fast.
Key Takeaways#
A simple line sort tool involves:
localeComparefor proper internationalization- Fisher-Yates for truly random shuffles
Setfor order-preserving deduplication- Operation order affects performance
Next time you need to shuffle an array, don’t reach for sort(() => Math.random() - 0.5)!
Related Tools:
- Text Deduplicator - Dedupe by line, word, or character
- Set Operations - Union, intersection, difference
- JSON Formatter - Format and sort JSON data