Linux cut Command: The Art of Column Extraction
The cut command is another essential text processing tool in Linux. Where awk offers power and sed offers flexibility, cut focuses on one thing: extracting specific columns from text. Simple, direct, efficient.
Core Usage#
cut offers three extraction modes:
# Extract by byte
cut -b 1-5 file.txt
# Extract by character (supports multibyte characters)
cut -c 1-5 file.txt
# Extract by field (most common)
cut -d ':' -f 1,3 file.txt
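The three modes are easiest to compare on a single sample line (the value below is made up; outputs are from GNU cut):

```shell
# A throwaway colon-separated record
line="alice:x:1001"

# By byte: the first 5 bytes
echo "$line" | cut -b 1-5          # alice

# By character: the first 5 characters (same as bytes for pure ASCII)
echo "$line" | cut -c 1-5          # alice

# By field: fields 1 and 3, splitting on ':'
echo "$line" | cut -d ':' -f 1,3   # alice:1001
```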
Field Extraction: The Most Common Use Case#
When dealing with CSV, TSV, or other delimiter-separated files, combining -d (delimiter) and -f (fields) is the workhorse:
# Extract username and shell from /etc/passwd
cut -d ':' -f 1,7 /etc/passwd
# Output example:
# root:/bin/bash
# daemon:/usr/sbin/nologin
# www-data:/usr/sbin/nologin
Specifying Multiple Fields#
# Extract fields 1, 3, and 5
cut -d ',' -f 1,3,5 data.csv
# Extract fields 2 through 5
cut -d ',' -f 2-5 data.csv
# Extract from field 3 to the end
cut -d ',' -f 3- data.csv
# Extract from beginning to field 3
cut -d ',' -f -3 data.csv
Field Range Syntax Summary#
| Syntax | Meaning |
|---|---|
| -f N | Field N only |
| -f N,M | Fields N and M |
| -f N-M | Fields N through M |
| -f N- | From field N to end |
| -f -N | From beginning to field N |
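All of these range forms can be verified on a throwaway line:

```shell
row="a,b,c,d,e"

echo "$row" | cut -d ',' -f 3      # c
echo "$row" | cut -d ',' -f 2,4    # b,d
echo "$row" | cut -d ',' -f 2-4    # b,c,d
echo "$row" | cut -d ',' -f 3-     # c,d,e
echo "$row" | cut -d ',' -f -3     # a,b,c
```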
Processing Text Without Delimiters#
When text lacks clear delimiters, extract by byte or character:
# Extract first 10 characters of each line
cut -c 1-10 file.txt
# Extract characters 5 through 15
cut -c 5-15 file.txt
# Extract characters at positions 1, 5, and 10
cut -c 1,5,10 file.txt
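A quick sanity check of a position list (the sample string is arbitrary):

```shell
# Positions 1, 5, and 10 of a 10-character string
echo "abcdefghij" | cut -c 1,5,10   # aej
```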
Bytes vs Characters#
In UTF-8 environments, bytes and characters differ: a Chinese character occupies three bytes. One caveat: POSIX defines -c as character positions, but GNU coreutils cut does not yet implement multibyte support, so its -c currently behaves exactly like -b; the character-aware behavior below applies to implementations such as BSD/macOS cut under a UTF-8 locale:
# Chinese file test.txt
你好世界
Hello World
# Extract by byte (can cut a multibyte character in half)
cut -b 1-4 test.txt
# Line 1: 你 followed by one stray byte of 好; line 2: Hell
# Extract by character (a multibyte-aware cut keeps characters whole)
cut -c 1-3 test.txt
# Line 1: 你好世; line 2: Hel
Practical Examples#
1. Analyze Access Logs#
# Extract IP addresses (assuming first field)
cut -d ' ' -f 1 /var/log/nginx/access.log | sort | uniq -c | sort -rn | head
# Find most common access paths
cut -d '"' -f 2 /var/log/nginx/access.log | cut -d ' ' -f 2 | sort | uniq -c | sort -rn | head
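The path-extraction pipeline can be tried on a single hand-written log line (the line below is a made-up example in the common "combined" format):

```shell
# Hypothetical access-log entry
log='203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Splitting on double quotes, field 2 is the request line
echo "$log" | cut -d '"' -f 2
# Output: GET /index.html HTTP/1.1

# Field 2 of the request line is the path
echo "$log" | cut -d '"' -f 2 | cut -d ' ' -f 2
# Output: /index.html
```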
2. Process CSV Data#
# Extract name and email
cut -d ',' -f 1,3 users.csv
# Extract only email domain
cut -d '@' -f 2 users.csv | cut -d ',' -f 1
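For example, assuming users.csv rows follow a hypothetical name,email,city layout:

```shell
# Hypothetical row: name,email,city
row="Alice,alice@example.com,Berlin"

# Splitting on '@', field 2 is "example.com,Berlin";
# a second cut on ',' keeps just the domain
echo "$row" | cut -d '@' -f 2 | cut -d ',' -f 1   # example.com
```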
3. Extract File Permission Info#
# Extract permissions and filename from ls -l
ls -l | cut -c 1-10,50-
# Or use awk for more flexibility
ls -l | awk '{print $1, $NF}'
4. Parse System Configurations#
# Extract all usernames
cut -d ':' -f 1 /etc/passwd
# Extract username, UID, and GID
cut -d ':' -f 1,3,4 /etc/passwd
# View all login shells
cut -d ':' -f 7 /etc/passwd | sort -u
Additional Options#
--complement: Extract All Except Specified Fields#
# Extract all fields except field 2
cut -d ',' --complement -f 2 data.csv
# Equivalent to keeping other fields, removing field 2
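A minimal check of --complement (a GNU cut extension) on a one-liner:

```shell
# Keep everything except field 2
echo "a,b,c,d" | cut -d ',' --complement -f 2   # a,c,d
```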
--output-delimiter: Custom Output Separator#
# Convert colon-separated to tab-separated
cut -d ':' -f 1,7 --output-delimiter=$'\t' /etc/passwd
# Output:
# root /bin/bash
# daemon /usr/sbin/nologin
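The effect is easy to see on a one-liner:

```shell
# Fields 1 and 3, re-joined with '-' instead of ':'
echo "a:b:c" | cut -d ':' -f 1,3 --output-delimiter='-'   # a-c
```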
-s: Skip Lines Without Delimiter#
# Only output lines containing the delimiter
cut -d ':' -s -f 1 /etc/passwd
# Without -s, lines without colon would be printed as-is
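The difference shows up immediately with a two-line input, where one line contains the delimiter and the other does not:

```shell
# With -s, the line without ':' is suppressed
printf 'a:b\nnodelim\n' | cut -d ':' -s -f 1
# Output: a

# Without -s, that line passes through unchanged
printf 'a:b\nnodelim\n' | cut -d ':' -f 1
# Output: a
#         nodelim
```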
Performance Considerations#
cut follows the Unix philosophy: “Do one thing and do it well”:
- Low memory footprint: streams input line by line instead of loading the entire file
- Fast: roughly 2-3x faster than awk for simple cases
- Simple: minimal syntax, low learning curve
# Performance comparison (100MB file)
time cut -d ',' -f 1,3 data.csv > /dev/null
# real 0m1.2s
time awk -F ',' '{print $1, $3}' data.csv > /dev/null
# real 0m2.8s
Of course, awk is more powerful, but for simple column extraction, cut is the better choice.
Combining with Other Commands#
cut works seamlessly in pipelines:
# Find processes using most memory
ps aux --sort=-%mem | head -6 | tail -5 | cut -c 66-
# (the starting column depends on your ps output width; adjust 66 as needed)
# Extract domain from URL
echo "https://jsokit.com/tools/cut" | cut -d '/' -f 3
# Output: jsokit.com
# Extract authors from Git log
git log --format="%an" | sort | uniq -c | sort -rn | cut -c 9-
Common Pitfalls#
1. Multiple Consecutive Delimiters#
cut does not merge consecutive delimiters: every delimiter starts a new field, so repeated delimiters produce empty fields:
echo "a,,b" | cut -d ',' -f 2
# Output: empty (field 2 is the empty string between the two commas)
echo "a  b" | cut -d ' ' -f 2
# Output: empty (field 2 is the empty string between the two spaces)
# Solution: squeeze repeated delimiters with tr -s first
echo "a  b" | tr -s ' ' | cut -d ' ' -f 2
# Output: b
2. Only Single-Character Delimiters#
# Wrong: cut accepts only a single-character delimiter
cut -d '::' -f 1 file.txt # Error: the delimiter must be a single character
# Solution: use awk or sed
awk -F '::' '{print $1}' file.txt
sed 's/::.*//' file.txt
3. Fields Are 1-Indexed#
# Note: field numbering starts at 1, not 0
cut -d ',' -f 0 data.csv # Error: fields are numbered from 1
Online Tool#
If you don’t want to open a terminal, try the Linux cut command tool to test various parameter combinations with real-time preview.
Summary#
When to use cut:
- Extract columns from fixed-format data
- Process CSV/TSV files
- Parse configuration files
- Field extraction in log analysis
Remember these common patterns:
cut -d 'delimiter' -f field_list # Extract by field
cut -c character_range # Extract by character
cut --complement -f fields # Extract except specified
cut --output-delimiter=new_delim # Custom output format
Simple, but handles 80% of column extraction needs.
Related: Linux awk Command | Linux sed Command