Linux cut Command: The Art of Column Extraction#

The cut command is another essential text processing tool in Linux. Where awk brings programmable power and sed brings editing flexibility, cut focuses on one thing: extracting specific columns from text. Simple, direct, efficient.

Core Usage#

cut offers three extraction modes:

# Extract by byte
cut -b 1-5 file.txt

# Extract by character (supports multibyte characters)
cut -c 1-5 file.txt

# Extract by field (most common)
cut -d ':' -f 1,3 file.txt
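
One detail the modes above don't show: when -d is omitted, cut splits fields on TAB, its default delimiter. A minimal sketch:

```shell
# With no -d flag, fields are split on TAB characters
printf 'name\tage\tcity\n' | cut -f 2
# → age

# A space-separated line contains no TABs, so the whole line is field 1
printf 'name age city\n' | cut -f 1
# → name age city
```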

Field Extraction: The Most Common Use Case#

When dealing with CSV, TSV, or other delimiter-separated files, the combination of -d (delimiter) and -f (fields) is powerful:

# Extract username and shell from /etc/passwd
cut -d ':' -f 1,7 /etc/passwd

# Output example:
# root:/bin/bash
# daemon:/usr/sbin/nologin
# www-data:/usr/sbin/nologin

Specifying Multiple Fields#

# Extract fields 1, 3, and 5
cut -d ',' -f 1,3,5 data.csv

# Extract fields 2 through 5
cut -d ',' -f 2-5 data.csv

# Extract from field 3 to the end
cut -d ',' -f 3- data.csv

# Extract from beginning to field 3
cut -d ',' -f -3 data.csv

Field Range Syntax Summary#

Syntax     Meaning
-f N       Field N only
-f N,M     Fields N and M
-f N-M     Fields N through M
-f N-      From field N to the end
-f -N      From the beginning to field N
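
Each range form can be checked against a small sample line:

```shell
echo "a,b,c,d,e" | cut -d ',' -f 2     # → b
echo "a,b,c,d,e" | cut -d ',' -f 2,4   # → b,d
echo "a,b,c,d,e" | cut -d ',' -f 2-4   # → b,c,d
echo "a,b,c,d,e" | cut -d ',' -f 3-    # → c,d,e
echo "a,b,c,d,e" | cut -d ',' -f -3    # → a,b,c
```

Note that cut joins the selected fields with the same delimiter it split on.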

Processing Text Without Delimiters#

When text lacks clear delimiters, extract by byte or character:

# Extract first 10 characters of each line
cut -c 1-10 file.txt

# Extract characters 5 through 15
cut -c 5-15 file.txt

# Extract characters at positions 1, 5, and 10
cut -c 1,5,10 file.txt

Bytes vs Characters#

In UTF-8 environments, bytes and characters differ: each Chinese character occupies 3 bytes, so byte-based extraction can split a character mid-sequence.

# Chinese file test.txt
你好世界
Hello World

# Extract by byte (can split a multibyte character)
cut -b 1-4 test.txt
# First line: 你 plus one stray byte of the next character (garbled)

# Extract by character
cut -c 1-3 test.txt
# Intended output: 你好世

Caveat: GNU coreutils cut currently implements -c identically to -b, so on most Linux systems -c still counts bytes and multibyte text gets split anyway. For reliable character-based extraction, use a locale-aware tool such as awk:

# gawk in a UTF-8 locale counts characters, not bytes
awk '{print substr($0, 1, 3)}' test.txt
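
The byte counts behind this difference can be verified directly with wc -c, which counts bytes:

```shell
# A CJK character is 3 bytes in UTF-8
printf '你' | wc -c
# → 3

# An ASCII character is 1 byte
printf 'a' | wc -c
# → 1

# The first 3 bytes of this line are exactly the first character,
# so byte extraction happens to return a whole character here
printf '你好世界\n' | cut -b 1-3
# → 你
```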

Practical Examples#

1. Analyze Access Logs#

# Extract IP addresses (assuming first field)
cut -d ' ' -f 1 /var/log/nginx/access.log | sort | uniq -c | sort -rn | head

# Find most common access paths
cut -d '"' -f 2 /var/log/nginx/access.log | cut -d ' ' -f 2 | sort | uniq -c | sort -rn | head

2. Process CSV Data#

# Extract name and email
cut -d ',' -f 1,3 users.csv

# Extract only email domain
cut -d '@' -f 2 users.csv | cut -d ',' -f 1
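
A self-contained run of the same two commands, assuming a hypothetical row with columns name,age,email:

```shell
# One sample row standing in for users.csv (layout: name,age,email)
row='alice,30,alice@example.com'

# Name and email (fields 1 and 3)
echo "$row" | cut -d ',' -f 1,3
# → alice,alice@example.com

# Domain: everything after '@', then drop any trailing comma fields
echo "$row" | cut -d '@' -f 2 | cut -d ',' -f 1
# → example.com
```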

3. Extract File Permission Info#

# Extract permissions and filename from ls -l
ls -l | cut -c 1-10,50-
# Or use awk for more flexibility
ls -l | awk '{print $1, $NF}'

4. Parse System Configurations#

# Extract all usernames
cut -d ':' -f 1 /etc/passwd

# Extract username, UID, and GID
cut -d ':' -f 1,3,4 /etc/passwd

# View all login shells
cut -d ':' -f 7 /etc/passwd | sort -u
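
The same extraction against a self-contained sample (two fabricated passwd-style lines), so it runs without touching the real /etc/passwd:

```shell
# Two made-up entries in /etc/passwd format
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1000:1000::/home/alice:/bin/zsh\n' |
  cut -d ':' -f 1,7
# → root:/bin/bash
# → alice:/bin/zsh
```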

Additional Options#

--complement: Extract All Except Specified Fields#

# Extract all fields except field 2
cut -d ',' --complement -f 2 data.csv

# Equivalent to keeping other fields, removing field 2
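
A quick check; note that --complement is a GNU extension and is absent from BSD/macOS cut:

```shell
echo "a,b,c" | cut -d ',' --complement -f 2
# → a,c
```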

--output-delimiter: Custom Output Separator#

# Convert colon-separated to tab-separated
cut -d ':' -f 1,7 --output-delimiter=$'\t' /etc/passwd

# Output:
# root    /bin/bash
# daemon  /usr/sbin/nologin
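
The same idea on a one-liner (--output-delimiter is also a GNU extension):

```shell
# Join the selected fields with '-' instead of the input ':'
printf '1:2:3\n' | cut -d ':' -f 1,3 --output-delimiter='-'
# → 1-3
```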

-s: Skip Lines Without Delimiter#

# Only output lines containing the delimiter
cut -d ':' -s -f 1 /etc/passwd

# Without -s, lines without colon would be printed as-is
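
A side-by-side sketch of the difference:

```shell
# Without -s: the delimiter-free line passes through untouched
printf 'a:b\nplainline\n' | cut -d ':' -f 1
# → a
# → plainline

# With -s: the delimiter-free line is dropped entirely
printf 'a:b\nplainline\n' | cut -d ':' -s -f 1
# → a
```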

Performance Considerations#

cut follows the Unix philosophy: “Do one thing and do it well”:

  1. Low memory footprint: streams input line by line instead of loading the whole file
  2. Fast: typically faster than awk for simple column extraction
  3. Simple: minimal syntax, low learning curve

# Illustrative comparison (100MB file; exact timings vary by machine)
time cut -d ',' -f 1,3 data.csv > /dev/null
# real    0m1.2s

time awk -F ',' '{print $1, $3}' data.csv > /dev/null
# real    0m2.8s

Of course, awk is more powerful, but for simple column extraction, cut is the better choice.

Combining with Other Commands#

cut works seamlessly in pipelines:

# Find processes using most memory
ps aux --sort=-%mem | head -6 | tail -5 | cut -c 66-

# Extract domain from URL
echo "https://jsokit.com/tools/cut" | cut -d '/' -f 3
# Output: jsokit.com

# Extract authors from Git log
git log --format="%an" | sort | uniq -c | sort -rn | cut -c 9-

Common Pitfalls#

1. Multiple Consecutive Delimiters#

cut doesn’t merge consecutive delimiters; each one starts a new field, so runs of delimiters produce empty fields:

echo "a,,b" | cut -d ',' -f 2
# Output: empty (field 2 is empty)

echo "a  b" | cut -d ' ' -f 2
# Output: empty (field 2, between the two spaces, is empty)

# Solution: compress delimiters with tr
echo "a  b" | tr -s ' ' | cut -d ' ' -f 2
# Output: b

2. Only Single-Character Delimiters#

# Wrong: multi-character delimiter not supported
cut -d '::' -f 1 file.txt  # Error

# Solution: use awk or sed
awk -F '::' '{print $1}' file.txt
sed 's/::.*//' file.txt

3. Fields Are 1-Indexed#

# Note: field numbering starts at 1, not 0
cut -d ',' -f 0 data.csv
# GNU cut rejects this with an error ("fields are numbered from 1")

Online Tool#

If you don’t want to open a terminal, try the Linux cut command tool to test various parameter combinations with real-time preview.

Summary#

When to use cut:

  • Extract columns from fixed-format data
  • Process CSV/TSV files
  • Parse configuration files
  • Field extraction in log analysis

Remember these common patterns:

cut -d 'delimiter' -f field_list    # Extract by field
cut -c character_range              # Extract by character
cut --complement -f fields          # Extract except specified
cut --output-delimiter=new_delim    # Custom output format

Simple, but handles 80% of column extraction needs.


Related: Linux awk Command | Linux sed Command