# Linux awk Command: The Swiss Army Knife for Text Processing
awk is one of the most powerful text processing tools in Linux. The name comes from the initials of its three creators: Aho, Weinberger, and Kernighan. Many people only use it for simple column extraction, but awk’s capabilities go far beyond that.
## The Core Model
awk’s workflow can be summarized as:
```bash
awk 'pattern { action }' file
```
- pattern: the matching condition (a regex, expression, or range)
- action: the operation to perform (printing, calculation, variable assignment)
For each line, awk will:
- Automatically split fields by the delimiter (default: whitespace)
- Store the fields in $1, $2, $3, ..., with the whole line available as $0
- Check the pattern and execute the action if it matches
```bash
# Extract first and third columns
awk '{ print $1, $3 }' data.txt

# Only process lines containing "error"
awk '/error/ { print $0 }' app.log

# Count total lines in file
awk 'END { print NR }' data.txt
```
NR is a built-in variable representing the current line number (Number of Records). END is a special pattern that executes after all lines are processed.
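A pattern and END also combine naturally. A minimal sketch that counts matching lines in the same app.log:

```bash
# Count lines containing "error"; n + 0 prints a numeric 0 when nothing matched
awk '/error/ { n++ } END { print n + 0 }' app.log
```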
## Field Separators: Beyond Whitespace
The -F parameter specifies the field separator:
```bash
# CSV file with comma separator
awk -F',' '{ print $1, $3 }' data.csv

# Regex separator: one or more spaces
awk -F'[ ]+' '{ print $1 }' data.txt

# Multi-character separator (a multi-character FS is treated as a regex)
awk -F'::' '{ print $1 }' data.txt
```
You can also set FS (Field Separator) inside the script:
```bash
awk 'BEGIN { FS = "," } { print $1, $3 }' data.csv
```
BEGIN executes before any line is processed, commonly used for variable initialization.
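BEGIN is not limited to setting FS. A small sketch that pairs BEGIN and END to wrap output with a header and footer (assuming the same two-column data.txt as above):

```bash
# BEGIN runs before any input is read, END after the last line
awk 'BEGIN { print "name,value" } { print $1 "," $2 } END { print "-- end --" }' data.txt
```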
## Built-in Variables
awk provides several built-in variables:
| Variable | Meaning |
|---|---|
| $0 | Entire line content |
| $1 ... $n | The nth field |
| NF | Number of fields in the current line |
| NR | Current line number (global across all input) |
| FNR | Current line number within the current file |
| FS | Input field separator |
| OFS | Output field separator |
| RS | Record separator |
| ORS | Output record separator |
The power of NF: reference the last field
```bash
# Print the last field of each line
awk '{ print $NF }' data.txt

# Print the second-to-last field
awk '{ print $(NF-1) }' data.txt
```
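NF also works as a pattern on its own, which is handy for spotting malformed rows. A sketch (the expected count of 5 fields is an arbitrary example):

```bash
# Report lines that don't have exactly 5 fields
awk 'NF != 5 { print "line", NR, "has", NF, "fields" }' data.txt
```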
## Conditionals and Loops
awk supports if-else and for/while loops:
```bash
# Filter and label by condition
awk '{
    if ($3 > 100) {
        print $1, "HIGH"
    } else {
        print $1, "NORMAL"
    }
}' data.txt
```

```bash
# Calculate sum of fields per line
awk '{
    sum = 0
    for (i = 1; i <= NF; i++) {
        sum += $i
    }
    print sum
}' numbers.txt
```
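while is mentioned above but not demonstrated. A minimal sketch that prints each line's fields in reverse order:

```bash
# Walk the fields from last to first with a while loop
awk '{
    i = NF
    while (i >= 1) {
        printf "%s%s", $i, (i > 1 ? " " : "\n")
        i--
    }
}' data.txt
```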
## Arrays and Statistics
awk arrays are associative, so keys can be any string:
```bash
# Count word occurrences
awk '{
    for (i = 1; i <= NF; i++) {
        count[$i]++
    }
}
END {
    for (word in count) {
        print word, count[word]
    }
}' text.txt
```
```bash
# Count requests per HTTP status code
awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' access.log
```
Here $9 is the status code field in Nginx logs (assuming standard format).
## Real-World Example: Analyzing Nginx Access Logs
Assume log format:
```
192.168.1.1 - - [10/May/2026:10:30:45 +0800] "GET /api/users HTTP/1.1" 200 1234 "-" "Mozilla/5.0"
```
### 1. Top 10 Visiting IPs
```bash
awk '{ print $1 }' access.log | sort | uniq -c | sort -rn | head -10
```
Doing the counting in awk itself (sorting is still delegated to sort):
```bash
awk '{
    ip[$1]++
}
END {
    for (i in ip) print ip[i], i
}' access.log | sort -rn | head -10
```
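On GNU awk specifically, the array traversal order can be sorted directly, dropping the external sort. A sketch (gawk-only, not POSIX awk):

```bash
gawk '{ ip[$1]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"   # iterate by value, descending
    for (i in ip) {
        print ip[i], i
        if (++n == 10) break
    }
}' access.log
```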
### 2. Calculate Average Response Time
Assume the log format includes the response time as its last field:
```bash
awk '{
    total += $NF
    count++
}
END {
    if (count) print "Average:", total/count, "ms"   # guard against empty input
}' access.log
```
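The same one-pass pattern extends to other aggregates. A sketch that adds min and max, still assuming the response time is the last field:

```bash
awk 'NR == 1 { min = max = $NF }   # seed min/max from the first line
{
    total += $NF
    if ($NF < min) min = $NF
    if ($NF > max) max = $NF
}
END { if (NR) print "avg:", total/NR, "min:", min, "max:", max }' access.log
```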
### 3. Extract 4xx and 5xx Errors
```bash
# Extract all 4xx and 5xx status requests
awk '$9 ~ /^[45][0-9][0-9]$/ { print $0 }' access.log

# Count error type distribution
awk '$9 ~ /^[45][0-9][0-9]$/ {
    errors[$9]++
}
END {
    for (code in errors) print code, errors[code]
}' access.log
```
~ is the regex match operator: $9 ~ /^.../ tests whether the 9th field matches the regex.
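There is also a negated form, !~. A small sketch that keeps everything except 2xx responses ($7 is the request path in the log format above):

```bash
# Print status code and path for every non-2xx request
awk '$9 !~ /^2/ { print $9, $7 }' access.log
```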
## Performance Optimization Tips
### 1. Skip Invalid Lines
Use next to skip lines that don’t need processing:
```bash
awk '/^#/ { next } { print $1 }' config.conf
```
This skips comment lines (those starting with #).
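The same pattern extends to blank lines. A tiny sketch:

```bash
# Skip comments and blank lines in one pattern
awk '/^#/ || NF == 0 { next } { print $1 }' config.conf
```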
### 2. Process Only First N Lines
```bash
awk 'NR > 100 { exit } { print $1 }' data.txt
```
This exits after the first 100 lines instead of reading the entire file. Note that exit still runs any END block before terminating.
### 3. FNR for Multi-File Processing
When processing multiple files, NR is the global line number, while FNR restarts at 1 for each file:
```bash
# Separate statistics per file
awk 'FNR == 1 { print "File:", FILENAME } { print NR, FNR, $0 }' file1.txt file2.txt
```
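This distinction powers the classic FNR == NR idiom for joining two files. A sketch with hypothetical file names:

```bash
# While the first file is being read, FNR == NR holds: remember its keys,
# then print only lines of the second file whose first field was seen
awk 'FNR == NR { seen[$1] = 1; next } $1 in seen' ids.txt data.txt
```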
## Advanced Example: Calculating a Moving Average

Given a temperature data file with one reading per line, calculate a 3-point moving average:
```bash
awk '{
    values[NR] = $1
    if (NR >= 3) {
        sum = values[NR] + values[NR-1] + values[NR-2]
        print (NR-2), sum/3
    }
}' temperature.txt
```
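The version above keeps every value in memory. A sketch of a constant-memory variant using a ring buffer, with the window size passed in via -v (w=3 reproduces the output above):

```bash
awk -v w=3 '{
    buf[NR % w] = $1                # overwrite the slot leaving the window
    if (NR >= w) {
        sum = 0
        for (i = 0; i < w; i++) sum += buf[i]
        print NR - w + 1, sum / w   # index of the window's first point
    }
}' temperature.txt
```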
## awk vs sed vs grep
Many people confuse these three tools:
| Tool | Core Capability | Typical Use Case |
|---|---|---|
| grep | Line filtering | Quickly search matching lines |
| sed | Stream editing | Replace, delete, insert |
| awk | Field processing + calculation | Statistics, reports, formatting |
They’re often used together:
```bash
# Combined example: extract error lines, reduce each to its hour, count per hour
# (assumes bracketed timestamps like [10/May/2026:10:30:45 +0800])
grep "ERROR" app.log | \
sed -E 's/.*\[[^:]*:([0-9]{2}).*/\1/' | \
awk '{ count[$1]++ } END { for (h in count) print h, count[h] }'
```
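For contrast, awk alone can handle all three steps in one pass. A sketch under the same timestamp assumption:

```bash
# Filter (pattern), transform (match/substr), and count, all in awk
awk '/ERROR/ {
    if (match($0, /:[0-9][0-9]:/)) {          # first :HH: inside the timestamp
        count[substr($0, RSTART + 1, 2)]++
    }
}
END { for (h in count) print h, count[h] }' app.log
```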
## Summary
awk’s power lies in:
- Automatic field splitting, no manual split needed
- A complete programming language (variables, arrays, loops, and user-defined functions, sketched below)
- Built-in pattern matching mechanism
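User-defined functions have not appeared yet. A minimal sketch reusing the temperature.txt file from earlier (the Celsius-to-Fahrenheit conversion is just for illustration):

```bash
# Functions are declared at the top level, outside pattern-action rules
awk 'function c2f(c) { return c * 9 / 5 + 32 }
{ printf "%.1f\n", c2f($1) }' temperature.txt
```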
Mastering awk makes text file processing as efficient as querying a database with SQL. Complex statistics, formatting, and transformation tasks can be done in a single awk command.
Related: Linux sed Command | Text Deduplicate Tool | Grep Command Guide