Deep Dive into Linux ps Command: From Process States to Performance Monitoring

ps is one of the tools a Linux sysadmin uses most often. But most people only know ps aux without understanding how it works underneath. Let’s dive deep into this command.

The Core: Reading the /proc Filesystem

Linux has no dedicated system call for listing processes. Instead, ps reads the /proc virtual filesystem, which the kernel populates on demand:

# ps essentially reads these files
ls /proc/1234/
# cmdline  comm  cwd  exe  fd  maps  stat  status  ...

Each process has a directory under /proc named by its PID, containing various files:

  • cmdline: Command-line arguments (null-separated)
  • comm: Process name (truncated to 15 characters)
  • stat: Process status (machine-readable)
  • status: Process status (human-readable)
  • fd/: Directory of open file descriptors
  • exe: Symlink to the executable
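
Because cmdline is NUL-separated rather than space-separated, it has to be split on the NUL byte to recover the original arguments. Here is a minimal sketch, assuming the file contents have already been read into a string; parseCmdline is an illustrative name, not part of any library.

```javascript
// Split the raw contents of /proc/<pid>/cmdline into an argv-style array.
// Arguments are separated (and normally terminated) by NUL bytes.
function parseCmdline(raw) {
  if (raw.length === 0) return []                 // kernel threads and zombies have an empty cmdline
  const args = raw.split('\0')
  if (args[args.length - 1] === '') args.pop()    // drop the empty piece after the trailing NUL
  return args
}

// Hand-written sample, not read from a live system
console.log(parseCmdline('nginx\0-g\0daemon off;\0'))
// returns ['nginx', '-g', 'daemon off;']
```

Note that the third argument keeps its embedded space, which a naive split on spaces would break apart.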

Understanding Every Column in ps aux

ps aux
# USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# root         1  0.0  0.1 169424 11200 ?        Ss   May08   0:05 /sbin/init

Key Fields Explained

VSZ (Virtual Memory Size)

  • Process virtual memory size, reported in KiB (1024-byte units)
  • Includes heap, stack, shared libraries, and memory that is mapped but not yet backed by physical pages
  • Usually large, but doesn’t represent actual usage

RSS (Resident Set Size)

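  • Physical memory currently resident in RAM, reported in KiB
  • Excludes swapped-out pages
  • The closest figure to real memory consumption, though pages shared with other processes are counted in full for each process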
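
The human-readable status file exposes comparable figures as VmSize and VmRSS. The sketch below pulls both out of the file's text. It assumes the text was read beforehand, and parseMemoryKb is a hypothetical helper name.

```javascript
// Extract VmSize and VmRSS (both reported in kB) from /proc/<pid>/status text.
// A field is returned as null when absent, as is the case for kernel threads.
function parseMemoryKb(statusText) {
  const read = (key) => {
    const m = statusText.match(new RegExp(`^${key}:\\s+(\\d+) kB$`, 'm'))
    return m ? Number(m[1]) : null
  }
  return { vszKb: read('VmSize'), rssKb: read('VmRSS') }
}

// Hand-written sample mirroring the /sbin/init row shown earlier
const statusSample = 'Name:\tinit\nVmSize:\t  169424 kB\nVmRSS:\t   11200 kB\n'
console.log(parseMemoryKb(statusSample))
// returns { vszKb: 169424, rssKb: 11200 }
```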

STAT (Process State)

  • R: Running or runnable (on the run queue)
  • S: Interruptible sleep (waiting for an event)
  • D: Uninterruptible sleep (usually waiting for I/O)
  • Z: Zombie (terminated but not yet reaped by its parent)
  • T: Stopped (by a job-control signal such as SIGSTOP)

State modifiers:

  • +: In the foreground process group
  • s: Session leader
  • l: Multi-threaded process
  • <: High priority (negative nice value)
  • N: Low priority (positive nice value)
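
A STAT value such as Ssl+ is simply one state letter followed by zero or more modifiers. The following sketch decodes it using only the letters listed above; decodeStat and the two lookup tables are illustrative names, and any letter outside these tables is reported as unknown rather than guessed.

```javascript
const STATES = {
  R: 'running or runnable',
  S: 'interruptible sleep',
  D: 'uninterruptible sleep',
  Z: 'zombie',
  T: 'stopped'
}

const MODIFIERS = {
  '+': 'foreground process group',
  s: 'session leader',
  l: 'multi-threaded',
  '<': 'high priority',
  N: 'low priority'
}

// The first character is the state; every following character is a modifier.
function decodeStat(stat) {
  const [state, ...flags] = stat
  return {
    state: STATES[state] ?? `unknown (${state})`,
    modifiers: flags.map(f => MODIFIERS[f] ?? `unknown (${f})`)
  }
}

console.log(decodeStat('Ssl+'))
// state: 'interruptible sleep'
// modifiers: ['session leader', 'multi-threaded', 'foreground process group']
```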

The %CPU Calculation Pitfall

ps calculates CPU usage with:

%CPU = (CPU time used / elapsed time since process start) * 100

Here’s the catch: ps aux shows the average CPU usage over the process's whole lifetime, not its current usage!

A process that uses 1 second of CPU and then sleeps for 1 hour works out to about 0.03%, which ps displays as 0.0.
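
The same lifetime average can be reproduced from /proc. This is a sketch under stated assumptions: utime, stime and starttime are fields 14, 15 and 22 of /proc/<pid>/stat (all in clock ticks), uptimeSeconds is the first number in /proc/uptime, and the tick rate is taken to be 100 per second, the usual result of getconf CLK_TCK. The function name is illustrative.

```javascript
// Assumption: 100 clock ticks per second (check with `getconf CLK_TCK`).
const CLK_TCK = 100

// Average CPU usage since the process started, as a percentage.
function lifetimeCpuPercent({ utime, stime, starttime }, uptimeSeconds, clkTck = CLK_TCK) {
  const cpuSeconds = (utime + stime) / clkTck               // total CPU time consumed
  const elapsedSeconds = uptimeSeconds - starttime / clkTck // wall-clock time since start
  return elapsedSeconds > 0 ? (cpuSeconds / elapsedSeconds) * 100 : 0
}

// 100 ticks (1 second) of CPU for a process started 3601 seconds ago
const lifetimePct = lifetimeCpuPercent({ utime: 80, stime: 20, starttime: 50000 }, 4101)
console.log(lifetimePct.toFixed(1)) // prints 0.0
```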

For real-time CPU usage, use top or pidstat.
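
Interval-based tools get their number by sampling twice and dividing the CPU time consumed in between by the wall-clock time that passed. Here is a minimal sketch of that idea, assuming each sample holds utime + stime in clock ticks plus a millisecond timestamp, and again assuming 100 ticks per second. The sample shape and function name are hypothetical.

```javascript
// Each sample: { ticks: utime + stime from /proc/<pid>/stat, timeMs: wall-clock timestamp }
function intervalCpuPercent(prev, curr, clkTck = 100) {
  const cpuSeconds = (curr.ticks - prev.ticks) / clkTck
  const wallSeconds = (curr.timeMs - prev.timeMs) / 1000
  return wallSeconds > 0 ? (cpuSeconds / wallSeconds) * 100 : 0
}

// 50 ticks (0.5 s of CPU) consumed during a 1-second window
const before = { ticks: 12000, timeMs: 0 }
const after = { ticks: 12050, timeMs: 1000 }
console.log(intervalCpuPercent(before, after)) // prints 50
```

A multi-threaded process can exceed 100% with this calculation, since several threads can accumulate CPU time in parallel.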

Practical Cases: Finding High CPU Processes

Case 1: Find Top CPU Consumers

# --sort=-%cpu sorts by CPU descending; head -10 keeps the header plus the top 9 processes
ps aux --sort=-%cpu | head -10

# Output
# USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# mysql    10234 78.5 15.2 4523124 1258291 ?     Sl   May08 123:45 /usr/sbin/mysqld

Case 2: View Process Threads

# -L shows one row per thread; LWP is the thread ID
ps -Lp 10234

#   PID   LWP TTY          TIME CMD
# 10234 10234 ?        00:00:05 mysqld
# 10234 10235 ?        00:00:12 mysqld
# 10234 10236 ?        00:00:08 mysqld

LWP (Light Weight Process) is the kernel's thread ID; the main thread's LWP equals the PID. In Linux, threads are essentially lightweight processes that share an address space.

Case 3: View Process Tree

# f (BSD-style, equivalent to --forest) draws parent-child relationships as a tree
ps auxf

# Or use pstree
pstree -p 10234

Case 4: Find Zombie Processes

# Find processes with state Z
ps aux | awk '$8 ~ /Z/ {print}'

# Output
# user  12345  0.0  0.0      0     0 pts/0    Z+   10:23   0:00 [python] <defunct>

Zombie processes show <defunct> in the COMMAND column.
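
The same filter is easy to apply when ps output is consumed from a program. The sketch below works on ps aux text that has already been captured (for example through Node's child_process module) and checks the eighth column, as the awk one-liner does. findZombies is an illustrative name.

```javascript
// Return the zombie rows from captured `ps aux` output.
function findZombies(psAuxOutput) {
  return psAuxOutput
    .split('\n')
    .slice(1)                                   // skip the header row
    .map(line => line.trim().split(/\s+/))
    .filter(cols => cols.length >= 11 && cols[7].startsWith('Z'))  // column 8 is STAT
    .map(cols => ({
      user: cols[0],
      pid: Number(cols[1]),
      command: cols.slice(10).join(' ')         // COMMAND may itself contain spaces
    }))
}

// Hand-written sample built from the rows shown in this article
const psSample = [
  'USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND',
  'root         1  0.0  0.1 169424 11200 ?        Ss   May08   0:05 /sbin/init',
  'user     12345  0.0  0.0      0     0 pts/0    Z+   10:23   0:00 [python] <defunct>'
].join('\n')

console.log(findZombies(psSample))
// returns [{ user: 'user', pid: 12345, command: '[python] <defunct>' }]
```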

ps vs top vs htop

Tool   Feature                          Use Case
ps     Snapshot, one-time query         Process info lookup, scripting
top    Real-time refresh, interactive   Live monitoring, dynamic observation
htop   Colorful UI, mouse support       User-friendly live monitoring

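Performance difference:

  • ps aux takes a single snapshot and exits; scanning all processes typically takes roughly 10-50ms
  • top keeps running and refreshes every 3 seconds by default, so it uses CPU continuously
  • htop generally uses somewhat more resources than top (color rendering, more calculations)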

Advanced Techniques

1. Custom Output Format

# -o specifies columns to display
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem

# PID  PPID CMD                         %MEM %CPU
# 1234     1 /usr/sbin/mysqld            15.2 78.5
# 5678     1 /usr/bin/dockerd             8.3 12.4

2. View Process Open Files

# All files opened by process 1234
ls -l /proc/1234/fd/

# Or use lsof
lsof -p 1234

3. View Process Environment Variables

# Environment variables as they were at process start (NUL-separated)
tr '\0' '\n' < /proc/1234/environ
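
When the environment is needed as structured data rather than text, the NUL-separated KEY=value entries can be turned into an object. This is a sketch assuming the file contents are already in a string; parseEnviron is a hypothetical helper name.

```javascript
// Convert the raw contents of /proc/<pid>/environ into a { KEY: value } object.
function parseEnviron(raw) {
  const env = {}
  for (const entry of raw.split('\0')) {
    const eq = entry.indexOf('=')      // split at the FIRST '=': values may contain '=' too
    if (eq <= 0) continue              // skip the empty trailing piece and malformed entries
    env[entry.slice(0, eq)] = entry.slice(eq + 1)
  }
  return env
}

console.log(parseEnviron('PATH=/usr/bin:/bin\0LS_COLORS=di=01;34\0'))
// returns { PATH: '/usr/bin:/bin', LS_COLORS: 'di=01;34' }
```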

4. View Process Memory Maps

cat /proc/1234/maps

# Output format
# Address range       Perms Offset   Dev   Inode   Path
# 00400000-0040b000 r-xp 00000000 08:01 262210  /usr/bin/ps
# 0060a000-0060b000 r--p 0000a000 08:01 262210  /usr/bin/ps
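
Each maps line follows the fixed column layout shown above, so it can be parsed with a single regular expression. The sketch below also derives the size of the region from its address range. parseMapsLine is an illustrative name, and BigInt is used because 64-bit addresses can exceed the range of exact JavaScript numbers.

```javascript
// Parse one line of /proc/<pid>/maps; returns null if the line does not match.
function parseMapsLine(line) {
  const m = line.match(/^([0-9a-f]+)-([0-9a-f]+) (\S{4}) ([0-9a-f]+) (\S+) (\d+)\s*(.*)$/)
  if (!m) return null
  const [, start, end, perms, offset, dev, inode, path] = m
  return {
    start, end, perms, offset, dev,
    inode: Number(inode),
    path: path || null,                                   // anonymous mappings have no path
    sizeBytes: BigInt('0x' + end) - BigInt('0x' + start)  // length of the address range
  }
}

const region = parseMapsLine('00400000-0040b000 r-xp 00000000 08:01 262210  /usr/bin/ps')
console.log(region.perms, region.sizeBytes, region.path)
// prints r-xp 45056n /usr/bin/ps
```

The first mapping in the sample is therefore 45056 bytes (11 pages of 4 KiB) of readable, executable, private memory backed by /usr/bin/ps.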

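Common Pitfalls

1. Zombie Processes Can’t Be Killed

kill -9 12345  # Doesn't work on zombies

A zombie has already terminated; only its process-table entry remains until the parent collects its exit status, so kill -9 has no effect. Correct approach:

  1. Find the parent: ps -o ppid= -p 12345
  2. Restart or fix the parent so that it calls wait() and reaps the child. If the parent exits, the zombie is re-parented to init (or a subreaper), which reaps it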
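
The parent PID can also be read straight from /proc/<pid>/stat, where it is the fourth field. One subtlety: the second field, the process name in parentheses, may itself contain spaces or parentheses, so the sketch below locates the last closing parenthesis before splitting. parseStatLine is a hypothetical helper and the sample line is hand-written.

```javascript
// Parse the leading fields of a /proc/<pid>/stat line.
function parseStatLine(statLine) {
  const open = statLine.indexOf('(')
  const close = statLine.lastIndexOf(')')            // comm may contain ')' itself
  const rest = statLine.slice(close + 2).split(' ')  // fields 3 onward
  return {
    pid: Number(statLine.slice(0, open - 1)),
    comm: statLine.slice(open + 1, close),
    state: rest[0],                                  // field 3
    ppid: Number(rest[1])                            // field 4
  }
}

const zombieStat = parseStatLine('12345 (python (worker)) Z 6789 6789 6789 34816 6789 4228100')
console.log(zombieStat)
// returns { pid: 12345, comm: 'python (worker)', state: 'Z', ppid: 6789 }
```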

2. VSZ ≠ Actual Memory Usage

ps aux | grep mysql
# VSZ 4523124 (4.3GB)
# RSS 1258291 (1.2GB)  <- This is real usage
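
The figures in parentheses come from dividing the KiB values by 1024 twice. Here is a short worked check, with kibToGib as an illustrative helper:

```javascript
// ps reports VSZ and RSS in KiB; 1 GiB = 1024 * 1024 KiB
const kibToGib = (kib) => kib / (1024 * 1024)

console.log(kibToGib(4523124).toFixed(1)) // prints 4.3 (VSZ)
console.log(kibToGib(1258291).toFixed(1)) // prints 1.2 (RSS)
```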

3. D-State Processes Are Uninterruptible

ps aux | awk '$8 ~ /D/'
# Processes in D state are usually waiting for disk or NFS I/O
# kill -9 has no immediate effect: the signal stays pending until the I/O completes

Web Implementation: Browser-Based Process Monitor

Browsers can’t access /proc directly, but a backend can read it and expose the data through an API:

// Backend API: /api/processes
import { readdirSync, readFileSync } from 'node:fs'

export async function GET() {
  const processes = []

  // Read all numeric directories under /proc (one per process)
  const pids = readdirSync('/proc').filter(d => /^\d+$/.test(d))

  for (const pid of pids) {
    try {
      const stat = readFileSync(`/proc/${pid}/stat`, 'utf-8')
      const comm = readFileSync(`/proc/${pid}/comm`, 'utf-8').trim()

      // Field 2, "(comm)", may contain spaces or parentheses,
      // so split only the text after the last ')'
      const parts = stat.slice(stat.lastIndexOf(')') + 2).split(' ')
      const utime = parseInt(parts[11], 10)  // Field 14: user-mode time, in clock ticks
      const stime = parseInt(parts[12], 10)  // Field 15: kernel-mode time, in clock ticks

      processes.push({
        pid: parseInt(pid, 10),
        name: comm,
        utime: utime,
        stime: stime,
        state: parts[0]  // Field 3: process state (R, S, D, Z, T, ...)
      })
    } catch (e) {
      // The process may have exited between the directory listing and the read
    }
  }

  return Response.json(processes)
}

Frontend display:

import { useEffect, useState } from 'react'

function ProcessList() {
  const [processes, setProcesses] = useState([])

  useEffect(() => {
    const interval = setInterval(async () => {
      const res = await fetch('/api/processes')
      const data = await res.json()
      setProcesses(data)
    }, 1000)

    return () => clearInterval(interval)
  }, [])

  return (
    <table>
      <thead>
        <tr>
          <th>PID</th>
          <th>Name</th>
          <th>State</th>
          <th>CPU Time (ticks)</th>
        </tr>
      </thead>
      <tbody>
        {processes.map(p => (
          <tr key={p.pid}>
            <td>{p.pid}</td>
            <td>{p.name}</td>
            <td>{p.state}</td>
            <td>{p.utime + p.stime}</td>
          </tr>
        ))}
      </tbody>
    </table>
  )
}

Summary

The ps command seems simple but contains core knowledge of Linux process management:

  1. Data source: /proc virtual filesystem
  2. Key fields: VSZ (virtual), RSS (real), STAT (state)
  3. Performance metrics: %CPU is average, not real-time
  4. Advanced usage: Custom formats, sorting, thread inspection
  5. Common pitfalls: Zombies can’t be killed, D-state is uninterruptible

Mastering ps is fundamental to Linux performance troubleshooting.

