Linux tee Command Deep Dive: Pipe Splitting and Dual-Output Implementation#

Ever hit this problem? You pipe data through a chain of commands, but you can’t see the intermediate results. Like cat access.log | grep 404 | wc -l — you only get the count, not the matching lines. That’s where tee comes in.

The Core: Data Stream Forking#

The name “tee” comes from plumbing — the T-junction fitting. Its job is simple: read from stdin, write to both stdout and a file.

The implementation is straightforward:

#include <unistd.h>
#include <fcntl.h>

void tee_impl(const char* filename, int append) {
    char buf[4096];
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(filename, flags, 0644);
    if (fd < 0)
        return;  // real tee reports the error but keeps copying to stdout

    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
        write(STDOUT_FILENO, buf, n);  // output to screen
        write(fd, buf, n);             // write to file (a robust version checks for short writes)
    }
    close(fd);
}

The key: two write calls — one to stdout, one to the file descriptor. That’s tee in a nutshell: data duplication and distribution.
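You can sanity-check this duplication yourself: send a stream through tee, redirect the stdout copy into a second file, and compare the two (the /tmp paths here are just illustrative scratch files):

```shell
# The file copy (via tee) and the stdout copy (via redirection)
seq 1 5 | tee /tmp/tee_copy.txt > /tmp/tee_stdout.txt

# The two copies are byte-for-byte identical
cmp /tmp/tee_copy.txt /tmp/tee_stdout.txt && echo "identical"
```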

Practical Scenarios#

Scenario 1: Debugging Pipeline Intermediate Results#

# Only see the final count
cat access.log | grep 404 | wc -l

# See matching lines AND count
cat access.log | grep 404 | tee matches.txt | wc -l
# Screen shows count, file saves matching lines

# Or display directly on terminal
cat access.log | grep 404 | tee /dev/tty | wc -l

/dev/tty is the current terminal device — writing there displays on screen.

Scenario 2: Build Logs with Simultaneous Monitoring#

make 2>&1 | tee build.log

2>&1 redirects stderr to stdout, so error messages get captured by tee too. Real-time output on screen, full log saved to file.
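A toy stand-in for the make run (the echo lines simulate compiler output; the log path is illustrative) shows both streams landing in the log:

```shell
# Simulate a build step that writes to stdout and stderr
{ echo "compiling main.c"; echo "error: missing header" >&2; } 2>&1 | tee /tmp/build_demo.log

# Both lines were captured in the file
grep -c "" /tmp/build_demo.log   # 2
```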

Scenario 3: Append Mode#

# Overwrites each run
echo "run 1" | tee output.txt

# Appends each run
echo "run 2" | tee -a output.txt

The -a flag maps to O_APPEND, appending instead of truncating.
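Putting the two modes side by side (with a throwaway file in /tmp) makes the difference concrete:

```shell
echo "run 1" | tee /tmp/append_demo.txt > /dev/null
echo "run 2" | tee /tmp/append_demo.txt > /dev/null     # O_TRUNC: "run 1" is gone
echo "run 3" | tee -a /tmp/append_demo.txt > /dev/null  # O_APPEND: added after "run 2"

cat /tmp/append_demo.txt
# run 2
# run 3
```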

Scenario 4: Writing to Multiple Files#

echo "critical config" | tee config.prod.yaml config.staging.yaml config.dev.yaml

tee accepts multiple file arguments — one command, synchronized writes.
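A quick check that every file receives the same bytes (file names here are illustrative):

```shell
echo "critical config" | tee /tmp/cfg_a.yaml /tmp/cfg_b.yaml /tmp/cfg_c.yaml > /dev/null

# All three copies are identical
cmp /tmp/cfg_a.yaml /tmp/cfg_b.yaml && cmp /tmp/cfg_b.yaml /tmp/cfg_c.yaml && echo "in sync"
```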

Advanced: Process Substitution and Stream Splitting#

tee’s real power emerges with process substitution:

# Save the full listing and a filtered subset simultaneously
df -h | tee >(grep "/$" > root_disks.txt) > all_disks.txt

>() is process substitution — it creates a named pipe. tee sends data to both grep and the raw output file.
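Here is a self-contained version of the same splitting idea (bash-specific, since >() is not POSIX sh; file names and data are illustrative):

```shell
#!/usr/bin/env bash
# One stream, two destinations: a raw copy, plus a filtered count via process substitution
printf 'alpha\nbeta\nalpha\n' \
  | tee >(grep -c alpha > /tmp/alpha_count.txt) > /tmp/all_lines.txt

sleep 0.2   # the >( ) subshell runs asynchronously; give it a moment to finish writing
cat /tmp/alpha_count.txt   # 2
cat /tmp/all_lines.txt     # all three original lines
```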

# Real-time monitoring + error log + warning log
tail -f app.log | tee >(grep ERROR >> errors.txt) >(grep WARN >> warns.txt) | cat

One command monitors the log stream, extracts errors, extracts warnings — all at once.

Parameter Breakdown#

| Flag | Purpose | Use Case |
| ---- | ------- | -------- |
| -a | Append mode | Log accumulation, cumulative results |
| -i | Ignore SIGINT | Prevent Ctrl+C from interrupting writes |
| -p | Diagnose errors writing to non-pipes | Detect disk full, permission issues |

The -i flag works by setting the SIGINT disposition to ignore:

#include <signal.h>
signal(SIGINT, SIG_IGN);  // Ignore Ctrl+C

With SIGINT ignored, tee keeps copying until it reaches end of input, so a Ctrl+C aimed at the pipeline cannot leave the output file cut off mid-write.

Web Implementation: Browser-Side Stream Splitting#

Implementing similar functionality in TypeScript with the Web Streams API:

class TeeStream {
  private outputs: WritableStream<Uint8Array>[] = [];

  addOutput(stream: WritableStream<Uint8Array>) {
    this.outputs.push(stream);
  }

  async pipe(input: ReadableStream<Uint8Array>) {
    const reader = input.getReader();
    // Acquire each writer once, instead of locking and unlocking per chunk
    const writers = this.outputs.map(s => s.getWriter());

    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // Write the chunk to all output streams in parallel
        await Promise.all(writers.map(w => w.write(value)));
      }
    } finally {
      reader.releaseLock();
      await Promise.all(writers.map(w => w.close()));
    }
  }
}

// Usage example (fileWritableStream and fetchResponse are assumed to exist)
const tee = new TeeStream();
tee.addOutput(new WritableStream({
  write: chunk => console.log(new TextDecoder().decode(chunk))
}));
tee.addOutput(fileWritableStream);

await tee.pipe(fetchResponse.body);

The Web Streams API design philosophy mirrors Unix pipes; in fact, ReadableStream ships a built-in tee() method that splits one stream into two identical branches.

Performance Considerations#

tee’s overhead comes from two write system calls. For typical usage, this is negligible. Real concerns:

  1. Disk I/O bottleneck: File output speed may become the limiting factor
  2. Buffer size: Default 4KB; increase for high-throughput scenarios
  3. Signal handling overhead: The -i flag adds signal processing logic

Benchmark: Processing 1GB data streams, tee adds ~2-3% CPU overhead. The bottleneck is disk I/O.
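You can get a rough feel for this overhead yourself by timing a plain pipe against the same stream duplicated through tee (numbers vary by machine and disk; the scratch path is illustrative):

```shell
# Baseline: stream 256MB through a plain pipe
time dd if=/dev/zero bs=1M count=256 2>/dev/null | cat > /dev/null

# Same stream, duplicated by tee into a scratch file
time dd if=/dev/zero bs=1M count=256 2>/dev/null | tee /tmp/tee_bench.bin > /dev/null
```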

Common Pitfalls#

1. File Permission Issues#

ls /root | tee /root/output.txt
# Permission denied

The redirection is performed by your own shell, with your own privileges, which is why even prefixing the command with sudo does not help. The fix is to run tee itself under sudo:

ls /root | sudo tee /root/output.txt

2. Pipe Buffer Limits#

Linux pipes default to 64KB buffer. If the downstream process is slow, upstream gets blocked:

# Slow downstream blocks upstream
tail -f large.log | tee >(sleep 1; cat) | cat

A buffering tool from a separate package, such as mbuffer, can absorb bursts and decouple the two sides:

tail -f large.log | mbuffer -m 1M | tee output.txt

3. Signal Propagation#

By default, Ctrl+C terminates the entire pipeline. tee might be interrupted mid-write, causing incomplete data. Use -i:

long_running_command | tee -i output.txt

Wrapping Up#

tee looks simple, but it’s the backbone of data flow management. From low-level read/write syscalls to process substitution tricks, mastering tee makes pipeline operations effortless.

Key takeaways:

  • Core function: data duplication, one read → multiple writes
  • Process substitution enables multi-way splitting
  • -a for append, -i for signal protection are common flags
  • Web Streams API provides browser-side equivalent capabilities

Next time you need “both this AND that” in a pipeline, just tee it.


Related: Linux xargs Command | Linux grep Command