Linux tail Command Deep Dive: From End-of-File Reading to Real-Time Log Monitoring
When a production service fails, the first instinct is tail -f logs/error.log. This command looks simple, but its implementation is worth understanding. Let’s dive into the technical details.
The Core: Reverse Newline Search#
tail displays the last 10 lines by default. But how does it know where “the last 10 lines” start?
The key is searching backwards from the end of the file. Here’s the core logic (simplified C):
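#include <sys/types.h>
#include <unistd.h>

// Return the offset where the last n lines begin (0 if the file has fewer)
off_t find_nth_newline_from_end(int fd, int n) {
    off_t file_size = lseek(fd, 0, SEEK_END);
    off_t pos = file_size;
    int newline_count = 0;
    char buf[4096];

    if (n <= 0) return file_size;
    while (pos > 0) {
        off_t chunk_start = pos > 4096 ? pos - 4096 : 0;
        lseek(fd, chunk_start, SEEK_SET);
        ssize_t read_size = read(fd, buf, (size_t)(pos - chunk_start));
        if (read_size != pos - chunk_start) return -1;  // Error or short read

        // Search backwards for newlines
        for (ssize_t i = read_size - 1; i >= 0; i--) {
            // A newline in the file's last byte ends the final line; skip it
            if (buf[i] == '\n' && chunk_start + i != file_size - 1) {
                newline_count++;
                if (newline_count == n) {
                    return chunk_start + i + 1;  // Start of the Nth line from the end
                }
            }
        }
        pos = chunk_start;
    }
    return 0;
}
lseek(fd, 0, SEEK_END) returns the file size; from there we read 4 KB chunks backwards and count newlines. A newline in the very last byte only terminates the final line, so it is not counted. Once we find the Nth newline (the 10th by default), we output from the byte after it to EOF. If the file has fewer lines than that, we output everything.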
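Time complexity: the scan stops as soon as it has seen enough newlines, so the typical cost is proportional to the size of the last N lines, not the file. Only in the worst case (very few newlines) does it read all file_size bytes, in O(file_size / buffer_size) read calls. Space complexity: O(buffer_size). Larger buffers mean fewer syscalls.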
-f Follow Mode: inotify vs Polling#
The -f flag stands for “follow” - continuously track file changes. Two implementations exist:
Method 1: inotify (Linux-specific)#
Modern Linux uses inotify for file change notification:
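#include <sys/inotify.h>
#include <unistd.h>

int inotify_fd = inotify_init();
int watch_fd = inotify_add_watch(inotify_fd, filename, IN_MODIFY);
// Aligned so the bytes can be read as struct inotify_event records
char event_buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
while (1) {
    ssize_t len = read(inotify_fd, event_buf, sizeof(event_buf)); // Block until event
    if (len <= 0) break;
    // One read() may return several variable-length events
    for (char *p = event_buf; p < event_buf + len; ) {
        struct inotify_event *event = (struct inotify_event *)p;
        if (event->mask & IN_MODIFY) {
            // File modified, read new content
            read_new_content(fd);
        }
        p += sizeof(struct inotify_event) + event->len;
    }
}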
inotify is kernel-level notification: the process sleeps in read() until the kernel delivers an event, so it uses almost no CPU while idle and reacts with very low latency.
Method 2: stat Polling (Cross-platform)#
Systems without inotify use stat polling:
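#include <sys/stat.h>
#include <unistd.h>

while (1) {
    struct stat st;
    if (stat(filename, &st) == 0) {
        if (st.st_size < last_size) {
            last_size = 0;  // File was truncated, start over
        }
        if (st.st_size > last_size) {
            // File grew, read new content
            read_from(fd, last_size, st.st_size - last_size);
            last_size = st.st_size;
        }
    }
    sleep(1); // Check every second
}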
Polling trade-offs: a short interval wastes CPU on repeated stat calls, a long one delays new output by up to a full interval. Where inotify is available, it is usually the better choice.
-F vs -f: Handling Log Rotation#
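When logs rotate, the old file is renamed and a new one is created under the original name. tail -f keeps reading through its already-open file descriptor, which still refers to the old (renamed) inode, so it never sees what is written to the new file.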
tail -F (equivalent to --follow=name --retry) solves this:
- Track filename not inode: periodically check if path points to a new file
- Auto retry: if file doesn’t exist, retry opening every second
Implementation logic:
while (1) {
    struct stat st;
    if (stat(filename, &st) == 0) {
        if (st.st_ino != last_inode || st.st_dev != last_dev) {
            // Path now points to a different file: it was rotated or recreated
            int new_fd = open(filename, O_RDONLY);
            if (new_fd >= 0) {
                close(fd);
                fd = new_fd;  // A fresh descriptor starts reading at offset 0
                last_inode = st.st_ino;
                last_dev = st.st_dev;
            }
        }
        read_new_content(fd);
    }
    // Whether or not the file exists, wait before checking again
    sleep(1);
}
That’s why production environments prefer tail -F over tail -f.
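The rotation check itself is small enough to test in isolation. The sketch below compares the identity of the open descriptor with whatever the path currently points to; path_was_replaced and reopen_if_rotated are hypothetical names chosen for this example, and it does not try to drain unread data from the old file before switching.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` now refers to a different file than the one open on fd
 * (for example after log rotation), 0 if it is still the same file, and -1
 * if either stat call fails (e.g. the new file does not exist yet).
 * Hypothetical helper for illustration only. */
int path_was_replaced(int fd, const char *path) {
    struct stat opened, current;
    if (fstat(fd, &opened) != 0 || stat(path, &current) != 0)
        return -1;
    /* An inode number is only unique within one device, so compare both */
    return opened.st_dev != current.st_dev || opened.st_ino != current.st_ino;
}

/* If the path was replaced, swap *fd for a descriptor on the new file.
 * Returns 1 if it reopened, 0 if nothing changed, -1 on error. */
int reopen_if_rotated(int *fd, const char *path) {
    int replaced = path_was_replaced(*fd, path);
    if (replaced != 1) return replaced;

    int new_fd = open(path, O_RDONLY);
    if (new_fd < 0) return -1;
    close(*fd);
    *fd = new_fd;          /* a fresh descriptor starts at offset 0 */
    return 1;
}
```

A -1 from path_was_replaced during the short window between the rename and the creation of the new file corresponds to the "retry" half of -F: keep the old descriptor and check again later.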
Performance: Memory Mapping#
For large files, mmap is an alternative to read+lseek:
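#include <sys/mman.h>

char *map = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (map == MAP_FAILED) {
    // e.g. empty file or a pipe: fall back to read + lseek
}
off_t start = 0;  // Offset where the last n lines begin
// Search backwards in memory-mapped region
for (off_t i = file_size - 1; i >= 0; i--) {
    if (map[i] == '\n') {
        newline_count++;
        if (newline_count == n) {
            start = i + 1;  // Remember the position before unmapping
            break;
        }
    }
}
munmap(map, file_size);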
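mmap maps the file once and lets you scan it as ordinary memory, so there are no per-chunk read and lseek calls. The kernel still pages data in on demand, and only the pages the backward scan touches are loaded. Note that mmap fails on an empty file and on non-mappable inputs such as pipes, so a read-based fallback is still needed.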
Practical Techniques#
1. Last 100 lines + real-time follow#
tail -n 100 -f app.log
Show last 100 lines first, then follow in real-time. Very useful for debugging.
2. Monitor multiple logs#
tail -f access.log error.log
# Each file's output is preceded by a header line
==> access.log <==
192.168.1.1 - GET /api/user
==> error.log <==
[ERROR] Connection timeout
3. Filter with grep#
tail -f app.log | grep ERROR
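Only show lines containing ERROR. When grep writes straight to a terminal, its output is line-buffered and matches appear immediately. When grep's output goes into another pipe or a file, it switches to block buffering and matches can be delayed. Fix with --line-buffered: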
tail -f app.log | grep --line-buffered ERROR
4. Count recent traffic#
tail -n 1000 access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
Top 10 IPs based on the last 1000 requests.
Web Implementation: Browser-side tail#
A simple tail using File API:
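async function tail(file: File, lines: number = 10): Promise<string[]> {
  const CHUNK_SIZE = 4096
  const NEWLINE = 0x0a
  let start = file.size
  let pos = file.size
  let found = 0
  // Scan raw bytes backwards so multi-byte UTF-8 characters are never split
  while (pos > 0 && found < lines) {
    const chunkStart = Math.max(0, pos - CHUNK_SIZE)
    const bytes = new Uint8Array(await file.slice(chunkStart, pos).arrayBuffer())
    for (let i = bytes.length - 1; i >= 0; i--) {
      // A newline in the file's last byte terminates the final line; skip it
      if (bytes[i] === NEWLINE && chunkStart + i !== file.size - 1) {
        if (++found === lines) {
          start = chunkStart + i + 1
          break
        }
      }
    }
    pos = chunkStart
  }
  // Fewer lines than requested: return the whole file
  if (found < lines) start = 0
  const result = (await file.slice(start).text()).split('\n')
  // Drop the empty string produced by a trailing newline
  if (result[result.length - 1] === '') result.pop()
  return result
}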
File.slice reads chunks, searching backwards for newlines. Same core logic as the C version.
Common Pitfalls#
1. Binary Files#
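tail splits on newline bytes (0x0A). In binary data these appear at arbitrary positions or not at all, so tail -n may print anything from a few bytes to the entire file. Use tail -c with a byte count for binary files.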
2. Encoding Issues#
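UTF-8 multi-byte characters can be split mid-character when you cut at an arbitrary byte position (for example with tail -c, or at a chunk boundary in your own code), causing mojibake. Splitting at newlines is always safe, because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence.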
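If you do have to cut at a byte offset, one option is to nudge the offset forward to the next character boundary. The sketch below relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx; it assumes the input is valid UTF-8, and utf8_align_forward is a hypothetical helper name.

```c
#include <stddef.h>

/* In UTF-8, continuation bytes always look like 10xxxxxx. */
static int is_utf8_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Move `pos` forward to the next character boundary in buf[0..len).
 * Returns pos unchanged if it already sits at the start of a character.
 * Hypothetical helper for illustration only; assumes valid UTF-8 input. */
size_t utf8_align_forward(const unsigned char *buf, size_t len, size_t pos) {
    while (pos < len && is_utf8_continuation(buf[pos]))
        pos++;
    return pos;
}
```

Since a UTF-8 character is at most four bytes long, the loop skips at most three bytes on valid input.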
3. Large File Performance#
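On a regular file, tail -n 1 stays fast even at GB sizes, because it reads backwards from the end and stops at the first newline it needs. It becomes slow when the input cannot be seeked (in cat big.log | tail -n 1 the whole stream must be read) or when the last line is itself huge. If you only need a bounded amount of data, tail -c 1M (the last 1 MiB of bytes) reads a fixed amount regardless of line length.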
Summary#
tail seems simple but involves:
- Reverse search algorithm (newline location)
- File change monitoring (inotify vs polling)
- Log rotation handling (inode detection)
- Performance optimization (mmap, buffering)
Next time you use tail -f to track logs, you’ll know what’s happening under the hood.