Linux file Command: Deep Dive into File Type Detection Magic

Published: May 13, 2026, 06:19

When you encounter a file without an extension, or need to verify if a file has been tampered with, the Linux file command is your most reliable companion. It doesn’t rely on filename extensions—instead, it reads the file’s “DNA,” the magic number, to identify the true type.

Magic Numbers: The Real ID Cards of File Types

File extensions are just labels for humans. The real type information hides in the file’s header as magic numbers—fixed byte sequences that serve as unique signatures for each file format.

PNG files:  89 50 4E 47 0D 0A 1A 0A
ZIP files:  50 4B 03 04
PDF files:  25 50 44 46
ELF executables: 7F 45 4C 46

The core principle of the file command is reading the file’s leading bytes and matching them against a compiled magic number database, typically /usr/share/misc/magic.mgc (the exact path varies by distribution). This is why renaming image.png to document.txt doesn’t fool file: it still correctly identifies the file as a PNG image.
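To make the idea concrete, here is a minimal sketch that checks a signature by hand, without file. The sample.bin file and the magic_bytes helper are made up for this illustration, and the sketch assumes head -c and od are available, as they are on typical Linux systems. The real file command applies far more rules than this.

```shell
# Create a tiny file that starts with the 8-byte PNG signature (hypothetical sample.bin).
printf '\211PNG\r\n\032\n' > sample.bin

# Hypothetical helper: print the first N bytes of a file as uppercase hex.
magic_bytes() {
  head -c "$2" "$1" | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F'
}

sig=$(magic_bytes sample.bin 8)
echo "$sig"

# Compare against the signatures listed above.
case "$sig" in
  89504E470D0A1A0A) echo "looks like PNG" ;;
  504B0304*)        echo "looks like ZIP" ;;
  25504446*)        echo "looks like PDF" ;;
  7F454C46*)        echo "looks like ELF" ;;
  *)                echo "unknown" ;;
esac
```

Renaming sample.bin to anything else leaves these bytes untouched, which is the whole point.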

Basic Usage: From Simple to Advanced

Identifying a Single File

$ file document.pdf
document.pdf: PDF document, version 1.7

$ file unknown.dat
unknown.dat: PNG image data, 1920 x 1080, 8-bit/color RGBA, non-interlaced

Even with a .dat extension, file accurately identifies this as a PNG image.

Batch Identifying Directory Files

$ file *
index.html:   HTML document, UTF-8 Unicode text
app.js:       Node.js script, UTF-8 Unicode text executable
config.json:  JSON data
image.png:    PNG image data, 800 x 600, 8-bit/color RGB, non-interlaced
archive.zip:  Zip archive data, at least v2.0 to extract

Getting MIME Types

In web development, MIME types are more useful than human-readable descriptions:

$ file -i image.png
image.png: image/png; charset=binary

$ file -i script.js
script.js: text/javascript; charset=utf-8

$ file -i data.tar.gz
data.tar.gz: application/gzip; charset=binary

This is invaluable for file upload validation, Content-Type headers, and similar scenarios.
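In scripts it is often easier to ask for one field at a time. The sketch below, using a throwaway hello.txt, shows how -b (brief, no filename prefix) combines with --mime-type and --mime-encoding, which together make up what -i prints:

```shell
# Throwaway sample file for the demonstration.
printf 'hello\n' > hello.txt

file -b --mime-type hello.txt       # text/plain
file -b --mime-encoding hello.txt   # us-ascii
file -bi hello.txt                  # text/plain; charset=us-ascii
```

Using --mime-type with -b gives a single token that is safe to compare directly in a case statement.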

Advanced Options: Looking Deeper

Inspecting Compressed Files

$ file -z backup.tar.gz
backup.tar.gz: POSIX tar archive (GNU) (gzip compressed data, deflated, original size 10485760)

Without -z, file only sees the outer gzip compression. With -z, it peers through to identify the inner content.

Following Symbolic Links

$ file link-to-config
link-to-config: symbolic link to /etc/nginx/nginx.conf

$ file -L link-to-config
link-to-config: ASCII text

The -L option makes file resolve the symbolic link and report the actual file type instead of just saying it’s a link.

Processing File Lists

$ cat files.txt
/var/log/syslog
/etc/passwd
/home/user/.bashrc

$ file -f files.txt
/var/log/syslog:     UTF-8 Unicode text
/etc/passwd:         ASCII text
/home/user/.bashrc:  ASCII text executable
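The list does not have to live in a file. As a small sketch, assuming a scratch directory named listdemo created just for this example, passing - to -f makes file read the names from standard input:

```shell
# Build a scratch directory with two sample files.
mkdir -p listdemo
printf 'plain text\n' > listdemo/notes
printf '#!/bin/sh\necho hi\n' > listdemo/run

# Classic form: write the list to a file first.
find listdemo -type f | sort > files.txt
file -f files.txt

# Same result without the temporary file: read names from stdin.
find listdemo -type f | sort | file -f -
```

Both forms expect one name per line, so filenames containing newlines will not survive this approach.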

Reading Special Files

$ file -s /dev/sda1
/dev/sda1: Linux rev 1.0 ext4 filesystem data, UUID=xxx (needs journal recovery) (extents) (large files) (huge files)

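By default, file does not read the contents of block or character special files and only reports that they are special files. With -s, it reads the device content to identify the filesystem type, which usually requires root privileges.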

Real-World Use Cases

1. Security Auditing: Detecting Disguised Files

Attackers often disguise malicious scripts as image uploads:

$ file "avatar.jpg"
avatar.jpg: PHP script, UTF-8 Unicode text

Busted! Despite the .jpg extension, this is actually a PHP script. Checking the detected type with file -i before processing uploads is a basic line of defense.
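One way such a check might look in an upload handler is sketched below. The is_allowed_image helper, the allowlist, and the two sample files are all hypothetical and exist only for this illustration:

```shell
# Hypothetical helper: succeed only if the detected MIME type is on the allowlist.
is_allowed_image() {
  mime=$(file -b --mime-type -- "$1") || return 2
  case "$mime" in
    image/png|image/jpeg|image/gif) return 0 ;;
    *) return 1 ;;
  esac
}

# Two throwaway samples: a GIF header, and a PHP script hiding behind a .jpg name.
printf 'GIF89a\001\000\001\000' > real.gif
printf '<?php echo "hi"; ?>\n' > avatar.jpg

for f in real.gif avatar.jpg; do
  if is_allowed_image "$f"; then
    echo "$f: accepted"
  else
    echo "$f: rejected"
  fi
done
```

The sketch uses an allowlist rather than a blocklist so that any type it does not recognize is rejected by default.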

2. Cleaning Unknown Files

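# Find all actual image files (match the MIME type after the colon, not the path)
find . -type f -exec file -i {} + | grep ':[[:space:]]*image/' | cut -d: -f1

# Delete all non-text files (destructive: replace rm with echo for a dry run first)
# -b drops the filename from the output, so a path containing "text/" cannot confuse grep
find . -type f -exec sh -c 'file -b --mime-type "$1" | grep -q "^text/" || rm -- "$1"' _ {} \;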

3. Batch Renaming

When you have files without extensions that need organizing:

for f in *; do
  [ -f "$f" ] || continue   # skip directories and other non-regular files
  ext=$(file -b --mime-type "$f" | cut -d/ -f2)
  # Map common MIME subtypes to extensions
  case "$ext" in
    jpeg) ext=jpg ;;
    plain) ext=txt ;;
    javascript) ext=js ;;
  esac
  mv -- "$f" "$f.$ext"
done

Performance: Large Files and Batch Processing

The file command is efficient because it reads only a bounded prefix of each file rather than the whole thing (in recent versions the limit defaults to 1 MiB and can be changed with -P bytes=N). But when processing tens of thousands of files, there’s still room for optimization:

# Slow: Launches one file process per file
find . -type f -exec file {} \;

# Fast: Passes files to a single file process
find . -type f -print0 | xargs -0 file

The second approach starts far fewer processes, because xargs passes many filenames to each file invocation. On large directory trees this can be many times faster.
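If you would rather stay within find, the POSIX -exec ... {} + form batches arguments in much the same way as xargs. A small sketch, using a scratch directory named perfdemo that exists only for this example, shows that both forms classify the same files:

```shell
# Build a scratch directory with five small files.
mkdir -p perfdemo
for i in 1 2 3 4 5; do
  printf 'file %s\n' "$i" > "perfdemo/f$i"
done

# One file process per file.
slow=$(find perfdemo -type f -exec file -b {} \; | wc -l)

# Many files per file process.
fast=$(find perfdemo -type f -exec file -b {} + | wc -l)

echo "per-file: $slow results, batched: $fast results"
```

To measure the difference on your own data, prefix each find command with time.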

Custom Magic Number Database

The file command’s recognition capability comes from the magic database at /usr/share/misc/magic.mgc. You can compile custom rules:

# Custom magic rule (save as custom.magic)
# lelong reads a 32-bit little-endian integer; use belong for big-endian data
0 string MYFORMAT My custom format data
>8 lelong x version %d

# Compile and use
file -C -m custom.magic
file -m custom.magic.mgc data.bin

This is useful for identifying proprietary internal file formats.
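Here is an end-to-end sketch you can run in an empty directory. The MYFORMAT tag, the little-endian version field at offset 8, and the data.bin sample are assumptions invented for this example, not a real format:

```shell
# Sample data: the 8-byte tag "MYFORMAT" followed by a 32-bit little-endian version (2).
printf 'MYFORMAT\002\000\000\000' > data.bin

# Rule file: match the tag at offset 0, then print the integer at offset 8.
# The leading \b suppresses the space that file normally inserts between messages.
cat > custom.magic <<'EOF'
0 string MYFORMAT My custom format data
>8 lelong x \b, version %d
EOF

# Compile the rules; this writes custom.magic.mgc into the current directory.
file -C -m custom.magic

# Identify the sample with the compiled rules; the output should name the
# custom format and report version 2.
file -b -m custom.magic.mgc data.bin
```

Note that -m replaces the default database. To keep the built-in rules as well, pass a colon-separated list, for example custom.magic.mgc followed by the path of your system database.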

Common Pitfalls

UTF-8 BOM Handling: UTF-8 files with a BOM are reported as UTF-8 Unicode (with BOM) text. The identification is correct, but some programs mishandle the BOM bytes at the start of the file.

Empty Files Can’t Be Identified: Zero-byte files only report empty with no type information.

Container Formats Are Opaque: For container formats like MP4 and DOCX, file reports the container type, not the codecs or content stored inside.
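The first two pitfalls are easy to reproduce. This sketch creates a zero-byte file and a UTF-8 file with a BOM (both throwaway names) and shows what file says about each. The exact wording of the BOM description differs between file versions.

```shell
# A zero-byte file.
: > empty.bin

# A text file that starts with the UTF-8 BOM (EF BB BF).
printf '\357\273\277hello\n' > bom.txt

file -b empty.bin              # empty
file -b bom.txt                # description includes "(with BOM)"
file -b --mime-type empty.bin  # a placeholder MIME type, not a real content type
```

A script that branches on the description should therefore treat empty as its own case rather than as an unknown type.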

Summary

The file command may be small, but it’s an indispensable part of the Linux toolkit. It identifies true file types through magic numbers, immune to extension spoofing, making it essential for security audits, file processing, and batch operations. Remember these key options: -i (MIME type), -z (peek into compressed files), -L (follow links), and you’ll handle most file identification scenarios with ease.

Next time you encounter a mysterious file, don’t guess—let file reveal the truth.