So you’ve mastered the basics from Part 1 — you can nice your processes and run htop like a pro. But what happens when your system starts acting weird and basic monitoring isn’t enough? Welcome to the advanced course, where we turn you into a digital Sherlock Holmes who can solve any performance mystery.
Imagine your computer just had a performance hiccup at 2 AM, and now it’s 9 AM and your boss is asking why the server was slow. Basic tools show you what’s happening now, but you need to know what happened then. Time to bring out the big guns!
Why Should You Care About Advanced Monitoring?
Because being a Linux user without these tools is like being a doctor without a stethoscope. You might see the obvious problems (patient is unconscious), but you’ll miss the subtle ones (irregular heartbeat). Advanced monitoring helps you catch issues before they become disasters, optimize performance like a race car mechanic, and most importantly — look like an absolute wizard when you fix problems others can’t even diagnose.
Performance Metrics: Your System’s Health Checkup
The Sysstat Package — Your Swiss Army Knife
Meet mpstat and pidstat - they're like having a doctor for your computer that can check specific organs (CPU cores, processes) instead of just taking your temperature. These tools are part of the sysstat package, which is basically a medical toolkit for your system.
MPSTAT — The CPU Specialist
This tool breaks down what each CPU core is doing, like having individual heart monitors for each chamber of your heart. While top shows you overall CPU usage, mpstat tells you if core #3 is having a bad day while the others are chilling.
The Magic Syntax:
mpstat [-P {cpu | ALL}] [interval] [count]
Real-world detective work:
# Check all CPU cores every 2 seconds, 5 times in a row
mpstat -P ALL 2 5
# Monitor just CPU 0 every second for 10 iterations
mpstat -P 0 1 10
What you’ll discover:
- User mode (%usr): Time spent running your programs (the actual work)
- System mode (%sys): Time spent on housekeeping (OS overhead)
- Idle (%idle): Time spent twiddling thumbs (the good kind of waiting)
- I/O wait (%iowait): Time spent waiting for slow storage (the bad kind of waiting)
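To turn those columns into an answer, here is a minimal sketch that reads mpstat output and lists the cores whose %idle falls below a threshold. The function name busy_cores and the default threshold of 20 are made up for illustration. It assumes mpstat prints "Average:" summary lines (which it does when you give it a count) and looks up the %idle column by name, because the column layout can vary between sysstat versions.

```shell
# busy_cores [IDLE_LIMIT]
# Reads `mpstat -P ALL <interval> <count>` output on stdin and prints
# "<cpu> <idle%>" for every core whose average %idle is below IDLE_LIMIT.
# busy_cores and the default limit of 20 are illustrative assumptions.
busy_cores() {
    awk -v limit="${1:-20}" '
        # Header of the summary block: remember which column holds %idle.
        $1 == "Average:" && $2 == "CPU" {
            for (i = 1; i <= NF; i++) if ($i == "%idle") idle = i
            next
        }
        # Per-core summary rows (skip the "all" aggregate).
        $1 == "Average:" && idle && $2 != "all" && $idle + 0 < limit + 0 {
            print $2, $idle
        }
    '
}
```

A typical call would be LC_ALL=C mpstat -P ALL 2 5 | busy_cores 20, where LC_ALL=C keeps the decimal separator a dot so awk can compare the numbers.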
PIDSTAT — The Process Detective
While mpstat looks at CPUs, pidstat stalks individual processes like a very polite private investigator. It's perfect for answering questions like "Which process is eating all my memory?" or "What's hammering my disk at 3 AM?"
The Detective Toolkit:
pidstat [-u] [-r] [-d] [-p pid] [interval] [count]
- -u: CPU usage (who's hogging the processor?)
- -r: Memory stats (who's the memory monster?)
- -d: I/O activity (who's hammering the disk?)
Example investigation:
# Watch Firefox's resource usage every 2 seconds, continuously
# (pgrep -d, joins multiple PIDs with commas, the list format -p expects)
pidstat -u -r -d -p "$(pgrep -d, firefox)" 2
# Find the top I/O consumers every 5 seconds
pidstat -d 5
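pidstat -d lists every process doing I/O without ranking them, so a small filter helps find the heaviest writers. The sketch below is one way to do it, not part of sysstat: top_writers is a hypothetical helper that assumes you ran pidstat with a count (so it prints "Average:" rows) and finds the PID and kB_wr/s columns by their header names.

```shell
# top_writers [N]
# Reads `pidstat -d <interval> <count>` output on stdin and prints the N
# processes (default 5) with the highest average kB_wr/s, one per line as
# "<kB_wr/s> <pid> <command>". top_writers is an illustrative name.
top_writers() {
    awk '
        # Summary header: locate the columns we need by name.
        $1 == "Average:" && $2 == "UID" {
            for (i = 1; i <= NF; i++) {
                if ($i == "PID") pid = i
                if ($i == "kB_wr/s") wr = i
            }
            next
        }
        # Summary rows: write rate, PID, and the command (last column).
        $1 == "Average:" && wr { print $wr, $pid, $NF }
    ' | LC_ALL=C sort -rn | head -n "${1:-5}"
}
```

For example: LC_ALL=C pidstat -d 5 3 | top_writers 3 samples three 5-second intervals and shows the three busiest writers.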
Diagnostic and Debugging Tools: When Things Go Really Wrong
/proc/<pid> — The Process’s Personal Diary
Every running process has a folder in /proc/ that's like reading someone's diary - but legally! This virtual filesystem contains everything you could ever want to know about a process.
The juicy details:
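# How was this process started? (arguments are NUL-separated, so turn them into spaces)
tr '\0' ' ' < /proc/1234/cmdline; echo
# What environment variables does it have? (also NUL-separated; other users' processes need sudo)
tr '\0' '\n' < /proc/1234/environ
# What's its current memory footprint? (the Vm* fields, such as VmRSS and VmSize)
grep '^Vm' /proc/1234/status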
# What files does it have open?
ls -l /proc/1234/fd/
Pro tip: Replace 1234 with any actual process ID from ps or htop!
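If you find yourself reading the same diary pages over and over, the lookups can be bundled into one helper. This is only a sketch: proc_summary and the PROC_ROOT override are hypothetical names (the override exists purely so the function can be pointed at a test directory), and the selection of status fields is my own pick.

```shell
# proc_summary PID
# Prints the command line and a few key fields from /proc/<PID>/status.
# proc_summary and PROC_ROOT are illustrative; PROC_ROOT defaults to /proc.
proc_summary() {
    local pid="$1"
    local dir="${PROC_ROOT:-/proc}/$pid"
    if [ ! -d "$dir" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # cmdline is NUL-separated; show it as a normal space-separated line.
    printf 'cmdline: '
    tr '\0' ' ' < "$dir/cmdline"
    printf '\n'
    # Name, state, memory footprint, and thread count.
    grep -E '^(Name|State|VmSize|VmRSS|Threads):' "$dir/status"
}
```

Running proc_summary $$ prints the details of your current shell.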
PSTREE — The Family Tree Detective
Ever wonder which process is the parent of that mysterious background task? pstree draws you a beautiful family tree showing who spawned whom. It's perfect for tracking down runaway processes or understanding complex service hierarchies.
Family drama investigation:
# Show the full family tree with process IDs
pstree -p
# Focus on a specific user's processes
pstree username
# Show just the children of a specific process
pstree -p 1234
LSOF — The “Who’s Using What” Detective
In Linux, everything is a file (network sockets, pipes, actual files, devices). lsof (List Open Files) tells you which process is using which file - it's like having X-ray vision for your system.
Practical magic tricks:
# See what's using port 443 (HTTPS)
sudo lsof -i TCP:443 -s TCP:LISTEN
# Find which process is using a file
lsof /path/to/important/file
# See all network connections
sudo lsof -i
# Find all files opened by Firefox
lsof -c firefox
# Find services listening on TCP port 80 (handy for tracking down port conflicts)
sudo lsof -iTCP:80 -sTCP:LISTEN
Real-world scenarios:
- Can’t unmount a USB drive? lsof /media/usb will show you what's still using it
- Suspicious network activity? lsof -i reveals all network connections
- Port conflict? lsof -i :8080 shows what's already using port 8080
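On a busy machine lsof -i can scroll for pages, so a quick tally per program makes suspicious activity easier to spot. The helper below is a rough sketch with a made-up name (connections_per_command); it assumes the usual lsof layout with a header line first and the command name in the first column, and it counts open network sockets per command name, not per process.

```shell
# connections_per_command
# Reads `lsof -i` output on stdin and prints "<count> <command>" lines,
# busiest first. The function name is an illustrative assumption.
connections_per_command() {
    # Skip the header row, then count rows per COMMAND (column 1).
    awk 'NR > 1 { count[$1]++ } END { for (c in count) print count[c], c }' |
        LC_ALL=C sort -rn
}
```

Usage: sudo lsof -i | connections_per_command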
STRACE — The Process Wiretap
When a program misbehaves and you need to know exactly what it’s doing, strace is like putting a wire on it. It logs every system call (read, write, open, connect, etc.) so you can see exactly where things go wrong.
The wiretap syntax:
# Monitor all system calls of a running process
strace -p 1234
# Start a program under surveillance
strace ./my_suspicious_program
# Focus on specific types of system calls (modern glibc opens files via openat)
strace -e trace=open,openat,read,write my_program
# See only network-related calls
strace -e trace=network curl google.com
Detective scenarios:
# Debug a "Permission denied" error (include openat, which most programs use instead of open)
strace -e trace=open,openat ./failing_program
# See what files a program tries to access
strace -e trace=file ls /home
# Monitor a web service's network activity
# (the quotes let strace receive every apache2 PID as one -p list)
strace -e trace=network -p "$(pgrep apache2)"
Pro tip: strace output can be overwhelming. Use -e trace= to filter, or pipe to grep to find specific patterns! strace writes to stderr, so add 2>&1 before the pipe or grep will see nothing.
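Most of the time the interesting lines are the system calls that failed, which strace marks with "= -1" followed by an error name such as EACCES. Here is a hedged sketch that tallies those failures. failed_calls is a hypothetical helper, and the pattern assumes strace's default output format, optionally prefixed with "[pid N]" as happens when several processes are traced.

```shell
# failed_calls
# Reads strace output on stdin and prints "<count> <syscall> <errno>" for
# every failing system call, most frequent first.
# failed_calls is an illustrative name, not part of strace.
failed_calls() {
    # Keep only lines ending in "= -1 EXXX (...)" and reduce them to
    # "<syscall> <errno>", dropping an optional "[pid N] " prefix.
    sed -n 's/^\(\[pid *[0-9]*] \)\{0,1\}\([a-z0-9_]*\)(.* = -1 \(E[A-Z0-9]*\) .*$/\2 \3/p' |
        sort | uniq -c | LC_ALL=C sort -rn |
        awk '{ print $1, $2, $3 }'   # strip the padding uniq -c adds
}
```

Usage: strace -f -e trace=file ./failing_program 2>&1 | failed_calls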
Advanced Investigation Techniques
The Performance Mystery Solving Process
When your system acts weird, follow this detective methodology:
- Start broad with htop or atop - what's the overall situation?
- Get specific with mpstat -P ALL 1 5 - which CPU cores are struggling?
- Find the culprit with pidstat -u -r -d 2 - which processes are misbehaving?
- Go deep with strace -p <suspicious_pid> - what is the bad process actually doing?
- Check resources with lsof -p <pid> - what files/ports is it using?
The “System Was Slow Yesterday” Investigation
This is where atop shines. Unlike tools that only show live data, atop can record periodic snapshots to a log file (typically through its background service) and replay them later:
# View system performance from yesterday at 3 PM
atop -r /var/log/atop/atop_20240817 -b 15:00 -e 15:30
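Typing the date by hand gets old fast, so the log file name can be computed instead. The sketch below assumes GNU date (for the -d option) and the /var/log/atop/atop_YYYYMMDD naming used in the example above; your distribution may store the logs elsewhere. atop_log_path is a made-up helper name.

```shell
# atop_log_path [DAY]
# Prints the atop log path for DAY (anything GNU `date -d` understands,
# default "yesterday"). The directory and file pattern follow the example
# above and are assumptions about your system's layout.
atop_log_path() {
    local day="${1:-yesterday}"
    local stamp
    stamp=$(date -d "$day" +%Y%m%d) || return 1
    printf '/var/log/atop/atop_%s\n' "$stamp"
}
```

Usage: atop -r "$(atop_log_path)" -b 15:00 -e 15:30 replays yesterday between 3:00 and 3:30 PM.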
The “What’s Eating My Bandwidth?” Hunt
Combine tools for network detective work:
# Find processes using network
sudo lsof -i
# Monitor network system calls (quoted so strace gets all matching PIDs as one -p list)
sudo strace -e trace=network -p "$(pgrep -f "suspicious_app")"
# Check what ports are listening
sudo lsof -i -s TCP:LISTEN
TLDR — The Advanced Cheat Sheet
Performance Deep Dive:
- mpstat -P ALL 2 5 - CPU breakdown per core over time
- pidstat -u -r -d 2 - Per-process CPU, memory, and I/O stats
- atop -r <logfile> - Historical performance data
Process Investigation:
- /proc/<pid>/status - Detailed process info
- pstree -p - Process family tree with IDs
- lsof -p <pid> - Files and ports used by process
- strace -p <pid> - Real-time system call monitoring
Network & File Debugging:
- sudo lsof -i :port - What's using a specific port
- lsof <filename> - What's using a specific file
- strace -e trace=network <command> - Network activity monitoring
The Golden Rule: Start broad, get specific, go deep. Every performance mystery has clues — you just need to know where to look!