So you’ve mastered the basics from Part 1 — you can nice your processes and run htop like a pro. But what happens when your system starts acting weird and basic monitoring isn’t enough? Welcome to the advanced course, where we turn you into a digital Sherlock Holmes who can solve any performance mystery.
Imagine your computer just had a performance hiccup at 2 AM, and now it’s 9 AM and your boss is asking why the server was slow. Basic tools show you what’s happening now, but you need to know what happened then. Time to bring out the big guns!
Why Should You Care About Advanced Monitoring?
Because being a Linux user without these tools is like being a doctor without a stethoscope. You might see the obvious problems (patient is unconscious), but you’ll miss the subtle ones (irregular heartbeat). Advanced monitoring helps you catch issues before they become disasters, optimize performance like a race car mechanic, and most importantly — look like an absolute wizard when you fix problems others can’t even diagnose.
Performance Metrics: Your System’s Health Checkup
The Sysstat Package — Your Swiss Army Knife
Meet mpstat and pidstat - they're like having a doctor for your computer that can check specific organs (CPU cores, processes) instead of just taking your temperature. These tools are part of the sysstat package, which is basically a medical toolkit for your system.
MPSTAT — The CPU Specialist
This tool breaks down what each CPU core is doing, like having individual heart monitors for each chamber of your heart. While top shows you overall CPU usage, mpstat tells you if core #3 is having a bad day while the others are chilling.
The Magic Syntax:
mpstat [-P {cpu | ALL}] [interval] [count]
Real-world detective work:
# Check all CPU cores every 2 seconds, 5 times in a row
mpstat -P ALL 2 5
# Monitor just CPU 0 every second for 10 iterations
mpstat -P 0 1 10
What you’ll discover:
- User mode: Time spent running your programs (the actual work)
- System mode: Time spent on housekeeping (OS overhead)
- Idle: Time spent twiddling thumbs (the good kind of waiting)
- I/O wait: Time spent waiting for slow storage (the bad kind of waiting)
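Curious where those percentages come from? Roughly speaking, mpstat samples the kernel's tick counters in /proc/stat twice and reports how the difference splits across the modes. Here is a minimal sketch of that arithmetic. The helper name cpu_breakdown is made up for this example (it is not part of sysstat), and it skips details the real tool handles, such as per-core lines and guest time.

```shell
# Sketch: turn two samples of the aggregate "cpu" line from /proc/stat
# into mpstat-style percentages. cpu_breakdown is a hypothetical helper.
cpu_breakdown() (
    # $1 = earlier sample, $2 = later sample (both full "cpu ..." lines)
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Fields 2..9: user nice system idle iowait irq softirq steal
        NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - first[i]; total += delta[i] }
            if (total == 0) { print "no ticks elapsed"; exit 1 }
            printf "user=%.1f system=%.1f idle=%.1f iowait=%.1f\n",
                100 * delta[2] / total, 100 * delta[4] / total,
                100 * delta[5] / total, 100 * delta[6] / total
        }'
)

# Live usage (takes one second to run):
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```

Feeding it two hand-written samples is a handy way to convince yourself the numbers add up before trusting it on a live box.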
PIDSTAT — The Process Detective
While mpstat looks at CPUs, pidstat stalks individual processes like a very polite private investigator. It's perfect for answering questions like "Which process is eating all my memory?" or "What's hammering my disk at 3 AM?"
The Detective Toolkit:
pidstat [-u] [-r] [-d] [-p pid] [interval] [count]
- -u: CPU usage (who's hogging the processor?)
- -r: Memory stats (who's the memory monster?)
- -d: I/O activity (who's hammering the disk?)
Example investigation:
# Watch Firefox's resource usage every 2 seconds, continuously
# (pgrep -d, joins multiple PIDs with commas, the list format -p expects)
pidstat -u -r -d -p "$(pgrep -d, firefox)" 2
# Report per-process disk I/O every 5 seconds
pidstat -d 5
Diagnostic and Debugging Tools: When Things Go Really Wrong
/proc/<pid> — The Process’s Personal Diary
Every running process has a folder in /proc/ that's like reading someone's diary - but legally! This virtual filesystem contains everything you could ever want to know about a process.
The juicy details:
# How was this process started? (arguments are NUL-separated, so swap in spaces)
tr '\0' ' ' < /proc/1234/cmdline; echo
# What environment variables does it have? (also NUL-separated)
tr '\0' '\n' < /proc/1234/environ
# What's its current memory footprint? (VmRSS, VmSize, and friends)
grep -i '^Vm' /proc/1234/status
# What files does it have open?
ls -l /proc/1234/fd/
Pro tip: Replace 1234 with any actual process ID from ps or htop!
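If you find yourself reading the same diary pages over and over, you can wrap them in a tiny helper. The sketch below is one possible way to do it, not a standard tool: proc_peek is a made-up name, and the optional second argument (an alternative root directory instead of /proc) exists only so you can try it against a fake directory tree.

```shell
# Sketch: print the command line and a few status fields for one process.
# proc_peek is a hypothetical helper; $2 defaults to the real /proc.
proc_peek() (
    pid=$1
    root=${2:-/proc}
    dir="$root/$pid"
    [ -d "$dir" ] || { echo "no such process: $pid" >&2; exit 1; }
    # cmdline stores arguments separated by NUL bytes, so make them readable
    printf 'cmdline: '
    tr '\0' ' ' < "$dir/cmdline"
    printf '\n'
    # Pick a few commonly useful fields out of the status file
    grep -E '^(State|VmRSS|Threads):' "$dir/status"
)

# Usage: proc_peek $$    (peek at the current shell)
```

Kernel threads have an empty cmdline and no VmRSS line, so expect shorter output for those.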
PSTREE — The Family Tree Detective
Ever wonder which process is the parent of that mysterious background task? pstree draws you a beautiful family tree showing who spawned whom. It's perfect for tracking down runaway processes or understanding complex service hierarchies.
Family drama investigation:
# Show the full family tree with process IDs
pstree -p
# Focus on a specific user's processes
pstree username
# Show just the children of a specific process
pstree -p 1234
LSOF — The “Who’s Using What” Detective
In Linux, everything is a file (network sockets, pipes, actual files, devices). lsof (List Open Files) tells you which process is using which file - it's like having X-ray vision for your system.
Practical magic tricks:
# See what's listening on port 443 (HTTPS)
sudo lsof -iTCP:443 -sTCP:LISTEN
# Find which process is using a file
lsof /path/to/important/file
# See all network connections
sudo lsof -i
# Find all files opened by Firefox
lsof -c firefox
# Find whatever is listening on TCP port 80 (handy for tracking down port conflicts)
sudo lsof -iTCP:80 -sTCP:LISTEN
Real-world scenarios:
- Can’t unmount a USB drive? lsof /media/usb will show you what's still using it
- Suspicious network activity? lsof -i reveals all network connections
- Port conflict? lsof -i :8080 shows what's already using port 8080
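There is no magic behind the X-ray vision: on Linux, each process's open files show up as symlinks under /proc/<pid>/fd. The sketch below imitates the USB-drive scenario by walking those symlinks. Treat it as an illustration only. holders_of is a made-up name, the second argument (an alternative root instead of /proc) is there purely for experimenting, and real lsof checks more than this (working directories and memory-mapped files, for example).

```shell
# Sketch: list PIDs that hold a file open at or below a given path,
# by reading the fd symlinks under /proc. holders_of is a hypothetical helper.
holders_of() (
    target=${1%/}            # drop one trailing slash, if any
    root=${2:-/proc}
    for fd in "$root"/[0-9]*/fd/*; do
        # Each fd entry is a symlink to whatever the process has open
        case $(readlink "$fd" 2>/dev/null) in
            "$target" | "$target"/*)
                pid=${fd#"$root"/}      # e.g. 101/fd/3
                echo "${pid%%/*}"       # keep only the PID part
                ;;
        esac
    done | sort -un
)

# Usage: sudo sh -c '. ./holders.sh; holders_of /media/usb'
# (without root you only see your own processes; holders.sh is a placeholder name)
```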
STRACE — The Process Wiretap
When a program misbehaves and you need to know exactly what it’s doing, strace is like putting a wire on it. It logs every system call (read, write, open, connect, etc.) so you can see exactly where things go wrong.
The wiretap syntax:
# Monitor all system calls of a running process
strace -p 1234
# Start a program under surveillance
strace ./my_suspicious_program
# Focus on specific system calls (modern programs usually open files via openat)
strace -e trace=openat,read,write my_program
# See only network-related calls
strace -e trace=network curl google.com
Detective scenarios:
# Debug a "Permission denied" error
strace -e trace=open,openat ./failing_program
# See what files a program tries to access
strace -e trace=file ls /home
# Monitor a web service's network activity
strace -e trace=network -p "$(pgrep apache2)"
Pro tip: strace output can be overwhelming. Use -e trace= to filter, or pipe to grep to find specific patterns! Note that strace writes to stderr, so add 2>&1 before the pipe.
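When a trace runs to thousands of lines, the failures are usually the interesting part. Here is a small sketch that boils a saved trace down to a count of failed calls per error code. It assumes the plain single-process output format, where each line starts with the syscall name and failures end in "= -1 ERRNO (...)". failed_calls is a made-up helper name, and traces taken with -f (which prefix each line with a PID) would need extra handling.

```shell
# Sketch: summarize failed system calls from strace output on stdin.
# failed_calls is a hypothetical helper; prints "count ERRNO syscall".
failed_calls() {
    awk '/ = -1 E[A-Z0-9]+/ {
        call = $0; sub(/\(.*/, "", call)          # text before the first "("
        err = $0;  sub(/.* = -1 /, "", err)       # drop everything up to the errno
        sub(/ .*/, "", err)                       # keep just the errno name
        count[err " " call]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}

# Usage:
#   strace -o trace.log ./failing_program
#   failed_calls < trace.log
```

A pile of ENOENT results is often harmless (programs probe many paths while searching for libraries), while a lone EACCES tends to be the real culprit behind a "Permission denied".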
Advanced Investigation Techniques
The Performance Mystery Solving Process
When your system acts weird, follow this detective methodology:
- Start broad with htop or atop - what's the overall situation?
- Get specific with mpstat -P ALL 1 5 - which CPU cores are struggling?
- Find the culprit with pidstat -u -r -d 2 - which processes are misbehaving?
- Go deep with strace -p <suspicious_pid> - what is the bad process actually doing?
- Check resources with lsof -p <pid> - what files/ports is it using?
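If you want the methodology above at your fingertips, one option is a small checklist printer. The sketch below only prints the commands in order and flags any tool that is not installed. It deliberately runs nothing, since several of these tools are interactive or need root. triage_plan is a made-up name, and the choice of htop for the first step is just one of the two options mentioned.

```shell
# Sketch: print the broad-to-deep investigation steps for a suspect PID,
# marking tools that are missing. triage_plan is a hypothetical helper.
triage_plan() (
    pid=${1:?usage: triage_plan <pid>}
    # Each line below is "tool|command to run"
    while IFS='|' read -r tool cmd; do
        if command -v "$tool" >/dev/null 2>&1; then
            status=ok
        else
            status=missing
        fi
        printf '%-10s %s\n' "[$status]" "$cmd"
    done <<EOF
htop|htop
mpstat|mpstat -P ALL 1 5
pidstat|pidstat -u -r -d 2
strace|sudo strace -p $pid
lsof|sudo lsof -p $pid
EOF
)

# Usage: triage_plan 1234
```

Anything marked missing can usually be installed from your distribution's packages (mpstat and pidstat come from sysstat).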
The “System Was Slow Yesterday” Investigation
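This is where atop shines. Unlike the tools above, which only show what is happening right now, atop can replay history. When its logging service is enabled, it writes periodic snapshots to daily files under /var/log/atop/:
# View system performance between 3:00 PM and 3:30 PM
# (swap the date in the filename for yesterday's, in YYYYMMDD form)
atop -r /var/log/atop/atop_20240817 -b 15:00 -e 15:30
The “What’s Eating My Bandwidth?” Hunt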
Combine tools for network detective work:
# Find processes using network
sudo lsof -i
# Monitor network system calls
sudo strace -e trace=network -p "$(pgrep -f "suspicious_app")"
# Check what ports are listening
sudo lsof -iTCP -sTCP:LISTEN
TLDR — The Advanced Cheat Sheet
Performance Deep Dive:
- mpstat -P ALL 2 5 - CPU breakdown per core over time
- pidstat -u -r -d 2 - Per-process CPU, memory, and I/O stats
- atop -r <logfile> - Historical performance data
Process Investigation:
- /proc/<pid>/status - Detailed process info
- pstree -p - Process family tree with IDs
- lsof -p <pid> - Files and ports used by process
- strace -p <pid> - Real-time system call monitoring
Network & File Debugging:
- sudo lsof -i :port - What's using a specific port
- lsof <filename> - What's using a specific file
- strace -e trace=network <command> - Network activity monitoring
The Golden Rule: Start broad, get specific, go deep. Every performance mystery has clues — you just need to know where to look!