Tracing I/O usage on Linux

February 19, 2009

I/O subsystems are a whole industry of their own, and entire libraries could be (and probably have been) written on the subject. The particular sub-topic I’m covering today is what to do when you are faced with a machine that you suspect is suffering from heavy I/O load, and you want to find the culprit.

Sadly, this is an area in which Windows has the upper hand: using the Performance Monitor you can quite easily determine which process is consuming the largest chunk of your disk I/O. On Linux things can be a little harder, but not all is lost.

If you are fortunate enough to be experiencing the problem on a machine running at least a 2.6.20 kernel and with Python 2.5 or later available, you can run IOTop. This prints out I/O usage data in a similar format to the standard “top” command, and it looks something like this:

[IOTop output - picture sourced from http://guichaz.free.fr/iotop/iotop_big.png]
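On a distribution that packages it, installation and a couple of useful invocations look roughly like this (package names and option choices are assumptions on my part rather than anything specific to the setup described above):

# Package names vary by distribution and are assumptions here.
apt-get install iotop        # Debian/Ubuntu
yum install iotop            # RHEL/CentOS, e.g. from a third-party repository

# Interactive view, limited to processes that are actually doing I/O:
iotop --only

# Batch mode is handy for logging: take 10 samples, 5 seconds apart.
iotop --batch --only --iter=10 --delay=5 > iotop.log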

Sadly, at the time I needed to diagnose the I/O, the machine I was using had neither a 2.6.20 kernel nor Python 2.5, so I was forced to seek other methods of tracing the I/O. Cue blktrace. This hooks into the kernel’s debug filesystem to gather I/O statistics and presents a fairly raw trace of what’s going on. You can download the source from here or find RPM packages for recent RHEL at the RPMForge Repository.
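If your distribution already packages it, installation is simpler still; the package names below are an assumption rather than anything taken from the RPMForge link above:

# Package names are assumptions; adjust for your distribution.
apt-get install blktrace        # Debian/Ubuntu
yum install blktrace            # RHEL/CentOS with the RPMForge repository enabled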

While it is possible to use blktrace directly, there is also a helper script, btrace, which shortcuts a lot of the most commonly used options and output formatting. You will need to mount debugfs on /sys/kernel/debug, and then you are ready to roll!

root@host:~# mount -t debugfs none /sys/kernel/debug
root@host:~# btrace /dev/sda
  8,0    0        1     0.000000000  2884  A   W 60060711 + 8 <- (8,1) 60060648
  8,0    0        2     0.000000244  2884  Q   W 60060711 + 8 [kjournald]
  8,0    0        3     0.000005278  2884  G   W 60060711 + 8 [kjournald]
  8,0    0        4     0.000006933  2884  P   N [kjournald]
  8,0    0        5     0.000007515  2884  I   W 60060711 + 8 [kjournald]
  8,0    0        6     0.000010068  2884  A   W 60093063 + 8 <- (8,1) 60093000
  8,0    0        7     0.000010263  2884  Q   W 60093063 + 8 [kjournald]
  8,0    0        8     0.000011588  2884  G   W 60093063 + 8 [kjournald]
  8,0    0        9     0.000012072  2884  I   W 60093063 + 8 [kjournald]

OK, so there isn’t much I/O happening on my workstation, but on the machine I was diagnosing recently, btrace spewed out hundreds of lines per second, many of them referring to a process running the mutt mail program. It turned out that one of the users had approximately 90,000 emails in a single folder, which mutt was constantly rescanning because the machine didn’t have a recent enough version of mutt to support header caching.
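When faced with that kind of firehose, one way to make sense of it (a rough sketch, not the exact commands used in the diagnosis above) is to capture a fixed window of trace data with blktrace and then count events per process name; the awk below assumes the default blkparse output format shown earlier, where the process name appears in square brackets at the end of most lines.

# Capture 30 seconds of trace data for /dev/sda into files named trace.*
blktrace -d /dev/sda -w 30 -o trace

# Count trace events per process and print the busiest processes first.
# The parsing is a rough heuristic based on the trailing [name] field.
blkparse -i trace | awk 'match($0, /\[[^]]*\]$/) {
    count[substr($0, RSTART + 1, RLENGTH - 2)]++
}
END { for (p in count) print count[p], p }' | sort -rn | head

A process that dominates that list is a good candidate for closer inspection with tools like lsof or strace.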

The emails were archived away and the I/O problem was resolved. Whereas previously we could only guess at what was causing the I/O load, blktrace pointed the finger squarely at the problem process. On later machines, IOTop would have been even more straightforward. Both are valuable additions to the sysadmin toolkit.