Root cause analysis using shell history timestamping

By November 18, 2008Technical

Some of the problem-solving we do has a forensic component (not in the legal sense). When something breaks down, we perform what’s known as Root Cause Analysis. It’s well and good to fix the symptoms so things are running again, but it’s more important to fix the problem so it doesn’t happen again.

Often enough, we’ll login to a user’s account and check out what they’ve run recently. They might have edited a config file, removed some necessary wrapper scripts (this happens on our shared hosting servers), or blindly duplicated something that really shouldn’t be copied. Whatever the case, these are good places to start looking when you need to fix something.

To check out the history, you can run the helpful `history` command. This is a small snippet, it’s not unreasonable to have a thousand entries or more.

hiroyuki@piffle:~% history
   60  mutt
   61  cd
   62  ssh shirayuki
   63  telnet sdf.lonestar.org
   64  cal -3
   65  cd ~/svn/puppet

Occasionally it’d be really helpful to know when a command was run. Did they delete any files around the time the problem started happening? When did they download an update for their CMS? Unfortunately, we usually don’t have an answer.

Thankfully this is easy to rectify. bash, the default shell on most systems, has the ability to record timestamps. If you’re one of those hip cats that use zsh, you’re also in luck.

For bash, it’s a one-liner addition to your ~/.bash_profile file

export HISTTIMEFORMAT='%Y-%m-%d %H:%M:%S  '

For zsh it’s a little bit more, and can go in your ~/.zshrc

HISTFILE=~/.zsh_history

HISTSIZE=9999

SAVEHIST=9999

setopt extendedhistory

Once you’ve set this up, you’ll need to re-open your shell so it reads in the changes. Go ahead and run a few commands. You might want to wait a couple of minutes so you can get distinct timestamps.

If you’re using bash, the history command will now print your timestamps.

misaki@piffle:~% history
   60  2008-11-18 08:38:24  mutt
   61  2008-11-18 08:39:46  cd
   62  2008-11-18 08:39:47  ssh shirayuki
   63  2008-11-18 08:43:13  telnet sdf.lonestar.org
   64  2008-11-18 08:45:06  cal -3
   65  2008-11-18 08:45:20  cd ~/svn/puppet

With zsh, you use `history -i` to display the timestamps.

charlotte@yoshino:~$ history -i
  116  2008-11-17 23:11:56  mv public_html/index.wsgi kita_he/
  117  2008-11-17 23:11:59  ls
  118  2008-11-17 23:12:07  cd kita_he/
  119  2008-11-17 23:12:07  ls
  120  2008-11-17 23:12:12  vim index.wsgi
  121  2008-11-17 23:16:32  psql

When you first enable timestamping you’ll notice it’s retroactively applied timestamps to every existing entry. While it’s not meaningful, you can’t do much about it, it needs something there for consistency and it’s unlikely to confuse anyone.

We can solve problems

Using forensic science!

Timestamps are helpful.