Root cause analysis using shell history timestamping
Some of the problem-solving we do has a forensic component (not in the legal sense). When something breaks down, we perform what’s known as Root Cause Analysis. It’s well and good to fix the symptoms so things are running again, but it’s more important to fix the problem so it doesn’t happen again.
Often enough, we’ll login to a user’s account and check out what they’ve run recently. They might have edited a config file, removed some necessary wrapper scripts (this happens on our shared hosting servers), or blindly duplicated something that really shouldn’t be copied. Whatever the case, these are good places to start looking when you need to fix something.
To check out the history, you can run the helpful `history` command. This is a small snippet, it’s not unreasonable to have a thousand entries or more.
[email protected]:~% history 60 mutt 61 cd 62 ssh shirayuki 63 telnet sdf.lonestar.org 64 cal -3 65 cd ~/svn/puppet
Occasionally it’d be really helpful to know when a command was run. Did they delete any files around the time the problem started happening? When did they download an update for their CMS? Unfortunately, we usually don’t have an answer.
Thankfully this is easy to rectify. bash, the default shell on most systems, has the ability to record timestamps. If you’re one of those hip cats that use zsh, you’re also in luck.
For bash, it’s a one-liner addition to your ~/.bash_profile file
export HISTTIMEFORMAT='%Y-%m-%d %H:%M:%S '
For zsh it’s a little bit more, and can go in your ~/.zshrc
HISTFILE=~/.zsh_history HISTSIZE=9999 SAVEHIST=9999 setopt extendedhistory
Once you’ve set this up, you’ll need to re-open your shell so it reads in the changes. Go ahead and run a few commands. You might want to wait a couple of minutes so you can get distinct timestamps.
If you’re using bash, the history command will now print your timestamps.
[email protected]:~% history 60 2008-11-18 08:38:24 mutt 61 2008-11-18 08:39:46 cd 62 2008-11-18 08:39:47 ssh shirayuki 63 2008-11-18 08:43:13 telnet sdf.lonestar.org 64 2008-11-18 08:45:06 cal -3 65 2008-11-18 08:45:20 cd ~/svn/puppet
With zsh, you use `history -i` to display the timestamps.
[email protected]:~$ history -i 116 2008-11-17 23:11:56 mv public_html/index.wsgi kita_he/ 117 2008-11-17 23:11:59 ls 118 2008-11-17 23:12:07 cd kita_he/ 119 2008-11-17 23:12:07 ls 120 2008-11-17 23:12:12 vim index.wsgi 121 2008-11-17 23:16:32 psql
When you first enable timestamping you’ll notice it’s retroactively applied timestamps to every existing entry. While it’s not meaningful, you can’t do much about it, it needs something there for consistency and it’s unlikely to confuse anyone.
We can solve problems
Using forensic science!
Timestamps are helpful.