Nagios notification options
As we wrote in our article about advanced use of Nagios for system monitoring, we use a number of scripts and tools to help us receive and respond to notifications in a timely and efficient manner. These methods are already in use across our dedicated server and virtual private server products. Here, we'll share some of those methods.
SMS (Short Messaging Service)
Betraying its age, Nagios has a "pager" field in its contact definitions. Chances are you aren't using a real pager, but the field is ideal for a mobile phone number. Since pretty much everything in Nagios is defined as an arbitrary command string, you can use the field in whatever way is most convenient. We use an SMS gateway provider, specifically Internode's NodeText gateway.
SMS gateways
SMS web-gateway providers will typically offer a form-based service that allows for easy scripting. Internode uses a standard HTTP form that can accept parameters via GET or POST. As an example, you can use curl to do this:
curl -d 'user=username@txt.internode.on.net' -d 'pass=something_secret' -d 'dest=61412456789' -d 'msg=this is the message' -d 'timezone=Australia/Sydney' https://txt.on.net/cgi-bin/sms_tcp.cgi
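One thing to watch with the plain -d form is that curl sends the value as given, so a message containing characters such as & or = can be split into extra form fields. The following is a minimal sketch of a wrapper that uses curl's --data-urlencode option instead. The function name sms_via_curl and the variables SMS_USER, SMS_PASS and SMS_DRY_RUN are made up for this example and are not part of the gateway's interface; the form fields are the ones shown above.

```shell
# Hypothetical wrapper around the gateway call shown above.
# SMS_USER, SMS_PASS and SMS_DRY_RUN are names invented for this sketch.
SMS_URL='https://txt.on.net/cgi-bin/sms_tcp.cgi'

sms_via_curl() (
    dest=$1
    msg=$2

    # With SMS_DRY_RUN=1 the command is printed instead of executed.
    # Note that the printed command includes the password.
    if [ "${SMS_DRY_RUN:-0}" = 1 ]; then
        run=echo
    else
        run=
    fi

    # --data-urlencode encodes each value, so '&', '=' and spaces in the
    # message cannot be mistaken for form syntax.
    $run curl --silent --show-error --max-time 20 \
        --data-urlencode "user=${SMS_USER:?set SMS_USER first}" \
        --data-urlencode "pass=${SMS_PASS:?set SMS_PASS first}" \
        --data-urlencode "dest=${dest}" \
        --data-urlencode "msg=${msg}" \
        --data-urlencode 'timezone=Australia/Sydney' \
        "$SMS_URL"
)
```

Setting SMS_DRY_RUN=1 before calling sms_via_curl 61412456789 'disk full & swap low' prints the curl command rather than contacting the gateway, which is handy while wiring things up.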
GSM modem
If you send a lot of notifications, a cheaper alternative may be to invest in a standalone GSM modem. The modem is effectively a computer-controlled mobile phone, which can be used to send SMS messages. The hardware will set you back a few hundred dollars (one suitable model would be the Intercel SAM2W), and a SIM is practically free nowadays. A number of telcos (e.g. Virgin and Vodafone) offer cheap plans with lots of included SMS per month. This makes for a cost-effective solution, and has the very nice advantage of still working in the face of catastrophic network failure.
Integrating with Nagios
The default notification in Nagios is notify-by-email, and lets you drop in numerous details about the notification. Nagios performs its own keyword replacement before executing the result in a shell.
define command {
    command_name    notify-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

We'll add another command to handle sending an SMS:
define command {
    command_name    notify-by-sms
    command_line    /usr/local/sbin/send_sms --number $CONTACTPAGER$ --message "$NOTIFICATIONTYPE$ - $SERVICEDESC$ on $HOSTNAME$ is $SERVICESTATE$ : $SERVICEACKCOMMENT$ : $SERVICEOUTPUT$"
}

Of course we need the actual send_sms script to do the job for us. We could call curl directly from the notify-by-sms command, but it's a bit messy, and isn't as flexible if we want to customise the message a bit. We use a simple perl script to add a timestamp and perform some other cleanup.
send_sms
#!/usr/bin/perl

use strict;
use warnings;

# You might need to install extra packages to use these modules
# CPAN is one option, though OS packages might be more convenient
# Under Debian, you'll need at least libwww-perl and libcrypt-ssleay-perl
use LWP::UserAgent;
use Crypt::SSLeay;
use Getopt::Long;

# Short timestamp (MM-DD HH:MM) used as a prefix for each message
sub get_date {
    my ($sec, $min, $hour, $mday, $mon) = localtime(time());
    return sprintf("%02d-%02d %02d:%02d", $mon + 1, $mday, $hour, $min);
}

sub usage {
    print "--message (-m) 'Your message'\n";
    print "--number (-n) 614XXXXXXXX\n";
    print "--test (-t) 0|1\n\n";
    print "The 'test' flag is optional, Internode will send an SMS by default.\n";
}

sub send_sms {
    my ($number, $message, $test) = @_;

    my $ua = LWP::UserAgent->new(env_proxy  => 1,
                                 keep_alive => 1,
                                 timeout    => 20,
                                );

    my $url = 'https://txt.on.net/cgi-bin/sms_tcp.cgi';

    my $resp = $ua->post($url,
                         [ user     => 'username@txt.internode.on.net',
                           pass     => 'your_password',
                           dest     => $number,
                           msg      => $message,
                           timezone => 'Australia/Sydney',
                           test     => $test,
                         ],
                        );

    # Response code and diagnostic output is passed as a single string output from the page
    my ($retval, $response) = $resp->content =~ m/Status: (\d\d) (.*)/;

    # If the reply can't be parsed (e.g. an HTTP error page), report a failure
    # rather than exiting with an undefined value
    if (!defined($retval)) {
        print "Unexpected response: " . $resp->status_line . "\n";
        return 1;
    }

    print "$response\n";

    return $retval + 0;
}

my $number;
my $message;
my $test = 0;
if (!GetOptions('number=s'  => \$number,
                'message=s' => \$message,
                'test:i'    => \$test,
                'help'      => sub { usage(); exit 0; }
               )) {
    exit 1;
}

if (!defined($message)) {
    print "You need to specify a message to send\n";
    exit 1;
}

if (!defined($number)) {
    print "You need to specify a number to send to\n";
    exit 1;
}

# Add date in to the message so we know when it was sent (it may get delayed along the way)
$message = get_date() . " " . $message;

# Truncate the message to 160 characters, we don't want to span two SMSes
$message = substr($message, 0, 160);

# Translate the leading zero in the mobile number to 61 (international-style, but without leading plus-sign)
$number =~ s/^0/61/;

# Send the message
my $r = send_sms($number, $message, $test);

exit($r);
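If you would rather keep everything in shell and call curl yourself, the message preparation that send_sms performs (timestamp prefix, truncation to 160 characters, rewriting a leading 0 to 61) can be approximated as below. This is a sketch only: the function names normalise_number and prepare_message are invented here, and the truncation counts bytes rather than characters in some shells, which matters for non-ASCII text.

```shell
# Rewrite a leading 0 as the 61 country code, as send_sms does.
normalise_number() (
    case $1 in
        0*) printf '61%s\n' "${1#0}" ;;
        *)  printf '%s\n' "$1" ;;
    esac
)

# Prefix a MM-DD HH:MM timestamp and cut the result to 160 characters
# so it fits in a single SMS.
#   $1 = message
#   $2 = timestamp to use (optional, defaults to the current time)
prepare_message() (
    stamp=${2:-$(date '+%m-%d %H:%M')}
    printf '%.160s\n' "$stamp $1"
)
```

The optional second argument to prepare_message exists only so the output can be checked with a fixed timestamp; in normal use it is omitted.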
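Now we can go ahead and use this in a contact definition. Assuming you already have notification-capable services defined, this will "just work". Note that the contact below also refers to a host-notify-by-sms command for host notifications, which isn't shown here; it needs to be defined along the same lines as notify-by-sms, using the host macros.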
define contact {
    contact_name                    tomoya
    alias                           Tomoya Okazaki
    service_notification_period     24x7
    service_notification_options    w,u,c,r
    service_notification_commands   notify-by-sms
    host_notification_period        24x7
    host_notification_options       d,u,r
    host_notification_commands      host-notify-by-sms
    pager                           0411999999
    email                           tomoya.okazaki@clannad.or.jp
}
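Before waiting for a real outage, it can be useful to run a notification command by hand with the macros filled in. The helper below is a rough imitation of the substitution Nagios performs, intended only for manual testing; expand_macro is a name invented for this sketch, and it does not reproduce everything Nagios does (for instance the character stripping controlled by illegal_macro_output_chars). Backslashes in the arguments are interpreted by awk, so keep test values simple.

```shell
# expand_macro TEMPLATE NAME VALUE
# Replace every occurrence of $NAME$ in TEMPLATE with VALUE and print
# the result. Matching is literal; no regular expressions are involved.
expand_macro() {
    awk -v tpl="$1" -v name="\$$2\$" -v val="$3" 'BEGIN {
        out = ""
        while ((i = index(tpl, name)) > 0) {
            out = out substr(tpl, 1, i - 1) val
            tpl = substr(tpl, i + length(name))
        }
        print out tpl
    }'
}
```

For example, expand_macro 'send_sms --number $CONTACTPAGER$' CONTACTPAGER 0411999999 prints the command line with the pager number filled in; calls can be chained, one per macro, to build a complete command to paste into a shell.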
Instant Messaging
SMS is great for getting attention, but it's overkill when you're at work and can be contacted in some other way. We already have a local Jabber server in the office with a chat session, so the natural alternative was to set up a robot that can send us messages. As well as being cheaper than sending an SMS, it integrates well with our workflow. It's easy to ignore if you're working on something else, but sends messages where you expect to see them when you want them.
Components
As an additional feature, we decided that Nagios should attempt to send an instant message if you're at your desk, otherwise you'll get an SMS. To do this, the Jabber integration needs a couple of components.
Presence-sensing robot
The robot presents itself as a "Buddy" so it can see your away-status. We usually set our Jabber client to report us as being Away if we're idle for more than five minutes. The robot is a Perl script that polls the server for information about who's signed in and what their status is.
The Jabber robot runs independently and simply maintains a set of files, one per user the server reports as Online. How you start it is up to you. We use our configuration management system to ensure the process is running, but an initscript may be more appropriate in other circumstances.
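Because presence is recorded as nothing more than file timestamps, it is easy to inspect from the shell. The sketch below lists the users whose presence file has been refreshed recently. The helper name online_buddies is invented for this example, the directory default matches the NOTIFY_DIR used by the robot, the two-minute default mirrors the freshness window used by the notification wrapper later on, and the -maxdepth and -mmin options assume GNU or BSD find.

```shell
# online_buddies [DIR] [MAX_AGE_MINUTES]
# Print the names of presence files modified within the last
# MAX_AGE_MINUTES minutes (default 2), one per line.
online_buddies() (
    dir=${1:-/var/lib/nagios-buddies-available}
    max_age=${2:-2}

    cd "$dir" 2>/dev/null || exit 1

    # A file that has not been touched recently means the robot no
    # longer sees that user as online (or the robot itself has died).
    find . -maxdepth 1 -type f -mmin "-$max_age" | sed 's|^\./||' | sort
)
```

Running online_buddies with no arguments is a quick way to check that the robot is alive: if you are signed in and not idle but your name is missing, the robot has probably stopped updating its files.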
jabber_robot
#!/usr/bin/perl -w

use Net::XMPP;
use Data::Dumper;
use strict;

use constant SERVER     => 'jabber.local';
use constant PORT       => 5222;
use constant USERNAME   => 'nagiosbot';
use constant PASSWORD   => 'some_password';
use constant RESOURCE   => 'bot';
use constant NOTIFY_DIR => '/var/lib/nagios-buddies-available';

# SIGKILL cannot be caught, so no handler is installed for it
$SIG{HUP}  = \&Stop;
$SIG{TERM} = \&Stop;
$SIG{INT}  = \&Stop;

my $Con;

# Check paths.
# The constant is called with parentheses because a bare NOTIFY_DIR after
# -d or chdir would be parsed as a filehandle rather than as the constant.
if (! -d NOTIFY_DIR()) {
    mkdir NOTIFY_DIR(), 0700 or
        die "ERROR: Unable to mkdir '" . NOTIFY_DIR . "'\n";
}
chdir NOTIFY_DIR() or die "ERROR: Unable to chdir to '" . NOTIFY_DIR . "'\n";

EventHandler();

sub ConnectClient
{
    $Con = Net::XMPP::Client->new();

    $Con->RosterDB();
    $Con->PresenceDB();
    #$Con->SetPresenceCallBacks(available   => \&InPresence,
    #                           unavailable => \&InPresence);

    my $status = $Con->Connect(hostname => SERVER, port => PORT);
    if (!(defined($status)))
    {
        print STDERR "ERROR: Jabber server is down or connection was not allowed.\n";
        print STDERR "       ($!)\n";
        exit(1);
    }

    my @result = $Con->AuthSend(username => USERNAME,
                                password => PASSWORD,
                                resource => RESOURCE);
    if ($result[0] ne "ok")
    {
        print STDERR "ERROR: Authorisation failed: $result[0] - $result[1]\n";
        exit(2);
    }

    $Con->PresenceSend();

    return $Con;
}

sub EventHandler
{
    # Reconnect whenever the connection drops
    while (1) {
        ConnectClient();
        $Con->RosterRequest();

        # Process() returns undef once the connection has gone away
        while (defined($Con->Process(60))) {

            foreach my $jid ($Con->RosterDBJIDs()) {
                UpdateUserStatus($jid);
            }
        }

        print STDERR "ERROR: The connection was killed...\n";
    }
}

sub Stop
{
    #print "Exiting...\n";
    $Con->Disconnect() if defined($Con);
    exit(0);
}

sub UpdateUserStatus
{
    my ($jid) = @_;

    # Get username.
    my $username = $jid->GetUserID();
    # Strip characters that are unsafe in a filename. The original pattern,
    # s:/,::g, only removed the literal two-character sequence "/,".
    $username =~ s:[/,]::g;

    if (IsOnline($jid)) {
        # Create the presence file if needed, then refresh its timestamp
        if (open(my $fh, '>>', $username)) {
            close($fh);
        } else {
            warn "ERROR: Unable to open file '$username'\n";
        }
        utime(undef, undef, $username) or warn "ERROR: Unable to update timestamp of '$username'\n";
    } elsif (-f $username) {
        unlink($username) or warn "ERROR: Unable to delete presence status for '$username'\n";
    }
}

sub IsOnline
{
    my ($jid) = @_;
    my $Pres = $Con->PresenceDBQuery($jid);

    if (!(defined($Pres))) {
        return 0;
    }

    my $show = $Pres->GetShow();
    if ($show) {
        # Idle, or away, or dnd, etc.
        return 0;
    }

    return 1;
}

# The remaining subroutines are not called in normal operation; they are
# debugging helpers that can be hooked up as callbacks.

sub Availability
{
    my ($jid) = @_;
    my $Pres = $Con->PresenceDBQuery($jid);

    if (!(defined($Pres))) {
        return 'Unavailable';
    }

    my $show = $Pres->GetShow();
    if ($show) {
        # Idle, or away, or dnd, etc.
        return "Away ($show)";
    }

    return 'Available';
}

sub InPresence
{
    my ($sid, $Pres) = @_;

    $Con->PresenceDBParse($Pres);

    my $from   = $Pres->GetFrom();
    my $type   = $Pres->GetType();
    my $status = $Pres->GetStatus();
    my $show   = $Pres->GetShow();
    print "===\n";
    print "Presence\n";
    print "  From $from\n";
    print "  Type: $type\n";
    print "  Status: $status\n";
    print "  Show: $show\n";
    print "===\n";
    print $Pres->GetXML(),"\n";
    print "===\n";
}

sub InIQ
{
    my $sid = shift;
    my $iq  = shift;

    my $from  = $iq->GetFrom();
    my $type  = $iq->GetType();
    my $query = $iq->GetQuery();
    my $xmlns = $query->GetXMLNS();
    print "===\n";
    print "IQ\n";
    print "  From $from\n";
    print "  Type: $type\n";
    print "  XMLNS: $xmlns\n";
    print "===\n";
    print $iq->GetXML(),"\n";
    print "===\n";
}

sub InMessage
{
    my $sid     = shift;
    my $message = shift;

    my $type    = $message->GetType();
    my $fromJID = $message->GetFrom("jid");

    my $from     = $fromJID->GetUserID();
    my $resource = $fromJID->GetResource();
    my $subject  = $message->GetSubject();
    my $body     = $message->GetBody();
    print "===\n";
    print "Message ($type)\n";
    print "  From: $from ($resource)\n";
    print "  Subject: $subject\n";
    print "  Body: $body\n";
    print "===\n";
    print $message->GetXML(),"\n";
    print "===\n";
}
Message-sending script
The next step is a script that actually does the hard work: sending a message to a user in Jabber.
notify_via_jabber
#!/usr/bin/perl -w
#
# Author David Cox
# Created from various code examples found on the web
# Last Modified 08/06/2002
# Feel free to use or modify as needed to suit your needs
#
# Modified by Anchor

use strict;
use Net::Jabber qw(Client) ;
use Net::Jabber qw(Message) ;
use Net::Jabber qw(Protocol) ;
use Net::Jabber qw(Presence) ;

if (scalar(@ARGV) != 2) {
    die "Usage...\n notify [jabberid] [message]\n";
}

# The recipient was never assigned in the original listing, which stops the
# script compiling under "use strict"; it is the first argument.
my $recipient = $ARGV[0];
my $body      = $ARGV[1];

use constant SERVER   => 'jabber.local';
use constant PORT     => 5222;
use constant USER     => 'nagiosbot';
use constant PASSWORD => 'some_password';
use constant RESOURCE => 'bot';
#######################################################
# MAXWAIT is used because the send message function didn't seem to
# like being called too fast. The message would not be sent unless I waited a second
# or so. You can experiment with it but I just went with 2 seconds.
#######################################################
use constant MAXWAIT => 2;

my $connection = Net::Jabber::Client->new();
$connection->Connect( "hostname" => SERVER, "port" => PORT ) or die "Cannot connect ($!)\n";

my @result = $connection->AuthSend( "username" => USER, "password" => PASSWORD, "resource" => RESOURCE );
if ($result[0] ne "ok") {
    die "Ident/Auth with server failed: $result[0] - $result[1]\n";
}
$connection->PresenceSend();

die "Uh-oh - something has gone wrong with the connection\n"
    unless(defined($connection->Process(2)));

my $message = Net::Jabber::Message->new();
$message->SetMessage( "to"      => $recipient,
                      "subject" => "Notification",
                      "type"    => "chat",
                      "body"    => $body);

$connection->Send($message);
die "Uh-oh - something has gone wrong with the send\n"
    unless(defined($connection->Process(2)));
sleep(MAXWAIT);

$connection->Disconnect();
exit;
Pulling it together
Now to connect it all to Nagios. The script first determines whether you're online, then sends the message by Jabber if you are, or by SMS if you're not (or if the Jabber delivery fails). Some of the paths are hard-coded and depend on the previous scripts.
notify_via_jabber_or_sms
#!/bin/bash
#
# Send notifications via Jabber or SMS if person is not available.

NOTIFY_DIR=/var/lib/nagios-buddies-available
JABBER_SCRIPT=/usr/local/sbin/notify_via_jabber
SMS_SCRIPT=/usr/local/sbin/send_sms

help() {
    echo "Usage: $0 jabber_id mobile_number message" 1>&2
    exit 1
}

notify_via_jabber() {
    local jabber_id=$1
    local message=$2

    # Everything before the "@" is the name the robot uses for the file
    local jabber_name=${jabber_id%%@*}
    local status_file="$NOTIFY_DIR/$jabber_name"

    # Reference file dated two minutes ago; a presence file older than
    # this is treated as stale
    TMPFILE=$(mktemp -t timestamp.XXXXXX) || return 1
    touch -d 'now - 2 minutes' "$TMPFILE" || return 1

    # The original compared against an unset variable ($old_file)
    if [ -e "$status_file" ] && [ "$status_file" -nt "$TMPFILE" ]
    then
        # Buddy is online.
        $JABBER_SCRIPT "$jabber_id" "$message"
        retval=$?
        echo "Return value: $retval : $JABBER_SCRIPT '$jabber_id' '$message'"

        return $retval
    else
        return 1
    fi
}

notify_via_sms() {
    local mobile_number=$1
    local message=$2

    $SMS_SCRIPT --number "$mobile_number" --message "$message"
    retval=$?
    echo "Return value: $retval : $SMS_SCRIPT --number '$mobile_number' --message '$message'"

    return $retval
}

cleanup() {
    trap - $SIGNALS

    if [ -n "$TMPFILE" ]
    then
        rm -f "$TMPFILE"
    fi
}

# Check arguments.
if [ $# -lt 3 ]
then
    help
fi

jabber_id=$1
mobile_number=$2
message=$3

SIGNALS="EXIT"

(
    # The trap is set inside the subshell because that is where TMPFILE
    # gets created, and a subshell does not inherit the parent's EXIT trap
    trap cleanup $SIGNALS

    # Fall back to SMS if the buddy is offline or the Jabber send fails
    # (the original called send_sms here, which is not defined in this script)
    notify_via_jabber "$jabber_id" "$message" || notify_via_sms "$mobile_number" "$message"

) 2>&1 | logger -s -t 'notify_via_jabber_or_sms' --
nagios command definition
define command {
    command_name    notify-by-jabber-or-sms
    command_line    /usr/local/sbin/notify_via_jabber_or_sms $CONTACTNAME$@jabber.local $CONTACTPAGER$ "$NOTIFICATIONTYPE$ - $SERVICEDESC$ on $HOSTNAME$ is $SERVICESTATE$ : $SERVICEACKCOMMENT$ : $SERVICEOUTPUT$"
}
Escalation paths
You could probably fill a small book with all the details about configuring escalations in Nagios and what you can do with them. While there's not enough room for that here, it's worth describing how Anchor uses escalations and why we think they're a big benefit. Some of the terminology here is Nagios-specific and won't mean anything if you're not already familiar with much of how Nagios works.
Before we had escalations set up, all staff would receive notifications when things broke. While this meant everyone knew what was happening, there was no clear responsibility. As a result, it was entirely possible for problems to go unattended while everyone assumed someone else would take care of them. This was fixed by having a roster of responsibility for non-office periods, but all staff still received notifications. Having your staff twitch every time their phone buzzes is not a healthy state of affairs.
Our solution to this is to maintain a dynamic contactgroup, and the easiest way (for us) to do this is to take advantage of Nagios' ability to import other config fragments (a monolithic config file is messy to manage and less flexible). We simply import the contactgroup definition appropriate for whoever's on-call.
nagios.cfg
cfg_file=/etc/nagios2/contactgroup-oncall.cfg
The contactgroup for most notifications is called anchor, and changes at 6pm each day depending on who's on-call for the night. The sysadmin on-call uses their regular contact details, while everyone else gets Jabber messages via a separate contact definition which only uses the notify_via_jabber command. A typical contactgroup definition looks like this:
contactgroup-oncall.cfg
define contactgroup {
    contactgroup_name   anchor
    alias               Tomoya on-call
    members             tomoya,nagisa-jabber,kyou-jabber,ryou-jabber,kotomi-jabber,fuuko-jabber,tomoyo-jabber
}
Exactly how you implement a system like this will depend on your own requirements. This happens to work well for us, and we update the configuration files a couple of times a day with crontab entries. A different solution would be more appropriate if you change your contactgroups more or less frequently.
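As an illustration of the kind of thing a crontab entry could run, here is a sketch that renders the on-call contactgroup from a staff list. It is not the script we use: the function name render_oncall_group is invented for this example, and it assumes every staff member has both a regular contact (name) and a Jabber-only contact (name-jabber), as in the definition above.

```shell
# render_oncall_group ONCALL STAFF...
# Print a contactgroup definition in which ONCALL is listed with their
# regular contact and everyone else with their "-jabber" contact.
render_oncall_group() (
    oncall=$1
    shift

    members=$oncall
    for person in "$@"; do
        # The on-call person is already listed with full contact details
        [ "$person" = "$oncall" ] && continue
        members="$members,$person-jabber"
    done

    printf 'define contactgroup {\n'
    printf '    contactgroup_name   anchor\n'
    printf '    alias               %s on-call\n' "$oncall"
    printf '    members             %s\n' "$members"
    printf '}\n'
)
```

A cron job could redirect the output of render_oncall_group tomoya tomoya nagisa kyou to a temporary file, move it over /etc/nagios2/contactgroup-oncall.cfg, and then reload Nagios by whatever means your system provides. Writing to a temporary file first means Nagios never reads a half-written fragment.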
References/External Links
External publication of Jabber robot script - http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F1316.html;d=1
