Nagios notification options

As we wrote in our article about advanced use of Nagios for system monitoring, we use a number of scripts and tools to help us receive and respond to notifications in a timely and efficient manner. These methods are already in use across our dedicated server and virtual private server products. Here, we'll share some of them.

SMS (Short Messaging Service)

Betraying its age, Nagios has a "pager" field in its contact definitions. Chances are you aren't using a real pager, but the field is ideal for a mobile phone number. Because pretty much everything in Nagios is defined as an arbitrary command string, you can use it in whatever way is most convenient. We use an SMS gateway provider, specifically Internode's NodeText gateway.

SMS gateways

SMS web-gateway providers will typically offer a form-based service that allows for easy scripting. Internode uses a standard HTTP form that can accept parameters via GET or POST. As an example, you can use curl to do this:

curl -d '' -d 'pass=something_secret' -d 'dest=61412456789' -d 'msg=this is the message' -d 'timezone=Australia/Sydney'
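
Alert text tends to contain spaces and punctuation, which need URL-encoding when posted as form data. As a minimal sketch, the function below wraps the same form fields (pass, dest, msg, timezone) and lets curl encode them with --data-urlencode. The variable names SMS_GATEWAY_URL, SMS_PASS and SMS_DRY_RUN are assumptions made for this example, and any account field your gateway requires is not shown here.

```shell
# Sketch only: SMS_GATEWAY_URL, SMS_PASS and SMS_DRY_RUN are hypothetical
# names; the form field names are taken from the curl example above.
send_sms() {
    dest=$1
    msg=$2

    # With SMS_DRY_RUN=1, print the curl command instead of running it.
    if [ "${SMS_DRY_RUN:-0}" = "1" ]; then
        runner=echo
    else
        runner=
    fi

    # --data-urlencode encodes each value, so spaces and '&' in the
    # message cannot corrupt the form submission.
    $runner curl --silent --show-error --fail \
        --data-urlencode "pass=${SMS_PASS}" \
        --data-urlencode "dest=${dest}" \
        --data-urlencode "msg=${msg}" \
        --data-urlencode "timezone=Australia/Sydney" \
        "${SMS_GATEWAY_URL}"
}
```

Running it with SMS_DRY_RUN=1 first is a cheap way to check what would be sent before pointing it at a real gateway.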

GSM modem

If you send a lot of notifications, a cheaper alternative may be to invest in a standalone GSM modem. The modem is effectively a computer-controlled mobile phone, which can be used to send SMS messages. The hardware will set you back a few hundred dollars (one suitable model would be the Intercel SAM2W), and a SIM is practically free nowadays. A number of telcos (e.g. Virgin and Vodafone) offer cheap plans with plenty of SMS messages included each month. This makes for a cost-effective solution, and has the very nice advantage of still working in the face of catastrophic network failure.
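
GSM modems are generally driven with AT commands over a serial line. As a rough sketch of what sending one text-mode SMS involves, the function below prints the command sequence: select text mode, start a message to a number, then the body terminated by Ctrl-Z. The function name is hypothetical, and nothing here is specific to the modem model mentioned above.

```shell
# Sketch only: prints the AT command sequence for one text-mode SMS.
# The caller decides where the bytes go (for example, a serial device).
gsm_sms_commands() {
    dest=$1
    msg=$2
    printf 'AT+CMGF=1\r'              # select text mode
    printf 'AT+CMGS="%s"\r' "$dest"   # start a message to this number
    printf '%s\032' "$msg"            # message body, ended by Ctrl-Z (0x1A)
}
```

In practice the modem answers each step, including a ">" prompt before the message body, so a robust sender reads those replies rather than writing everything in one go.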

Integrating with Nagios
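
Nagios runs a notification command with its macros expanded into the command line, so the integration point can be a small script that receives the alert details as arguments. The sketch below shows one way to turn those arguments into a message short enough for a single SMS. The function name, script path and message layout are assumptions for illustration.

```shell
# Sketch only. A hypothetical Nagios command definition could call a script
# built around this function with macros as arguments, for example:
#   command_line /usr/local/bin/notify-sms "$NOTIFICATIONTYPE$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$CONTACTPAGER$"
# The script path is an assumption; the macros are standard Nagios ones.
format_alert() {
    ntype=$1
    host=$2
    service=$3
    state=$4
    # Keep the text within a single 160-character SMS.
    printf '%s: %s/%s is %s\n' "$ntype" "$host" "$service" "$state" | cut -c 1-160
}
```

The formatted text would then be handed to whichever sender you use, with the contact's pager field as the destination number.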

Instant Messaging

SMS is great for getting attention, but it's overkill when you're at work and can be contacted in some other way. We already have a local Jabber server in the office with a chat session, so the natural alternative was to set up a robot that can send us messages. As well as being cheaper than sending an SMS, it integrates well with our workflow. It's easy to ignore if you're working on something else, but sends messages where you expect to see them when you want them.

As an additional feature, we decided that Nagios should attempt to send an instant message if you're at your desk, otherwise you'll get an SMS. To do this, the Jabber integration needs a couple of components.

Presence-sensing robot

The robot presents itself as a "Buddy" so it can see your away status. We usually set our Jabber clients to report us as Away if we're idle for more than five minutes. The robot is a Perl script that polls the server for information about who's signed in and what their status is.

The Jabber robot runs independently and simply updates a set of files to reflect who the server reports as Online. How you start it is up to you. We use our configuration management system to ensure the process is running, but an initscript may be more appropriate in other circumstances.
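
The file-updating side of this can be very small. As a sketch of the idea only (the polling itself is left to the robot), the function below keeps one marker file per online user in a state directory. PRESENCE_DIR, its default path and the status keyword "online" are all assumptions made for this example.

```shell
# Sketch only: PRESENCE_DIR and the "online" keyword are hypothetical.
PRESENCE_DIR=${PRESENCE_DIR:-/var/run/jabber-presence}

# Record one user's status as reported by the server.
record_presence() {
    user=$1
    status=$2
    mkdir -p "$PRESENCE_DIR" || return 1
    case "$status" in
        online) touch "$PRESENCE_DIR/$user" ;;   # create or refresh the marker
        *)      rm -f "$PRESENCE_DIR/$user" ;;   # away, offline, anything else
    esac
}
```

Using the mere existence of a file as the signal keeps the consumer trivial: anything that can test for a file can tell whether someone is at their desk.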

Message-sending script

The next step is a script that actually does the hard work: sending a message to a user in Jabber.
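
If you don't want to write your own sender, one option is to wrap an existing command-line client. The sketch below assumes the sendxmpp utility, which reads the message body from standard input and takes the recipient as an argument, with account details in its own configuration file. This is a stand-in chosen for illustration, not the script described here, and JABBER_SEND_CMD is a hypothetical variable name.

```shell
# Sketch only: sendxmpp is swapped in as an example client, and
# JABBER_SEND_CMD is a hypothetical override used to substitute another one.
JABBER_SEND_CMD=${JABBER_SEND_CMD:-sendxmpp}

# Send one message to a Jabber ID; the body goes in on standard input.
send_jabber() {
    jid=$1
    msg=$2
    printf '%s\n' "$msg" | "$JABBER_SEND_CMD" "$jid"
}
```

Keeping the client behind a variable makes it easy to test the surrounding logic with a dummy command before involving a real server.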

Pulling it together

Now to connect it all to Nagios. The script first needs to determine whether you're online or not. It then sends the message by Jabber if you're online, or by SMS if you're not. Some of the paths are hard-coded and depend on the previous scripts.
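
The decision itself is only a few lines. The sketch below assumes presence is recorded as one marker file per online user in a directory (PRESENCE_DIR is a hypothetical name and path), and uses placeholder senders that just print what they would do so the example runs on its own.

```shell
# Sketch only: PRESENCE_DIR is hypothetical, and the two senders are
# placeholders that print instead of sending.
PRESENCE_DIR=${PRESENCE_DIR:-/var/run/jabber-presence}

send_jabber() { echo "JABBER to $1: $2"; }
send_sms()    { echo "SMS to $1: $2"; }

# Print "jabber" if the user has a presence marker, otherwise "sms".
choose_channel() {
    user=$1
    if [ -e "$PRESENCE_DIR/$user" ]; then
        echo jabber
    else
        echo sms
    fi
}

# Deliver one alert over whichever channel suits the user right now.
notify() {
    user=$1
    jid=$2
    pager=$3
    msg=$4
    case "$(choose_channel "$user")" in
        jabber) send_jabber "$jid" "$msg" ;;
        sms)    send_sms "$pager" "$msg" ;;
    esac
}
```

Called as notify alice alice@example.com 61412456789 "web1 down", it prints the SMS line unless a marker file for alice exists, in which case it prints the Jabber line.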

Escalation paths

You could probably fill a small book with all the details about configuring escalations in Nagios and what you can do with them. While there's not enough room for that here, it's worth describing how Anchor uses escalations and why we think they're a big benefit. Some of the terminology here is Nagios-specific and won't mean anything if you're not already familiar with much of how Nagios works.

Before we had escalations set up, all staff would receive notifications when things broke. While this meant everyone knew what was happening, there was no clear responsibility. As a result, it was entirely possible for problems to go unattended while everyone assumed someone else would take care of them. This was fixed by having a roster of responsibility for non-office periods, but all staff still received notifications. Having your staff twitch every time their phone buzzes is not a healthy state of affairs.

Our solution to this is to maintain a dynamic contactgroup, and the easiest way (for us) to do this is to take advantage of Nagios' ability to import other config fragments (a monolithic config file is messy to manage and less flexible). We simply import the contactgroup definition appropriate for whoever's on-call.

The contactgroup for most notifications is called anchor, and changes at 6pm each day depending on who's on-call for the night. The sysadmin on-call uses their regular contact details, while everyone else gets Jabber messages via a separate contact definition which only uses the notify_via_jabber command. A typical contactgroup definition looks like this:
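
As a rough stand-in for such a definition, the shell sketch below prints a contactgroup of the shape just described: the on-call person appears under their regular contact name, and everyone else under a Jabber-only contact. The "-jabber" suffix, the alias text and the function name are assumptions for illustration.

```shell
# Sketch only: the "-jabber" contact naming and the alias are hypothetical.
# Usage: write_contactgroup ONCALL [OTHER...]
write_contactgroup() {
    oncall=$1
    shift
    members=$oncall
    for person in "$@"; do
        # Everyone not on call is listed via their Jabber-only contact.
        members="$members,${person}-jabber"
    done
    cat <<EOF
define contactgroup{
    contactgroup_name   anchor
    alias               On-call sysadmin plus Jabber-only staff
    members             $members
}
EOF
}
```

For example, write_contactgroup alice bob carol yields a members line of alice,bob-jabber,carol-jabber.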

Exactly how you implement a system like this will depend on your own requirements. This happens to work well for us, and we update the configuration files a couple of times a day with crontab entries. A different solution would be more appropriate if you change your contactgroups more or less frequently.
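
One way to sketch the cron-driven switch is with a symlink: keep one config fragment per person, and point the file that Nagios includes at whichever fragment applies tonight. The directory layout, file names and variable names below are assumptions, and Nagios still needs to be reloaded afterwards by whatever means your installation provides.

```shell
# Sketch only: FRAGMENT_DIR and ACTIVE_LINK are hypothetical paths.
# A crontab entry such as "0 18 * * * ..." could run this at 6pm.
FRAGMENT_DIR=${FRAGMENT_DIR:-/etc/nagios/oncall}
ACTIVE_LINK=${ACTIVE_LINK:-/etc/nagios/conf.d/contactgroup-anchor.cfg}

# Point the included config file at the fragment for the given person.
activate_oncall() {
    person=$1
    fragment="$FRAGMENT_DIR/$person.cfg"
    if [ ! -f "$fragment" ]; then
        echo "no fragment for $person" >&2
        return 1
    fi
    ln -sf "$fragment" "$ACTIVE_LINK"
}
```

Refusing to switch when the fragment is missing means a typo in the roster leaves the previous contactgroup in place rather than breaking the configuration.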

Wiki: dedicated/Nagios_notification_options (last edited 2009-03-06 16:21:37 by BarneyDesmond)