<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Anchor Web Hosting Blog &#187; nagios</title>
	<atom:link href="http://www.anchor.com.au/blog/tag/nagios/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.anchor.com.au/blog</link>
	<description>A view into the Anchor Engineroom</description>
	<lastBuildDate>Wed, 08 Feb 2012 00:51:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Channelling your rage</title>
		<link>http://www.anchor.com.au/blog/2012/02/channelling-your-rage/</link>
		<comments>http://www.anchor.com.au/blog/2012/02/channelling-your-rage/#comments</comments>
		<pubDate>Fri, 03 Feb 2012 08:43:35 +0000</pubDate>
		<dc:creator>Barney Desmond</dc:creator>
				<category><![CDATA[FTW]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[nagios]]></category>
		<category><![CDATA[rage]]></category>
		<category><![CDATA[rewrite]]></category>
		<category><![CDATA[wiki]]></category>

		<guid isPermaLink="false">http://www.anchor.com.au/blog/?p=2518</guid>
		<description><![CDATA[Getting notifications when servers break is always annoying. At Anchor we use Nagios, a very popular monitoring solution. &#8220;Friggen nagios!&#8221; is a pretty common cry. If you get a lot of notifications in quick succession, your Rage meter starts to build up. When it hits 100% you unleash a special attack and reboot the server. That&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Getting notifications when servers break is always annoying. At Anchor we use Nagios, a very popular monitoring solution. &#8220;<strong>Friggen nagios!</strong>&#8221; is a pretty common cry.</p>
<p>If you get a lot of notifications in quick succession, your <em>Rage meter</em> starts to build up. When it hits 100% you unleash a special attack and reboot the server.</p>
<div id="attachment_2524" class="wp-caption alignnone" style="width: 501px"><a href="http://www.anchor.com.au/blog/wp-content/uploads/2012/02/blaz13_heat_gauge_crop.jpg"><img src="http://www.anchor.com.au/blog/wp-content/uploads/2012/02/blaz13_heat_gauge_crop.jpg" alt="" title="100% gauge" width="491" height="376" class="size-full wp-image-2524" /></a><p class="wp-caption-text">Rachel&#039;s gauge is at 100%, circled in blue crayon. She can now reboot the server with her Static Iris</p></div>
<p>That&#8217;s pretty cool, but it turns out that customers don&#8217;t like reboots as much as we do, so we looked at ways to reduce the rage. One great way to do this is with better documentation; we call it <strong>Ragewiki</strong>.</p>
<hr />
<p>Making use of the <tt>notes_url</tt> parameter, we provide a link to our wiki documentation directly from Nagios&#8217; web interface. There&#8217;s one page for each service, with precise instructions on how to diagnose and fix common problems, as well as a brief description of what the service actually does.</p>
<p>So now when you get that SMS at 3am (<strong>PROBLEM &#8211; <em>ntype</em> on <em>fundle</em> is CRITICAL</strong>), you don&#8217;t spend 20 minutes flailing through <em>A Brief History of Time, as told by H.P. Serverbox</em>.</p>
<hr />
<p>To sweeten the deal a bit, we also allow for host-specific instances of a service, which might need extra-special instructions. We also have a page full of terse legacy documentation that we&#8217;d like to fall back on in case the new docs haven&#8217;t been written yet. We think it&#8217;s a cute little hack, so we&#8217;d like to share it with you.</p>
<p>The possibilities are limited only by your imagination; we just went for the most straightforward option. You could always link to a big red button that reboots the server straight away. <img src='http://www.anchor.com.au/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<ol>
<li><strong>Give every service a URL in the Ragewiki</strong>, using the <a href="http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#service">notes_url</a> argument. We attach this to the generic service template so that every single service automatically gets a link.
<pre># RageWiki ftw
notes_url /ragewiki/$HOSTNAME$/$SERVICEDESC$</pre>
<p>	You&#8217;ll notice that we&#8217;ve parameterised the URL so that each host-service pair is unique.
	</li>
<li><strong>Prepare a rewrite map to check for existence of docs</strong><br />
This URL points at the Apache instance on the Nagios server itself. Apache captures any request starting with <tt>/ragewiki/</tt>, extracts the hostname and service name, then builds a suitable redirect.</p>
<p>Because we want to support per-host pages that <em>may</em> exist, we use a RewriteCond and a smart RewriteMap to check whether the page exists, then redirect accordingly. We use moin as our documentation wiki, with HTTP access control in front of that.</p>
<pre>RewriteLock /var/lock/rewrite.lock
RewriteMap RageWiki "prg:/usr/bin/xargs -n1 -d '\\\\n' /usr/bin/HEAD -sd -H 'Authorization: Basic EncodedUsernameAndPassword'"</pre>
<p>You may want to read up on Apache&#8217;s <a href="http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritemap">RewriteMap</a> functionality to make sense of this. The short version: it contacts the wiki and returns the HTTP status line for the suggested page. A 200- or 300-series status code is considered a success &#8211; the page exists and should be used.
	</li>
<li><strong>Finally, use the RewriteMap and generate a suitable redirect</strong><br />
This is a basic set of cascading rewrites; the first success terminates further processing.</p>
<pre>
# Server-specific docs: /servers/$HOSTNAME/$SERVICENAME
RewriteCond ${RageWiki:https://magic.ponies.anchor.net.au/servers/$1/$2} ^[23]\d\d
RewriteRule ^/ragewiki/([^/]+)/(.+)$ https://magic.ponies.anchor.net.au/servers/$1/$2 [R,L]

# Whole lotta BGP goin' on (with variable check names, a variant of generic docs)
RewriteCond ${RageWiki:https://magic.ponies.anchor.net.au/Nagios/Services/bgp} ^[23]\d\d
RewriteRule ^/ragewiki/[^/]+/bgp[_-].+$ https://magic.ponies.anchor.net.au/Nagios/Services/bgp [R,L]

# Generic docs for normal services: /Nagios/Services/SERVICENAME
RewriteCond ${RageWiki:https://magic.ponies.anchor.net.au/Nagios/Services/$1} ^[23]\d\d
RewriteRule ^/ragewiki/[^/]+/(.+)$ https://magic.ponies.anchor.net.au/Nagios/Services/$1 [R,L]

# Catch any checks without docs, and send them to the fallback page.
# Funky regexes to pass the failed service name through to the fallback page.
# FIXME: Can we use a positive-lookbehind in these things? Would make it slightly tidier.
RewriteRule ^/ragewiki/([^/]+)$    https://magic.ponies.anchor.net.au/CommonNagiosServiceCheckReference#$1 [NE,R,L]
RewriteRule ^/ragewiki/.*/([^/]+)$ https://magic.ponies.anchor.net.au/CommonNagiosServiceCheckReference#$1 [NE,R,L]
</pre>
<p>Special cases with varied names, like our BGP checks, are easily handled by dropping a custom regex into the chain. It&#8217;s best if your service names have a consistent format that can be readily pared back to a basic name, but this method is fine for the occasional odd case.
	</li>
</ol>
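<p>If the cascade is easier to follow as ordinary code, here is a minimal Python sketch of the same decision order. It only illustrates the rules above and is not a replacement for them: <tt>resolve_ragewiki</tt> and the <tt>page_exists</tt> callback are made-up names, and the callback stands in for the HEAD request that the RewriteMap performs.</p>
<pre>
import re

WIKI = "https://magic.ponies.anchor.net.au"


def resolve_ragewiki(path, page_exists):
    """Return the redirect target for a /ragewiki/ request path, or None.

    page_exists is a hypothetical callable that takes a URL and returns True
    when the wiki answers with a 2xx or 3xx status.
    """
    match = re.match(r"^/ragewiki/([^/]+)/(.+)$", path)
    if match:
        host, service = match.groups()
        # Most specific first: the per-host page for this service.
        candidates = [f"{WIKI}/servers/{host}/{service}"]
        # BGP checks have variable names, so they all share one page.
        if re.match(r"^bgp[_-].+$", service):
            candidates.append(f"{WIKI}/Nagios/Services/bgp")
        # Then the generic page for the service.
        candidates.append(f"{WIKI}/Nagios/Services/{service}")
        for url in candidates:
            if page_exists(url):
                return url
    # No docs found: fall back to the legacy page, anchored on the last path segment.
    fallback = re.match(r"^/ragewiki/(?:.*/)?([^/]+)$", path)
    if fallback:
        return f"{WIKI}/CommonNagiosServiceCheckReference#{fallback.group(1)}"
    return None
</pre>
<p>Passing the existence check in as a callback keeps the ordering logic separate from the HTTP call, so you can try out the cascade against a plain set of known pages before pointing it at a real wiki.</p>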
<hr />
<p>Too easy! To give you an idea, here&#8217;s what we think good Ragewiki docs should cover:</p>
<ul>
<li>What servers does this apply to?</li>
<li>Summarise what the Nagios check is for (one sentence!)</li>
<li>What&#8217;s the impact of a failure? Customer visible? Websites are down? Etc.</li>
<li>A short procedure on how to confirm the notification and diagnose it further</li>
<li>A procedure on how to fix it</li>
</ul>
<p>That&#8217;s it; the page should only be a couple of screens long at the most. If you can&#8217;t include all the necessary information, it&#8217;s best to put it on a separate page and link to it. We specifically <em>don&#8217;t</em> include information about How It Works because it detracts from fixing problems faster.</p>
<p>Ragewiki works great for us, so we&#8217;d be interested in hearing your thoughts and comments. It&#8217;d also be cool to know if other people have reached the same goal, but in a different way.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.anchor.com.au/blog/2012/02/channelling-your-rage/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Monitor your servers like it&#8217;s 1996</title>
		<link>http://www.anchor.com.au/blog/2009/12/monitor-your-servers-like-its-1996/</link>
		<comments>http://www.anchor.com.au/blog/2009/12/monitor-your-servers-like-its-1996/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 00:43:23 +0000</pubDate>
		<dc:creator>matt</dc:creator>
				<category><![CDATA[WTF]]></category>
		<category><![CDATA[fail]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[nagios]]></category>
		<category><![CDATA[plugins]]></category>
		<category><![CDATA[thresholds]]></category>

		<guid isPermaLink="false">http://www.anchor.com.au/blog/?p=1398</guid>
		<description><![CDATA[Whilst I&#8217;m a fan of using percentages for my disk space checks, sometimes an explicit size is more appropriate. So, you&#8217;d expect the following to work nicely: $USER1$/check_disk -w 5G -c 1G -p /data/foo If you don&#8217;t actually test that this works (by artificially filling your disk and seeing what happens), you may be dismayed [...]]]></description>
			<content:encoded><![CDATA[<p>Whilst I&#8217;m a fan of using percentages for my disk space checks, sometimes an explicit size is more appropriate.  So, you&#8217;d expect the following to work nicely:</p>
<pre>
$USER1$/check_disk -w 5G -c 1G -p /data/foo
</pre>
<p>If you don&#8217;t actually test that this works (by artificially filling your disk and seeing what happens), you may be dismayed to find that you only get alerted when the disk is down to 5MB of free space.  Why is this?</p>
<p>Because Nagios, despite the fact that nobody has sweated the megabytes for about a gazillion years, doesn&#8217;t support &#8216;G&#8217; as a suffix for thresholds.  Oh, it&#8217;ll make a good show of pretending &#8212; after all, the output formatting options have &#8216;GB&#8217; as an option &#8212; but nope, for your thresholds it&#8217;s &#8220;5000M&#8221; all the way.</p>
<p>ROCK ON!</p>
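<p>If you&#8217;d rather keep writing sizes like 5G in your own tooling, a small helper can do the conversion before the check command is generated. This is a minimal Python sketch, not part of Nagios or its plugins: the function names are made up for illustration, and the decimal multiplier (1G = 1000M) is an assumption picked to match the 5000M above.</p>
<pre>
import re

# Multipliers to megabytes. Decimal steps are an assumption chosen to match
# the "5000M" figure in the post; swap in 1024 if you prefer binary units.
_TO_MB = {"M": 1, "G": 1000, "T": 1000 * 1000}


def to_megabytes(size):
    """Convert a string such as '5G' into a whole number of megabytes."""
    match = re.fullmatch(r"(\d+)([MGT])B?", size.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return int(number) * _TO_MB[unit]


def check_disk_args(warn, crit, path):
    """Build check_disk arguments with both thresholds spelled out in megabytes."""
    return ["-w", f"{to_megabytes(warn)}M", "-c", f"{to_megabytes(crit)}M", "-p", path]
</pre>
<p>Calling <tt>check_disk_args("5G", "1G", "/data/foo")</tt> gives the arguments <tt>-w 5000M -c 1000M -p /data/foo</tt>. Whatever you generate, still fill a test disk and confirm the alert fires where you expect.</p>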
]]></content:encoded>
			<wfw:commentRss>http://www.anchor.com.au/blog/2009/12/monitor-your-servers-like-its-1996/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Advanced web application monitoring</title>
		<link>http://www.anchor.com.au/blog/2009/03/advanced-web-application-monitoring/</link>
		<comments>http://www.anchor.com.au/blog/2009/03/advanced-web-application-monitoring/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 23:58:42 +0000</pubDate>
		<dc:creator>Davy Jones</dc:creator>
				<category><![CDATA[FTW]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[availability]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[nagios]]></category>
		<category><![CDATA[uptime]]></category>
		<category><![CDATA[website]]></category>

		<guid isPermaLink="false">http://www.anchor.com.au/blog/?p=482</guid>
		<description><![CDATA[We&#8217;ve been using Nagios to monitor an ever-increasing number of services on all of the servers that we own at Anchor for a number of years. For the most part, we monitor the things that a systems administrator (us, in other words) has to deal with. This includes things like [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve been using <a href="http://www.anchor.com.au/hosting/dedicated/Advanced_Monitoring_of_Network_Servers_Websites_and_Applications_using_Nagios" target="_self">Nagios</a> to monitor an ever-increasing number of services on all of the servers that we own at Anchor for a number of years. For the most part, we monitor the things that a systems administrator (us, in other words) has to deal with. This includes things like CPU load, memory usage, disc space availability, swap usage, server load, and the availability of core applications such as web servers, database servers and mail servers. On a given server we typically monitor anywhere from 5 to 25 different attributes.</p>
<p>The end goal of all this monitoring is to ensure that the services on the servers we run are always working.</p>
<p>We can take this a step further, though: rather than just monitoring the components of the server that are required to keep the websites running, we can also monitor, in quite some detail, many of the components of the websites themselves.</p>
<p>At the end of the day, having a monitoring system tell you that a server is healthy and all of the applications are working only goes so far. Ultimately what&#8217;s important (to our clients) is that the website is behaving the way that they expect it to.</p>
<p>Since we don&#8217;t build any of the websites that we host, we unfortunately can&#8217;t put systems in place to monitor the innards of an application. Doing so requires an intimate knowledge of how the application was built. We do, however, have a very powerful monitoring system, and if the developers of the websites put hooks into their code, we can monitor those hooks so that both Anchor and the developers are alerted to problems.</p>
<p>With these hooks in place, Anchor will be able to fix the problems in many cases, and if we can&#8217;t, we can at least notify the client that there&#8217;s a problem (even if it is 2am).</p>
<p>In our world, the more monitoring we put in place the greater the uptime of services and the happier our clients are. On this one though, we need our clients&#8217; help. For all Anchor customers on a <a href="http://www.anchor.com.au/dedicated-hosting/dedicated-support.py">Fully Managed support</a> pack we do our side of the monitoring free of charge, and for everyone else we can do anything &#8211; for a small fee of course.</p>
<p>To help people understand what can be achieved with web application monitoring along with some implementation ideas, we&#8217;ve put together this article on <a href="http://www.anchor.com.au/hosting/dedicated/Making_your_website_more_monitorable" target="_self">website monitoring</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.anchor.com.au/blog/2009/03/advanced-web-application-monitoring/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

