Nagios checks for iSCSI targets with a blind initiator

April 11, 2013 Technical, General

We recently found ourselves managing a highly available storage service for a customer without having direct access to the data on the server. HA storage is commonplace for us now, but not having access to the data is unusual, particularly for a full-stack hosting provider.

The reason is that the server presents iSCSI targets, which are effectively networked block devices, to the clients (“initiators” in iSCSI parlance). We don’t have access to the clients, so we can’t find out what they’re doing with those devices. In short, there’s no easy access to the data, and it would probably be dangerous for us to try: we’d have to join their OCFS2 cluster to avoid corruption.

That doesn’t mean we can’t monitor the targets, though. As long as we give the monitoring node access to the iSCSI targets, we can access the LUNs. That’s enough to test reachability without escalating up the stack for deeper access; that’s the “blind” aspect of the check.
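
To make the idea concrete, here is a minimal sketch of a blind check in Python. It is not the code from our repo, just an illustration under a few assumptions: the LUN is already attached to the monitoring node as a local block device via an iSCSI login, and the names check_device and main, along with the 512-byte read size, are hypothetical. It opens the device read-only, reads a small block without interpreting it, and maps the result onto the standard Nagios plugin exit codes.

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_device(path, nbytes=512):
    """Read nbytes from the start of path and return (exit_code, message).

    The device is opened read-only and the data is never interpreted:
    we only care whether the read succeeds.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot open {path}: {exc}"
    try:
        data = os.read(fd, nbytes)
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot read {path}: {exc}"
    finally:
        os.close(fd)
    if len(data) < nbytes:
        return CRITICAL, (
            f"CRITICAL - short read from {path}: {len(data)} of {nbytes} bytes"
        )
    return OK, f"OK - read {nbytes} bytes from {path}"


def main(argv):
    """Print a one-line status for Nagios and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_lun_read DEVICE")
        return UNKNOWN
    code, message = check_device(argv[1])
    print(message)
    return code
```

A plugin script built on this would end with sys.exit(main(sys.argv)) and be pointed at the device node for the LUN. Note that a plain buffered read like this one may be satisfied from the page cache, so a real check would probably need direct I/O or a fresh iSCSI login on each run; that is left out of the sketch.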

We’ve posted the code in a GitHub repo in the hope that it might be useful to others. It’s pretty basic stuff, but it might be just the thing if you’re in a similar situation or aren’t getting enough love from more heavyweight checks that tend to use SNMP.

It’s important to note that this isn’t a standalone check; it’s part of a comprehensive monitoring and trending suite that we deploy. The servers providing the high-availability iSCSI targets are closely watched so that we’re aware of any low-level issues, while this check provides assurance of correct behaviour at a higher level, as far as the handoff point to the clients.