Monitoring Windows software RAID

Anyone intimately familiar with the Unix/Linux way of doing things who is then confronted with the often ugly ways of Windows will no doubt be horrified at its apparent lack of scripting. After all, just about every conceivable system administration task is scriptable under Unix or Linux. In reality, a lot of scripting ability hides under the pretty exterior of Windows.

The task at hand? Automating status checks of software RAID volumes in Windows. Software RAID is a staple of the Windows dedicated server and can compare well with hardware RAID solutions in terms of capabilities and performance. What any hardware RAID solution does come with, however, is a way of monitoring exactly what the little redundant bits of data are doing at any given moment. I have pondered this problem for some time, apparently alone, as I have not managed to find a single shred of information about an automated check of Windows' software RAID status.

The solution had to be scriptable, so that it would fit in with our automated dedicated server monitoring systems, and it obviously had to retrieve enough information about the RAID volumes to indicate their health. I started with a technology I was already familiar with: Windows Management Instrumentation (WMI). For the uninitiated, WMI lets a system administrator script just about any management task, which makes it very powerful. It can script tasks on remote as well as local machines and retrieve a significant amount of system information. Unfortunately, while it reveals a lot about the disks present in the system, software RAID status is not among the information it presents.
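To illustrate what WMI does provide, here is a minimal sketch (not part of the original script) that lists each physical disk's index, model and status through the Win32_DiskDrive class. It assumes it is run under cscript against the local machine ("." in the moniker). The status it reports belongs to the physical disk, and says nothing about the mirror or RAID-5 volume built on top of it.

```vbscript
' Sketch: list physical disks via WMI's Win32_DiskDrive class.
' WMI exposes per-disk model and status, but not software RAID volume health.
Option Explicit

Dim objWMI, colDisks, objDisk

' "." means the local machine; a remote host name could be used instead
Set objWMI = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2")
Set colDisks = objWMI.ExecQuery("SELECT Index, Model, Status FROM Win32_DiskDrive")

For Each objDisk In colDisks
    WScript.StdOut.WriteLine "Disk " & objDisk.Index & ": " & objDisk.Model & _
        " (Status: " & objDisk.Status & ")"
Next
```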

The solution, as it turns out, is quite simple (although a little unorthodox). The DISKPART command, commonly used to script partitioning in automated installations, also shows the RAID status of every volume when given the 'list volume' command. Officially, DISKPART is scripted by passing it the /s parameter followed by the name of a text file that contains the commands. Keeping a separate text file around was not entirely convenient, so I experimented and found that I could achieve the same result by piping the desired command straight into the DISKPART program. Combined with a little VBScript for regular expression matching, this gave me the following script.

raidchk.vbs:

' Software RAID status check script

Option Explicit

Dim WshShell, oExec
Dim Line, RE0, RE1, RE2, RE3
Dim Failed

Failed = -1
' Simple variable to display status of all volumes:
' 0 = Healthy
' 1 = Rebuilding
' 2 = Failed
' 3 = Unknown

' Check version of WScript. Has to be >= 5.6 for WScript.Shell.Exec to work
If Wscript.Version < 5.6 Then
   Failed = 3
   Wscript.StdOut.WriteLine("UNKNOWN: WScript version < 5.6")
   WScript.Quit(Failed)
End If

Set WshShell = WScript.CreateObject("WScript.Shell")

' Execute the DISKPART program and grab the output
Set oExec = WshShell.Exec("%comspec% /C echo list volume | %WINDIR%\SYSTEM32\DISKPART.EXE")

' Set up some regular expression objects
Set RE0 = New RegExp
Set RE1 = New RegExp
Set RE2 = New RegExp
Set RE3 = New RegExp

RE0.Pattern = "Healthy"
RE1.Pattern = "Mirror|RAID-5"
RE2.Pattern = "Failed|(At Risk)"
' At Risk indicates errors have been reported for a disk
' and it may need to be reactivated.
RE3.Pattern = "Rebuild"

' Check for no output
If oExec.StdOut.AtEndOfStream Then
    Failed = 3
Else
    While Not oExec.StdOut.AtEndOfStream
        Line = oExec.StdOut.ReadLine

        ' Tests for Mirrored or RAID-5 volumes
        If RE1.Test(Line) Then

          ' Tests for Healthy volumes
          If RE0.Test(Line) Then
            If Failed = -1 Then Failed = 0
          End If

          ' Tests for Failed RAID volumes
          If RE2.Test(Line) Then
            If Failed < 2 Then Failed = 2

          ' Tests for Rebuilding volumes; "< 1" means a rebuilding volume
          ' is reported even if no Healthy RAID line has been seen yet
          ElseIf RE3.Test(Line) Then
            If Failed < 1 Then Failed = 1

          End If
        End If
    WEnd
End If

' If Failed is still -1, something bad has happened, or there is no RAID
If Failed = -1 Then Failed = 3

' Print out the appropriate test result
Select Case Failed
    Case 0
      WScript.StdOut.WriteLine("RAID OK: All volumes Healthy")
    Case 1
      WScript.StdOut.WriteLine("RAID WARNING: Volume(s) Rebuilding")
    Case 2
      WScript.StdOut.WriteLine("RAID CRITICAL: Volume(s) have Failed")
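    Case 3
      ' StdErr.ReadLine raises an error at end of stream, so check first
      If oExec.StdErr.AtEndOfStream Then
        WScript.StdOut.WriteLine("UNKNOWN: No mirrored or RAID-5 volumes found in DISKPART output")
      Else
        WScript.StdOut.WriteLine("UNKNOWN: " & oExec.StdErr.ReadLine)
      End If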
End Select

WScript.Quit(Failed)
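One way to sanity-check the matching rules without a live RAID volume is to pull the per-line logic out into a function and feed it sample text. In the sketch below, the function name UpdateStatus is hypothetical, and the sample lines are illustrative, laid out roughly like DISKPART's list volume output rather than captured from a real server. The sketch ranks Rebuilding above "nothing seen yet" (If result < 1), so a rebuilding volume is reported even when it is the first RAID line encountered.

```vbscript
' Sketch: raidchk.vbs's per-line rules as a pure function (hypothetical
' name UpdateStatus), so they can be exercised with sample text.
' Codes: -1 = nothing seen yet, 0 = Healthy, 1 = Rebuilding, 2 = Failed
Option Explicit

Function UpdateStatus(ByVal current, ByVal line)
    Dim reRaid, reHealthy, reFailed, reRebuild, result
    Set reRaid = New RegExp
    reRaid.Pattern = "Mirror|RAID-5"
    Set reHealthy = New RegExp
    reHealthy.Pattern = "Healthy"
    Set reFailed = New RegExp
    reFailed.Pattern = "Failed|(At Risk)"
    Set reRebuild = New RegExp
    reRebuild.Pattern = "Rebuild"

    result = current
    ' Only mirrored and RAID-5 volumes count; other lines leave the status alone
    If reRaid.Test(line) Then
        If reHealthy.Test(line) And result = -1 Then result = 0
        If reFailed.Test(line) Then
            ' A failure outranks every other state
            If result < 2 Then result = 2
        ElseIf reRebuild.Test(line) Then
            ' Rebuilding outranks Healthy and "nothing seen yet"
            If result < 1 Then result = 1
        End If
    End If
    UpdateStatus = result
End Function

' Hypothetical sample lines, laid out roughly like "list volume" output
Dim samples, s, status
samples = Array( _
    "  Volume 0     C         NTFS   Mirror     40 GB  Healthy    System", _
    "  Volume 1     D         NTFS   Mirror     40 GB  Rebuild", _
    "  Volume 2     E         NTFS   Simple     40 GB  Healthy")

status = -1
For Each s In samples
    status = UpdateStatus(status, s)
Next
WScript.StdOut.WriteLine "Overall status code: " & status  ' prints 1
```

The RegExp objects are rebuilt on every call to keep the function self-contained; in a real script they would be created once, as raidchk.vbs does.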

At present this script only checks for the presence of ANY rebuilding or failed volumes. With a bit more text processing, it could be expanded to list exactly which volumes are rebuilding or have failed. You can run the script directly from the command line with "cscript raidchk.vbs", or with "cscript //nologo raidchk.vbs" to keep the Microsoft copyright notice out of the output. Its single line of output and its exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) follow the Nagios plugin conventions, so it is ready for integration with an automated monitoring system such as Nagios.
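As a sketch of that extra text processing, a capturing regular expression can pull the volume number and RAID type out of each line. The helper name DescribeProblemVolume is hypothetical, and the pattern assumes each volume row begins with "Volume <n>" as in list volume output; the header row ("Volume ###") is skipped because the pattern requires digits. Status precedence mirrors the main script: a failure beats a rebuild.

```vbscript
' Sketch: name the RAID volumes that are not Healthy, using a capturing
' regular expression. DescribeProblemVolume is a hypothetical helper.
Option Explicit

Function DescribeProblemVolume(ByVal line)
    Dim re, matches, subs, state
    DescribeProblemVolume = ""

    ' Capture the volume number and RAID type; assumes the line begins
    ' with "Volume <n>", as the rows of "list volume" output do
    Set re = New RegExp
    re.Pattern = "Volume\s+(\d+).*\b(Mirror|RAID-5)\b"
    Set matches = re.Execute(line)
    If matches.Count = 0 Then Exit Function

    ' Same precedence as raidchk.vbs: a failure beats a rebuild
    If InStr(line, "Failed") > 0 Or InStr(line, "At Risk") > 0 Then
        state = "Failed"
    ElseIf InStr(line, "Rebuild") > 0 Then
        state = "Rebuilding"
    Else
        ' Healthy or unrecognised volumes are not reported
        Exit Function
    End If

    Set subs = matches(0).SubMatches
    DescribeProblemVolume = "Volume " & subs(0) & " (" & subs(1) & "): " & state
End Function

Dim report
report = DescribeProblemVolume("  Volume 1     D         NTFS   Mirror     40 GB  Rebuild")
If report <> "" Then WScript.StdOut.WriteLine report  ' Volume 1 (Mirror): Rebuilding
```

Inside raidchk.vbs's While loop, the non-empty results could be joined into the WARNING or CRITICAL message in place of the generic text.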

Credit goes to Peter Field for adding some error checking to the original version of the script, which was written back in 2005. If you have any further useful additions to the script, we'd love to hear about them!
