Monitoring Windows software RAID
Anyone who is intimately familiar with the Unix/Linux way of doing things and is then confronted with the often ugly ways of Windows will no doubt be horrified at its apparent lack of scripting. After all, just about every conceivable system admin task is scriptable under Unix or Linux. In reality there is a lot of scripting ability hiding under the pretty exterior of Windows.
The task at hand? Automating the checking of the status of software RAID volumes in Windows. Software RAID is a staple of the Windows dedicated server and can compare well with hardware RAID solutions in terms of capabilities and performance. What any hardware RAID solution will come with, however, is a way of monitoring exactly what the little redundant bits of data are doing at any given moment. This is a problem I have pondered for some time, apparently alone, as I have not managed to find a single shred of information about an automated check for Windows' software RAID status.
The solution had to be scripted in some way, to fit in with our automatic dedicated server monitoring systems, and it obviously had to retrieve enough information about the RAID volumes to indicate their health. I started with a technology I was already familiar with: Windows Management Instrumentation (WMI). For the uninitiated, WMI allows a system administrator to script just about any management task, and is thus very powerful. It allows scripting of tasks on remote as well as local machines, and the retrieval of significant amounts of system information. Unfortunately, while it does allow one to find out a lot about the disks present in the system, software RAID status is not among the information presented.
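To give a feel for what WMI does offer, here is a minimal sketch that lists the physical disks on the local machine using the standard Win32_DiskDrive class. The variable names are my own, and the script is only an illustration of the approach; note that the Status value it prints describes the physical drive, not the health of any software RAID volume built on top of it.

```vbscript
' Sketch only: list physical disks via WMI on the local machine.
Option Explicit
Dim oWMI, colDisks, oDisk

' "." means the local computer; a remote host name could be used instead
Set oWMI = GetObject("winmgmts:\\.\root\cimv2")
Set colDisks = oWMI.ExecQuery("SELECT * FROM Win32_DiskDrive")

For Each oDisk In colDisks
    ' Status is the generic device status string (for example "OK");
    ' it says nothing about mirrored or RAID-5 volume redundancy
    WScript.Echo oDisk.DeviceID & "  " & oDisk.Model & "  " & oDisk.Status
Next
```

Run it with "cscript //nologo" followed by whatever file name you save it under.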
The solution, as it turns out, is quite simple (although a little unorthodox). The DISKPART command (commonly used to script partitioning in automated installations) also shows the RAID status of all volumes when given the 'list volume' command. DISKPART is scriptable, but officially only by passing it the /s parameter with the name of a text file that contains the script. This was not entirely handy, so I experimented and found that I could achieve my goal by piping the desired command straight to the DISKPART program. Combined with a bit of VBScript for regular expression matching, I ended up with the following script.
' Software RAID status check script
Option Explicit

Dim WshShell, oExec
Dim Line, RE0, RE1, RE2, RE3
Dim Failed

Failed = -1
' Simple variable to display status of all volumes:
' 0 = Healthy
' 1 = Rebuilding
' 2 = Failed
' 3 = Unknown

' Check version of WScript. Has to be >= 5.6 for WScript.Shell.Exec to work
If WScript.Version < 5.6 Then
    Failed = 3
    WScript.StdOut.WriteLine("UNKNOWN: WScript version < 5.6")
    WScript.Quit(Failed)
End If

Set WshShell = WScript.CreateObject("WScript.Shell")
' Execute the DISKPART program and grab the output
Set oExec = WshShell.Exec("%comspec% /C echo list volume | %WINDIR%\SYSTEM32\DISKPART.EXE")

' Set up some regular expression objects
Set RE0 = New RegExp
Set RE1 = New RegExp
Set RE2 = New RegExp
Set RE3 = New RegExp
RE0.Pattern = "Healthy"
RE1.Pattern = "Mirror|RAID-5"
RE2.Pattern = "Failed|(At Risk)"
' At Risk indicates errors have been reported for a disk
' and it may need to be reactivated.
RE3.Pattern = "Rebuild"

' Check for no output
If oExec.StdOut.AtEndOfStream Then
    Failed = 3
Else
    While Not oExec.StdOut.AtEndOfStream
        Line = oExec.StdOut.ReadLine
        ' Tests for Mirrored or RAID-5 volumes
        If RE1.Test(Line) Then
            ' Tests for Healthy volumes
            If RE0.Test(Line) Then
                If Failed = -1 Then Failed = 0
            End If
            ' Tests for Failed RAID volumes
            If RE2.Test(Line) Then
                If Failed < 2 Then Failed = 2
            ' Tests for Rebuilding volumes
            ElseIf RE3.Test(Line) Then
                ' "< 1" rather than "= 0" so that a rebuilding volume is
                ' still counted when it is listed before any healthy one
                If Failed < 1 Then Failed = 1
            End If
        End If
    WEnd
End If

' If Failed is still -1, something bad has happened, or there is no RAID
If Failed = -1 Then Failed = 3

' Print out the appropriate test result
Select Case Failed
    Case 0
        WScript.StdOut.WriteLine("RAID OK: All volumes Healthy")
    Case 1
        WScript.StdOut.WriteLine("RAID WARNING: Volume(s) Rebuilding")
    Case 2
        WScript.StdOut.WriteLine("RAID CRITICAL: Volume(s) have Failed")
    Case 3
        WScript.StdOut.WriteLine("UNKNOWN: " & oExec.StdErr.ReadLine)
End Select

WScript.Quit(Failed)
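For comparison, the /s route mentioned earlier can be sketched roughly as follows. This is not part of the check script above, only an illustration of what the text file approach involves: the command is written to a temporary file, DISKPART is run against it, and the file is removed afterwards. The variable names are hypothetical, and it assumes the script runs with sufficient privileges to start DISKPART.

```vbscript
' Sketch only: run "list volume" through DISKPART's /s script-file option.
Option Explicit
Dim oFSO, oShell, oExec, oFile, sScript, sCommand

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oShell = CreateObject("WScript.Shell")

' Build a temporary file name in the temp folder (2 = TemporaryFolder)
sScript = oFSO.BuildPath(oFSO.GetSpecialFolder(2).Path, oFSO.GetTempName)

' A DISKPART script is simply its commands, one per line
Set oFile = oFSO.CreateTextFile(sScript, True)
oFile.WriteLine "list volume"
oFile.Close

' Expand %WINDIR% explicitly and quote the script path in case of spaces
sCommand = oShell.ExpandEnvironmentStrings("%WINDIR%\SYSTEM32\DISKPART.EXE") & _
           " /s """ & sScript & """"
Set oExec = oShell.Exec(sCommand)

' ReadAll returns once DISKPART has finished and closed its output
WScript.StdOut.Write oExec.StdOut.ReadAll

oFSO.DeleteFile sScript
```

The extra file handling is the reason piping the command in felt more convenient for a monitoring check.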
At present the RAID check script only reports whether ANY volume is rebuilding or failed. It could be expanded to list exactly which volumes are rebuilding or failed with a bit more text processing. You can execute the script directly on the command line with "cscript raidchk.vbs", or with "cscript //nologo raidchk.vbs" to prevent the Microsoft copyright notice appearing with the output. Its exit codes 0 to 3 follow the Nagios plugin convention (OK, WARNING, CRITICAL, UNKNOWN), so it is ready for integration with an automated monitoring system such as Nagios.
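As a starting point for that extra text processing, the following sketch pulls the volume number out of a single line of output. It assumes that each volume line contains the word "Volume" followed by its number, which is how DISKPART's 'list volume' output is normally laid out; the function name and the sample line are made up for illustration.

```vbscript
' Sketch only: extract the volume identifier from one line of output.
Option Explicit

' Returns "Volume N" for a line containing a volume number, or "" otherwise
Function VolumeName(sLine)
    Dim oRE, oMatches
    Set oRE = New RegExp
    oRE.Pattern = "Volume\s+(\d+)"
    Set oMatches = oRE.Execute(sLine)
    If oMatches.Count > 0 Then
        VolumeName = "Volume " & oMatches(0).SubMatches(0)
    Else
        VolumeName = ""
    End If
End Function

Dim sSample
' Hypothetical sample line, not captured from a real system
sSample = "  Volume 2     E   Data         NTFS   Mirror       465 GB  Failed Rd"
WScript.Echo VolumeName(sSample)
```

Inside the main loop, the same function could be called on each line that matches the failed or rebuilding patterns, appending the result to a string that is printed alongside the status message.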
Credit goes to Peter Field for adding some error checking to the original version of the script, which was written back in 2005. If you have any further useful additions to the script, we'd love to hear about them!