misc/152859: [new port] net-mgmt/nagios-check_hdd_health, a Nagios plug-in written in shell to check HDD health using SmartMonTools

Marian Jamrich jamrich.majo at gmail.com
Mon Dec 6 12:10:08 UTC 2010


>Number:         152859
>Category:       misc
>Synopsis:       [new port] net-mgmt/nagios-check_hdd_health, a Nagios plug-in written in shell to check HDD health using SmartMonTools
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 06 12:10:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Marian Jamrich
>Release:        8.2 prerelease
>Organization:
>Environment:
>Description:
check_hdd_health is a Nagios plug-in written in shell that checks HDD health using SmartMonTools.
The script checks the following S.M.A.R.T. values:

- Spin Retry Count
- Reallocated Sector Ct
- Reallocated Event Count
- Current Pending Sector
- Offline Uncorrectable
- Total health test
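
As a rough illustration of how one of these values can be read, the
sketch below pulls the raw value (column 10) of a named attribute out of
"smartctl -A" style output. The helper names smart_raw_value and
sample_output are hypothetical and are not part of the submitted script,
and the sample lines are made-up values in the usual ATA attribute
layout, not output from a real drive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the submitted script): print the
# RAW_VALUE column (field 10) of one attribute, reading "smartctl -A"
# style output from standard input. Prints nothing if the attribute
# is not reported.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Made-up sample lines in the attribute-table layout used for ATA disks.
sample_output() {
    cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
EOF
}

sample_output | smart_raw_value Current_Pending_Sector
```

On a real system the input would come from smartctl itself, for example
"smartctl -A /dev/ad0 | smart_raw_value Spin_Retry_Count". An empty
result means the drive does not report that attribute.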


>How-To-Repeat:

>Fix:


Patch attached with submission follows:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	check_hdd_health
#
echo x - check_hdd_health
sed 's/^X//' >check_hdd_health << '53eb126359c9c0d8f2d23c32c84ef809'
X#!/bin/sh
X#
XPATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/usr/local/bin
X
XST_OK=0
XST_WR=1
XST_CR=2
XST_UN=3
X
Xsmartctl=$(which smartctl)
X
X## Smartmontools
XSMT=Smartmontools
X
X# Plugin name
XPROGNAME=`basename $0`
X            
X# Version
XVERSION="Version 1.0"
X        
X# Author
XAUTHOR="Marian Jamrich"
X
X# Use mktemp so the temporary file name is not predictable
XTMPFILE=`mktemp /tmp/smart.nagios.XXXXXX` || exit $ST_UN
X
X# Clean up when done or when aborting
Xtrap "rm -f ${TMPFILE}" 0 1 2 3 15
X
X# Called by the -V option below
Xprint_version() {
X    echo "$PROGNAME $VERSION $1"
X}
X
Xmini_help() {
X        echo "Usage $0 --device $device --without [src rsc rec cps ou]"
X}
X
Xprint_help() {
X    clear;
X    echo "*********************************************************************************"
X    echo "* $PROGNAME $VERSION $1""($AUTHOR) <jamrich.majo at gmail.com> (2010) *" 
X    echo "*********************************************************************************"
X    echo "This is a Nagios plugin that checks HDD health from S.M.A.R.T. data using Smartmontools."
X    echo '
XThe S.M.A.R.T. attributes are specific properties (parameters) of various parts of a disk. 
XS.M.A.R.T. uses attributes to monitor the disk condition and to analyze its reliability.
X
XThe script checks the following S.M.A.R.T. attributes (if your HDD supports them):
X
X** Spin Retry Count (src) **
XCount of spin start retries. This attribute stores the total number of spin start attempts needed to reach the fully operational speed (under the 
Xcondition that the first attempt was unsuccessful). A decrease of this attribute value is a sign of problems in the hard disk mechanical subsystem.
X
X** Reallocated Sector Count (rsc) **
XCount of reallocated sectors. When the hard drive finds a read/write/verification error, it marks this sector as "reallocated" and transfers data to a 
Xspecial reserved area (spare area). This process is also known as remapping, and "reallocated" sectors are called remaps. This is why, on modern hard 
Xdisks, you cannot see "bad blocks" while testing the surface: all bad blocks are hidden in reallocated sectors. 
X
X** Reallocated Event Count (rec) **
XCount of remap operations (transferring data from a bad sector to a special reserved disk area - spare area). The raw value of this attribute shows the 
Xtotal number of attempts to transfer data from reallocated sectors to a spare area. Unsuccessful attempts are counted as well as successful.
X
X** Current Pending Sector (cps) **
XCurrent count of unstable sectors (waiting for remapping). The raw value of this attribute indicates the total number of sectors waiting for remapping.
XLater, when some of these sectors are read successfully, the value is decreased. If errors still occur when reading some sector, the hard drive will try 
Xto restore the data, transfer it to the reserved disk area (spare area) and mark this sector as remapped. If this attribute value stays above zero, it 
Xindicates that the quality of the corresponding surface area is low.
X
X** Offline Uncorrectable (ou) **
XQuantity of uncorrectable errors. The raw value of this attribute indicates the total number of uncorrectable errors when reading/writing a sector. 
XA rise in the value of this attribute indicates that there are evident defects of the disk surface and/or there are problems in the hard disk drive 
Xmechanical subsystem.
X
X** Total health test (pass) **
XThis test is provided by Smartmontools. If the overall disk state is healthy, Smartmontools reports "PASSED".
X        '
X    echo "Nagios states:"
X    echo
X    echo "OK - if all values are \"0\"."
X    echo "Warning - if \"Spin Retry Count\" or \"Reallocated Event Count\" (or both) is between 1 and 9 and all other values are \"0\"."
X    echo "Critical - if any other value is greater than \"0\", if \"Spin Retry Count\" or \"Reallocated Event Count\" is 10 or more, or if the health test fails."
X    echo -e "\n---------------------------------------------------------------------"
X    echo "Usage:"
X    echo "$0 --device /dev/ad0 [ --without [src rsc rec cps ou]]"
X    echo "---------------------------------------------------------------------"
X    exit $ST_UN
X}
X
Xcase "$1" in
X        --help|-h|--usage|-u)
X            print_help                                              
X            exit $ST_UN
X            ;;
X        -d | --device)
X            device=$2
X            ;;
X        -V)
X            print_version
X            exit
X            ;;
X        *)
X            echo "Unknown argument: $1"
X            echo "For more information please try -h or --help!"
X            exit $ST_UN
X            ;;
Xesac
Xshift
X
Xtest -z "$device" && echo -e "\nYou forgot to define a device! Please try \"-h\" or \"--help\" for help." && exit $ST_UN
Xtest "`uname`" != "FreeBSD" && echo "This plugin is only for FreeBSD." && exit $ST_UN
X
Xif [ ! -e "$device" ]; then
X        echo
X        echo "Unknown device \"$device\"!"
X        exit $ST_UN
Xfi
X
Xif [ -z "$smartctl" ]; then
X        echo -e "\n$SMT is not installed. Please install it from http://smartmontools.sourceforge.net or with pkg_add -r \"smartmontools\"..."
X        exit $ST_UN
Xfi
X
X$smartctl -a $device > ${TMPFILE}
XSMART_SUPPORT=`awk '/SMART support is/ {print $4}' ${TMPFILE} | tail -n 1`
X
Xif [ "${SMART_SUPPORT}" = "Unavailable" ]; then
X        echo -e "\nS.M.A.R.T. support is unavailable for $device! You should enable it with \"smartctl -s on $device\"."
X        exit $ST_UN
Xelif [ "${SMART_SUPPORT}" != "Enabled" ]; then
X        echo -e "\nS.M.A.R.T. support may not be enabled for $device. Please run \"smartctl -s on $device\" to turn it on. The device may also not support S.M.A.R.T. at all."
X        exit $ST_UN
Xfi
X
X## start S.M.A.R.T test and set variables
Xsrc=`awk '/Spin_Retry_Count/ {print $10}' ${TMPFILE} `
Xrsc=`awk '/Reallocated_Sector_Ct/ {print $10}' ${TMPFILE} `
Xrec=`awk '/Reallocated_Event_Count/ {print $10}' ${TMPFILE} `
Xcps=`awk '/Current_Pending_Sector/ {print $10}' ${TMPFILE} `
Xou=`awk '/Offline_Uncorrectable/ {print $10}' ${TMPFILE} `
Xpass=`awk -F\: '/test result/ { if ( $2 == " PASSED")  print "PASSED"; else print "FAILED" }' ${TMPFILE} `
X
X## If one or more S.M.A.R.T. attributes are not supported by your HDD, list them after --without; their values are then set to "0".
X## The loop below scans the remaining positional arguments directly, so no getopt call is needed.
Xfor arg; do
X        case "$arg" in
X                src) src=0;;
X                rsc) rsc=0;;
X                rec) rec=0;;
X                cps) cps=0;;
X                ou) ou=0;;
X        esac
Xdone
X
X# Check that the HDD reports every attribute that was not excluded with --without:
X[ -z "$src" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Spin_Retry_Count. Please try \"--without src\"." && mini_help && exit $ST_UN
X[ -z "$rsc" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Reallocated_Sector_Ct. Please try \"--without rsc\"." && mini_help && exit $ST_UN
X[ -z "$rec" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Reallocated_Event_Count. Please try \"--without rec\"." && mini_help && exit $ST_UN
X[ -z "$cps" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Current_Pending_Sector. Please try \"--without cps\"." && mini_help && exit $ST_UN
X[ -z "$ou" ]  && echo -e "***********\n** ERROR **\n***********\n${device} does not support Offline_Uncorrectable. Please try \"--without ou\"." && mini_help && exit $ST_UN
X
X# Nagios performance data: space-separated label=value pairs with numeric values
Xperfdata="src=$src rsc=$rsc rec=$rec cps=$cps ou=$ou"
X
X##### finally run test, print result and set exit code #####
Xif [ $src -eq 0 ] && [ $rsc -eq 0 ] && [ $rec -eq 0 ] && [ $cps -eq 0 ] && [ $ou -eq 0 ] && [ "$pass" = "PASSED" ]; then
X        echo "OK - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X        exit $ST_OK
X# The OK branch above already covers src=0 and rec=0, so here at least one of them is between 1 and 9.
Xelif [ $src -lt 10 ] && [ $rec -lt 10 ] && [ $rsc -eq 0 ] && [ $cps -eq 0 ] && [ $ou -eq 0 ] && [ "$pass" = "PASSED" ]; then
X        echo "WARNING - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X        exit $ST_WR
Xelse
X        echo "CRITICAL - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X        exit $ST_CR
Xfi
53eb126359c9c0d8f2d23c32c84ef809
exit
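
The Nagios state rules that the plug-in's help text describes (OK,
Warning, Critical) can be sketched as a single function, shown here only
as an illustration. The name classify_smart and its argument order are
assumptions made for this example, not part of the submitted script, and
the thresholds follow the help text rather than any tested policy.

```shell
#!/bin/sh
# Hypothetical helper: map the five raw counters and the overall health
# result to a Nagios exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
# Arguments: src rsc rec cps ou pass
classify_smart() {
    c_src=$1 c_rsc=$2 c_rec=$3 c_cps=$4 c_ou=$5 c_pass=$6

    # A failed health test or any reallocated, pending or uncorrectable
    # sector is treated as critical.
    if [ "$c_pass" != "PASSED" ] || [ "$c_rsc" -gt 0 ] ||
       [ "$c_cps" -gt 0 ] || [ "$c_ou" -gt 0 ]; then
        return 2
    fi

    # Spin retries or reallocation events of 10 or more are critical.
    if [ "$c_src" -ge 10 ] || [ "$c_rec" -ge 10 ]; then
        return 2
    fi

    # One to nine spin retries or reallocation events is a warning.
    if [ "$c_src" -gt 0 ] || [ "$c_rec" -gt 0 ]; then
        return 1
    fi

    return 0
}

# Example: three spin retries, everything else clean.
classify_smart 3 0 0 0 0 PASSED && state=0 || state=$?
echo "state=$state"
```

Keeping the decision in one function makes each rule easy to check
against the help text, and the caller only has to turn the return code
into the matching status line and exit code.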



>Release-Note:
>Audit-Trail:
>Unformatted:

