RAID monitoring tools

Mon Oct 30 20:05:17 UTC 2006

Whoops, meant to copy the list...

Appreciate the pointer to camcontrol. Until now I had just been
using swatch to watch syslog and send messages to Nagios via NSCA.
My problem was that I never knew the initial state of the disks until
an event showed up in syslog.

For reference, here's what I saw from camcontrol during my tests
(FreeBSD 6.0-RELEASE):

During normal operation of the RAID:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1  VOLUME OK> Fixed Direct Access SCSI-0 device

After removing one of the RAID member disks:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1  VOLUME inte> Fixed Direct Access SCSI-0 device

After reinserting the RAID member disk:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1  VOLUME reco> Fixed Direct Access SCSI-0 device

And about 45 minutes later:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1  VOLUME OK> Fixed Direct Access SCSI-0 device
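Because the volume state shows up in the inquiry string, it can be polled (e.g. from cron or at boot) instead of waiting for a syslog event. What follows is a minimal, untested sketch: ciss_volume_state is a hypothetical helper name, and the mapping is an assumption based only on the outputs above. OK is treated as healthy, anything starting with "reco" as a rebuild in progress (WARNING), and any other word, such as the "inte" seen with a member disk pulled, as CRITICAL. This mirrors the codes used in the swatch rules in this message (2 for removal, 1 for rebuilding, 0 for rebuilt). Other controllers or firmware may report different words.

```shell
#!/bin/sh
# Sketch only: ciss_volume_state is a hypothetical helper, not part of
# FreeBSD. It reads one line of `camcontrol inquiry daN -D` output on
# stdin, prints a Nagios-style status line and returns the matching
# Nagios plugin code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
ciss_volume_state() {
    # Accept a final line even if it lacks a trailing newline.
    read -r line || [ -n "$line" ] || {
        echo "UNKNOWN - no inquiry output"
        return 3
    }
    # Take the word after "VOLUME" inside the <...> string,
    # e.g. "OK", "reco" or "inte" in the outputs above.
    status=$(printf '%s\n' "$line" | sed -n 's/.*VOLUME \{1,\}\([^> ]*\).*/\1/p')
    case "$status" in
        OK)    echo "OK - volume status: $status";       return 0 ;;
        reco*) echo "WARNING - volume status: $status";  return 1 ;;  # assumed: rebuilding
        "")    echo "UNKNOWN - could not parse: $line";  return 3 ;;
        *)     echo "CRITICAL - volume status: $status"; return 2 ;;  # e.g. "inte", disk pulled
    esac
}
```

To report the current state through the nsca_report script in this message, something like this could run from cron:

    msg=$(camcontrol inquiry da0 -D | ciss_volume_state)
    /usr/local/bin/nsca_report $? "Disk Array" "$msg"

The exit status of the assignment is that of the pipeline, so $? carries the function's return code.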

And here's the configuration I use for swatch to feed NSCA in real time:

watchfor   /ciss0.*removed/
         exec "/usr/local/bin/nsca_report 2 \"Disk Array\" Hot-plug drive removed"

watchfor   /ciss0.*failure/
         exec "/usr/local/bin/nsca_report 2 \"Disk Array\" Physical drive failure"

watchfor   /ciss0.*inserted/
         exec "/usr/local/bin/nsca_report 1 \"Disk Array\" Hot-plug drive inserted"

watchfor   /ciss0.*recovery->recovering/
         exec "/usr/local/bin/nsca_report 1 \"Disk Array\" Drive is rebuilding..."

watchfor   /ciss0.*recovering->OK/
         exec "/usr/local/bin/nsca_report 0 \"Disk Array\" Drive has successfully rebuilt."

For completeness, here's the nsca_report script that I use to send
the alarms to Nagios; substitute your own thishost and -H values:

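#!/bin/bash

# Usage: nsca_report <return code> <service description> <message ...>
outcode=$1
thisservice=$2

# Short host name: everything before the first dot of $HOSTNAME
thishost=`echo "$HOSTNAME" | sed -e "s/\./ /g" | cut -f 1 -d ' '`

# Drop the code and service; the remaining arguments are the message
shift 2

# Send one tab-separated passive check result to the NSCA server "www",
# discarding both stdout and stderr (stdout must be redirected first)
echo -e "${thishost}\t${thisservice}\t${outcode}\t$*\n" | \
    /usr/local/bin/send_nsca -H www -c /usr/local/etc/send_nsca.cfg > /dev/null 2>&1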
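The line piped into send_nsca by the echo above is just host, service, return code and message separated by tabs. A minimal sketch of building the same line with POSIX parameter expansion and printf instead of echo | sed | cut and echo -e, which may help if the script ever runs under /bin/sh rather than bash. nsca_line is a hypothetical name, and this has not been tried against a live NSCA daemon.

```shell
#!/bin/sh
# Sketch only: nsca_line is a hypothetical helper. It prints one passive
# service-check result in the same tab-separated form as nsca_report:
#   host <TAB> service <TAB> return code <TAB> message
nsca_line() {
    host=${1%%.*}   # short host name: drop everything from the first dot
    service=$2
    code=$3
    shift 3         # whatever is left is the message text
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "$*"
}
```

For example:

    nsca_line "$(hostname)" "Disk Array" 2 "Hot-plug drive removed" |
        /usr/local/bin/send_nsca -H www -c /usr/local/etc/send_nsca.cfg > /dev/null 2>&1

${1%%.*} gives the same short name as the sed/cut pipeline without starting extra processes, and printf interprets \t in its format string under any POSIX shell.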

-mike

On Oct 29, 2006, at 12:51 AM, Marc G. Fournier wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> camcontrol devlist:
>
> <COMPAQ RAID 1  VOLUME OK>         at scbus0 target 0 lun 0 (pass0,da0)
>
> I don't have *regular* monitoring on it, mind you, just do it
> periodically, by hand ...
>
>
>
> - --On Sunday, October 29, 2006 15:39:26 +1100 Edwin Groothuis
> <edwin at mavetju.org> wrote:
>
>> Greetings,
>>
>> Last week we had two failing disks, and if it weren't for a walk
>> through the datacenter (which is off-site, and ten dollars away)
>> we wouldn't have noticed them. I've read the thread about hpacucli,
>> and have made my own failed attempts to get it up and running under
>> the linuxolator.
>>
>> So the question is: how do *you* monitor the status of your disks
>> and RAID arrays? Any suggestions will be appreciated.
>>
>> Edwin
>>
>> --
>> Edwin Groothuis      |            Personal website: http://www.mavetju.org
>> edwin at mavetju.org    |          Weblog: http://weblog.barnet.com.au/edwin/
>> _______________________________________________
>> freebsd-proliant at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-proliant
>> To unsubscribe, send any mail to "freebsd-proliant-unsubscribe at freebsd.org"
>
>
>
> - ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy at hub.org                              MSN . scrappy at hub.org
> Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (FreeBSD)
>
> iD8DBQFFRDM94QvfyHIvDvMRAtNdAKC+AYhavYxQ4qZzP4/zqsBfLirE6gCbBebW
> Oxd406ykkw1tElrfzn1Y/zM=
> =fgIA
> -----END PGP SIGNATURE-----