RAID monitoring tools
Mike Holloway
mikhollo at cisco.com
Mon Oct 30 20:05:17 UTC 2006
Whoops, meant to copy the list...
Appreciate the pointer to camcontrol. Previously I had just been
using swatch to watch syslog and send messages to Nagios via NSCA.
My problem was that I never knew the initial state of the disks until
an event showed up in syslog.
For reference, here's what I saw from camcontrol during my tests
(FreeBSD 6.0 rel):
During normal operation of the raid:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
After removing one of the raid member disks:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME inte> Fixed Direct Access SCSI-0 device
After re-inserting the raid member disk:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME reco> Fixed Direct Access SCSI-0 device
And about 45 minutes later:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
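Since camcontrol reports the current state on demand, it can cover the
"initial state" gap with a periodic check. Here is a minimal sketch that
maps one line of inquiry output to a Nagios return code. The function name
raid_state_to_code is made up for illustration, the three status strings are
only the ones seen in the tests above (other controllers or firmware may
report different text), and the severity chosen for each one is my own
assumption:

```shell
#!/bin/sh
# Map one line of `camcontrol inquiry` output to a Nagios return code.
# Only the status strings observed above are recognized; anything else
# is reported as UNKNOWN (3).
raid_state_to_code() {
    case "$1" in
        *"VOLUME OK>"*)   echo 0 ;;  # OK: normal operation
        *"VOLUME reco>"*) echo 1 ;;  # WARNING: seen after re-inserting a disk
        *"VOLUME inte>"*) echo 2 ;;  # CRITICAL: seen after removing a disk
        *)                echo 3 ;;  # UNKNOWN: unrecognized output
    esac
}

# Possible wiring from cron (an assumption, not something I have tested):
#   line=$(camcontrol inquiry da0 -D)
#   /usr/local/bin/nsca_report "$(raid_state_to_code "$line")" "Disk Array" "$line"
```

Keeping the mapping in a function that takes the line as an argument means
it can be tried out on a machine without the controller.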
And here's the configuration I use for swatch to feed nsca in realtime:
watchfor /ciss0.*removed/
        exec "/usr/local/bin/nsca_report 2 \"Disk Array\" Hot-plug drive removed"
watchfor /ciss0.*failure/
        exec "/usr/local/bin/nsca_report 2 \"Disk Array\" Physical drive failure"
watchfor /ciss0.*inserted/
        exec "/usr/local/bin/nsca_report 1 \"Disk Array\" Hot-plug drive inserted"
watchfor /ciss0.*recovery->recovering/
        exec "/usr/local/bin/nsca_report 1 \"Disk Array\" Drive is rebuilding..."
watchfor /ciss0.*recovering->OK/
        exec "/usr/local/bin/nsca_report 0 \"Disk Array\" Drive has successfully rebuilt."
For completeness, here's the nsca_report script that I use to send
the alarms to Nagios. Substitute your own values for thishost and for
the -H argument (the host running the NSCA daemon):
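#!/bin/bash
# Usage: nsca_report <return code> <service description> <message words...>
outcode=$1
thisservice=$2
# Short host name: everything before the first dot of $HOSTNAME.
thishost=${HOSTNAME%%.*}
shift
shift
# send_nsca reads one tab-separated line on stdin:
# host, service, return code, plugin output.
printf '%s\t%s\t%s\t%s\n' "$thishost" "$thisservice" "$outcode" "$*" |
    /usr/local/bin/send_nsca -H www -c /usr/local/etc/send_nsca.cfg > /dev/null 2>&1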
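To check what actually goes down the pipe without a Nagios server at hand,
the line format can be built and inspected on its own. This is only a
sketch: format_nsca_line is a hypothetical helper (not part of NSCA), and
web1 is a placeholder host name.

```shell
#!/bin/sh
# Build the tab-separated line that send_nsca expects on stdin:
#   host <TAB> service <TAB> return code <TAB> plugin output
# format_nsca_line is a hypothetical helper for illustration only.
format_nsca_line() {
    host=$1
    service=$2
    code=$3
    shift 3
    # Remaining arguments are joined with spaces to form the message.
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "$*"
}

# Dry run: show the fields with the tabs made visible as "|".
format_nsca_line web1 "Disk Array" 2 Hot-plug drive removed | tr '\t' '|'
# prints: web1|Disk Array|2|Hot-plug drive removed
```

This mirrors how the swatch rules call the script: return code first, then
the quoted service name, then the free-form message.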
-mike
On Oct 29, 2006, at 12:51 AM, Marc G. Fournier wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> camcontrol devlist:
>
> <COMPAQ RAID 1 VOLUME OK> at scbus0 target 0 lun 0
> (pass0,da0)
>
> I don't have *regular* monitoring on it, mind you, just do it
> periodically, by hand ...
>
>
>
> - --On Sunday, October 29, 2006 15:39:26 +1100 Edwin Groothuis
> <edwin at mavetju.org> wrote:
>
>> Greetings,
>>
>> Last week we had two failing disks, and if it wasn't for a walk
>> through the datacenter (which is off-site, and ten dollars away)
>> we wouldn't have noticed it. I've read the thread about hpacucli,
>> and have had my failed attempts to get it up and running under the
>> linuxolator.
>>
>> So the question is: how do *you* monitor the status of your disks
>> and RAID arrays? Any suggestions will be appreciated.
>>
>> Edwin
>>
>> --
>> Edwin Groothuis | Personal website: http://www.mavetju.org
>> edwin at mavetju.org | Weblog: http://weblog.barnet.com.au/edwin/
>> _______________________________________________
>> freebsd-proliant at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-proliant
>> To unsubscribe, send any mail to
>> "freebsd-proliant-unsubscribe at freebsd.org"
>
>
>
> - ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email . scrappy at hub.org MSN . scrappy at hub.org
> Yahoo . yscrappy Skype: hub.org ICQ . 7615664
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (FreeBSD)
>
> iD8DBQFFRDM94QvfyHIvDvMRAtNdAKC+AYhavYxQ4qZzP4/zqsBfLirE6gCbBebW
> Oxd406ykkw1tElrfzn1Y/zM=
> =fgIA
> -----END PGP SIGNATURE-----
>