HAST with broken HDD

Fri Oct 3 17:54:46 UTC 2014

On Wed, Oct 01, 2014 at 03:51:43PM +0000, Matt Churchyard wrote:

> HAST is basically "RAID1-over-network", so if a disk fails, it
> should just handle read/writes using the other disk, and the
> filesystem on top, be it UFS/ZFS/whatever, should just carry on as
> normal (which is what has been observed). Of course, HAST (or the
> OS) should notify you of the disk error though (probably through
> devd) so you can do something about it. Maybe it already exists, but
> HAST should be able to provide overall status information and raise
> events just like ZFS or any RAID subsystem would. You also of course
> shouldn't get scrub errors and corruption like that seen in the
> original post either just because one half of the HAST mirror has
> gone.

Disk errors are recorded to syslog. Error counters are also displayed
in the `hastctl list' output. There is also snmp_hast(3) in base, a
module for bsnmp that retrieves these statistics via SNMP (traps are
not supported, though).

For notifications, hastd can be configured to execute an arbitrary
command on various HAST events (see the description of `exec' in
hast.conf(5)). Unfortunately, it currently has no hooks for I/O error
events. It might be worth adding one, though. The problem is that such
a hook may generate too many events, so some throttling would be
needed.
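To illustrate the kind of throttling meant here, below is a minimal
sketch, assuming a hypothetical "ioerror" event (hastd emits no such
event today). It suppresses repeated notifications for the same
(event, resource) pair within a fixed interval and reports how many
were suppressed once the next notification is allowed through. The
class and method names are made up for illustration and are not part
of HAST.

```python
import time
from dataclasses import dataclass, field


@dataclass
class EventThrottle:
    """Rate-limit notifications per (event, resource) key.

    Hypothetical helper: hastd has no such component. This only
    illustrates the throttling an I/O-error hook would need.
    """

    interval: float  # minimum seconds between notifications for one key
    _last: dict = field(default_factory=dict, init=False, repr=False)
    _suppressed: dict = field(default_factory=dict, init=False, repr=False)

    def should_notify(self, event, resource, now=None):
        """Return (notify, suppressed_count).

        notify is True if a notification should be sent now;
        suppressed_count is how many notifications for this key were
        dropped since the last one that went through.
        """
        if now is None:
            now = time.monotonic()
        key = (event, resource)
        last = self._last.get(key)
        if last is not None and now - last < self.interval:
            # Too soon after the previous notification: count and drop it.
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False, 0
        # Allowed: remember the time and hand back the dropped count.
        self._last[key] = now
        return True, self._suppressed.pop(key, 0)


if __name__ == "__main__":
    t = EventThrottle(interval=60)
    for ts in (0, 10, 20, 70):
        print(ts, t.should_notify("ioerror", "shared", now=ts))
```

With a 60 second interval, events at t=0, 10, 20 and 70 yield one
notification at 0, two suppressed events, and a notification at 70
that reports those 2. Note that hastd starts the `exec' program as a
separate process for each event, so in-memory state like this would
only help inside hastd itself or a long-running helper; a standalone
exec script would have to keep its state on disk instead.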

-- 
Mikolaj Golub