HAST with broken HDD

Sun Oct 5 15:50:55 UTC 2014

On 05.10.2014 at 16:50, Dmitry Morozovsky wrote:
> On Fri, 3 Oct 2014, Mikolaj Golub wrote:
> 
>> Disk errors are recorded to syslog. Also error counters are displayed
>> in `hastctl list' output. There is snmp_hast(3) in base -- a module
>> for bsnmp to retrieve these statistics via SNMP (traps are not
>> supported, though).
>>
>> For notifications, hastd can be configured to execute an arbitrary
>> command on various HAST events (see the description of `exec' in
>> hast.conf(5)). Unfortunately, it does not currently have hooks for
>> I/O error events. It might be worth adding them, though. The problem
>> with this is that it may generate too many events, so some throttling
>> is needed.
> 
> And, I think it should be noted, this needs some kind of error coalescing
> or similar before going from the "warning" state (there are some read
> errors, but otherwise the disk is usable, and it would be too much hassle
> to switch to the remote component completely) to the "error" state (the
> component is unusable and needs to be replaced ASAP; drop it from the HAST
> pair, and switch over if needed).
> 
> An error such as "device lost" is, of course, fatal from the very beginning;
> but -- how should we interpret, well, sporadic controller resets with the
> disk coming back and catching up on syncing again?
> 
> 

Hi Dmitry,

since HAST is not so different from DRBD, why not take their approach to
error handling as a template? DRBD has worked well and been rock solid
for years; it is a well-established solution. HAST has the potential to
become that too, with some improvements.
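
To make the "warning" vs. "error" idea from the quoted mail a bit more
concrete, here is a rough Python sketch of error coalescing with a sliding
time window. This is only an illustration, not how hastd or DRBD actually
does it: the class name, the window length and the error threshold are all
made up, and real code would need to pick sensible values.

```python
from collections import deque

OK, WARNING, ERROR = "ok", "warning", "error"


class ErrorCoalescer:
    """Hypothetical helper: coalesce I/O errors into state changes."""

    def __init__(self, window=60.0, max_errors=5):
        self.window = window          # seconds of history to consider (assumed)
        self.max_errors = max_errors  # errors within window that mean "error"
        self.events = deque()         # timestamps of recent non-fatal errors
        self.state = OK

    def record(self, now, fatal=False):
        """Record one I/O error seen at time `now`.

        Returns the new state if it changed, otherwise None, so the
        caller only sends a notification on a transition (throttling).
        """
        if self.state == ERROR:
            return None  # already failed; nothing more to report
        if fatal:
            # e.g. "device lost": fatal from the very beginning
            new_state = ERROR
        else:
            self.events.append(now)
            # forget errors that fell out of the sliding window
            while self.events and now - self.events[0] > self.window:
                self.events.popleft()
            if len(self.events) >= self.max_errors:
                new_state = ERROR
            else:
                new_state = WARNING
        if new_state != self.state:
            self.state = new_state
            return new_state
        return None
```

With something like this, a sporadic controller reset would only produce a
single "warning" notification, while a burst of errors inside the window
(or a lost device) would escalate to "error" once.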

Just my 2 cents :)