HAST with broken HDD

InterNetX - Juergen Gotteswinter jg at internetx.com
Wed Oct 1 13:29:05 UTC 2014


On 01.10.2014 at 15:06, George Kontostanos wrote:
> 
> 
> On Wed, Oct 1, 2014 at 3:49 PM, InterNetX - Juergen Gotteswinter
> <jg at internetx.com <mailto:jg at internetx.com>> wrote:
> 
>     On 01.10.2014 at 14:28, George Kontostanos wrote:
>     >
>     > On Wed, Oct 1, 2014 at 1:55 PM, InterNetX - Juergen Gotteswinter
>     > <jg at internetx.com <mailto:jg at internetx.com>
>     <mailto:jg at internetx.com <mailto:jg at internetx.com>>> wrote:
>     >
>     >     On 01.10.2014 at 10:54, JF-Bogaerts wrote:
>     >     >    Hello,
>     >     >    I'm preparing a HA NAS solution using HAST.
>     >     >    I'm wondering what will happen if one of the disks of the
>     >     >    primary node fails or becomes erratic.
>     >     >
>     >     >    Thx,
>     >     >    Jean-François Bogaerts
>     >
>     >     Nothing. If you are using ZFS on top of HAST, ZFS won't even
>     >     take notice of the disk failure.
>     >
>     >     As long as the write operation was successful on one of the two
>     >     nodes, HAST doesn't notify the layers on top about I/O errors.
>     >
>     >     Interesting concept; it took me some time to come to terms with this.
>     >
>     >
>     > Are you saying that the pool will appear to be optimal even with a bad
>     > drive?
>     >
>     >
> 
>     https://forums.freebsd.org/viewtopic.php?&t=24786
> 
> 
> 
> It appears that this is actually the case. And it is very disturbing,
> because it means a drive failure can go unnoticed. In my case I completely
> removed the second disk on the primary node and zpool status showed
> absolutely no problem. Scrubbing the pool began resilvering, which
> indicates that something is actually wrong!


Right. Let's go further and think about how ZFS works regarding direct
hardware / disk access. There is a layer in between which always says "hey,
everything is fine". No more need for pool scrubbing, since hastd won't tell
you if anything is wrong :D
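
For anyone following along, this is roughly the layering we are talking
about: the pool only ever sees the hast devices, never the raw disks. A
minimal sketch of /etc/hast.conf, with node names and addresses made up
for illustration (resource names and providers are taken from George's
output below):

resource disk1 {
        on nodeA {
                local /dev/ada1
                remote 10.0.0.2
        }
        on nodeB {
                local /dev/ada1
                remote 10.0.0.1
        }
}
resource disk2 {
        on nodeA {
                local /dev/ada2
                remote 10.0.0.2
        }
        on nodeB {
                local /dev/ada2
                remote 10.0.0.1
        }
}

# on whichever node is primary, once hastd is running and the roles are set:
#   zpool create tank mirror hast/disk1 hast/disk2

So ZFS only ever talks to /dev/hast/disk1 and /dev/hast/disk2. Whether ada1
or ada2 underneath is healthy is hastd's business, and hastd stays quiet as
long as the write made it to at least one of the two nodes.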

> 
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 16K in 0h2m with 7 errors on Wed Oct  1 16:00:47 2014
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         tank            ONLINE       0     0     7
>           mirror-0      ONLINE       0     0    40
>             hast/disk1  ONLINE       0     0    40
>             hast/disk2  ONLINE       0     0    40
> 
> 
> Unfortunately, in this case there was data loss and hastctl status does
> not report the missing disk!
> 
> Name   Status    Role     Components
> disk1  complete  primary  /dev/ada1  hast2
> disk2  complete  primary  /dev/ada2  hast2
> 
> 
> -- 
> George Kontostanos
> ---
> http://www.aisecure.net
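
Since neither zpool status nor hastctl status flagged the pulled disk, the
only practical workaround I can see is to watch the raw providers yourself
and keep scrubbing on a schedule. A rough, untested sketch of a cron job for
the primary node (device names ada1/ada2 and the mail recipient are
assumptions for this particular setup; smartctl comes from the smartmontools
port):

#!/bin/sh
# check_hast_providers.sh -- rough sketch, run periodically on the primary.
# Checks SMART health of the raw disks underneath hastd, because neither
# ZFS nor hastctl complained here when one of them was pulled.

DISKS="ada1 ada2"          # the local HAST providers (assumption)
MAILTO="root"              # where to send the alert (assumption)

for d in ${DISKS}; do
        if ! smartctl -H /dev/${d} | grep -q "PASSED"; then
                echo "SMART health check failed for /dev/${d}" \
                    | mail -s "HAST provider ${d} unhealthy on $(hostname)" "${MAILTO}"
        fi
done

On top of that, regular scrubs (for example daily_scrub_zfs_enable="YES" in
periodic.conf) are still worth it -- in this thread it was the scrub, not
zpool status, that finally surfaced the checksum errors.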


