HAST with broken HDD

George Kontostanos gkontos.mail at gmail.com
Wed Oct 1 13:49:06 UTC 2014


On Wed, Oct 1, 2014 at 4:29 PM, InterNetX - Juergen Gotteswinter <
jg at internetx.com> wrote:

> On 01.10.2014 at 15:06, George Kontostanos wrote:
> >
> >
> > On Wed, Oct 1, 2014 at 3:49 PM, InterNetX - Juergen Gotteswinter
> > <jg at internetx.com> wrote:
> >
> >     On 01.10.2014 at 14:28, George Kontostanos wrote:
> >     >
> >     > On Wed, Oct 1, 2014 at 1:55 PM, InterNetX - Juergen Gotteswinter
> >     > <jg at internetx.com> wrote:
> >     >
> >     >     On 01.10.2014 at 10:54, JF-Bogaerts wrote:
> >     >     >    Hello,
> >     >     >    I'm preparing an HA NAS solution using HAST.
> >     >     >    I'm wondering what will happen if one of the disks of the
> >     >     >    primary node fails or becomes erratic.
> >     >     >
> >     >     >    Thx,
> >     >     >    Jean-François Bogaerts
> >     >
> >     >     Nothing. If you are using ZFS on top of HAST, ZFS won't even
> >     >     take notice of the disk failure.
> >     >
> >     >     As long as the write operation was successful on one of the 2
> >     >     nodes, HAST doesn't notify the layers on top about I/O errors.
> >     >
> >     >     Interesting concept; it took me some time to deal with this.
> >     >
> >     >
> >     > Are you saying that the pool will appear to be optimal even with a
> >     > bad drive?
> >     >
> >     >
> >
> >     https://forums.freebsd.org/viewtopic.php?&t=24786
> >
> >
> >
> > It appears that this is actually the case. And it is very disturbing,
> > meaning that a drive failure goes unnoticed. In my case, I completely
> > removed the second disk on the primary node and zpool status showed
> > absolutely no problem. Scrubbing the pool began resilvering, which
> > indicates that there is actually something wrong!
>
>
> Right. Let's go further and think about how ZFS works regarding direct
> hardware / disk access. There's a layer in between which always says, hey,
> everything is fine. No more need for pool scrubbing, since hastd won't
> tell you if anything is wrong :D
>
>
Correct, ZFS needs direct access to the disks, and any layer in between can
end in disaster!

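To make the failure mode described above more concrete, here is a rough
sketch of the checks involved (pool and resource names are examples only,
and the exact output will of course vary):

  # On the primary node, after one of the underlying disks has failed:
  zpool status tank     # still shows the pool as ONLINE (as observed above)
  hastctl status        # HAST resource status (role, replication state)
  zpool scrub tank      # forces ZFS to read every block through hastd
  zpool status tank     # the resilver/repair seen here is the first hint
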
Which means that, practically, HAST should only be used in UFS environments
backed by a hardware controller. In that case, HAST will still not notice
anything (unless you lose the controller), but at least you will know that
you need to replace a disk by monitoring the controller status.
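
Since neither ZFS nor hastd raises a flag, the member disks (or the
controller's volumes) have to be watched out-of-band. A minimal sketch of
what I mean, e.g. run from cron; the tool and device names are only
examples and depend on the hardware (mfiutil for mfi(4) controllers,
smartctl from sysutils/smartmontools for plain disks):

  #!/bin/sh
  # Illustrative out-of-band disk check on the HAST primary.

  # Hardware RAID case: ask the controller whether any volume is degraded
  mfiutil show volumes | grep -q DEGRADED && echo "RAID volume degraded"

  # Plain HBA case: SMART health of each disk backing a HAST resource
  for disk in da0 da1; do
      smartctl -H /dev/${disk} > /dev/null || echo "${disk}: SMART problem"
  done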

