HAST + ZFS + NFS + CARP

Thu Jun 30 15:35:51 UTC 2016

-----Original Message-----
From: owner-freebsd-fs at freebsd.org [mailto:owner-freebsd-fs at freebsd.org] On Behalf Of InterNetX - Juergen Gotteswinter
Sent: 30 June 2016 16:14
To: Julien Cigar; freebsd-fs at freebsd.org
Subject: Re: HAST + ZFS + NFS + CARP

Am 30.06.2016 um 16:45 schrieb Julien Cigar:
> Hello,
> 
> I'm still in the process of setting up redundant low-cost storage for
> our (small, ~30 people) team here.
> 
> I read quite a lot of articles/documentation/etc and I plan to use
> HAST with ZFS for the storage, CARP for the failover and the "good old NFS"
> to mount the shares on the clients.
> 
> The hardware is 2xHP Proliant DL20 boxes with 2 dedicated disks for 
> the shared storage.
> 
> Assuming the following configuration:
> - MASTER is the active node and BACKUP is the standby node.
> - two disks in each machine: ada0 and ada1.
> - two interfaces in each machine: em0 and em1
> - em0 is the primary interface (with CARP setup)
> - em1 is dedicated to the HAST traffic (crossover cable)
> - FreeBSD is properly installed in each machine.
> - a HAST resource "disk0" for ada0p2.
> - a HAST resource "disk1" for ada1p2.
> - a pool "zhast" is created on MASTER with
>   zpool create zhast mirror /dev/hast/disk0 /dev/hast/disk1
> 
> A couple of questions I am still wondering about:
> - If a disk dies on the MASTER I guess that zpool will not see it and
>   will transparently use the one on BACKUP through the HAST resource.

that's right, as long as writes on $anything have been successful hast is happy and won't start whining

>   is it a problem? 

imho yes, at least from management view

> could this lead to some corruption?

probably, i never heard about anyone who has used that for a long time in production

>   At this stage the
>   common sense would be to replace the disk quickly, but imagine the
>   worst case scenario where ada1 on MASTER dies, zpool will not see it
>   and will transparently use the one from the BACKUP node (through the
>   "disk1" HAST resource), later ada0 on MASTER dies, zpool will not
>   see it and will transparently use the one from the BACKUP node
>   (through the "disk0" HAST resource). At this point on MASTER the two
>   disks are broken but the pool is still considered healthy ... What if
>   after that we unplug the em0 network cable on BACKUP? Storage is
>   down..
> - Under heavy I/O the MASTER box suddenly dies (for some reason),
>   thanks to CARP the BACKUP node will switch from standby -> active and
>   execute the failover script which does some "hastctl role primary" for
>   the resources and a zpool import. I wondered if there are any
>   situations where the pool couldn't be imported (= data corruption)?
>   For example what if the pool hasn't been exported on the MASTER before
>   it dies?
> - Is it a problem if the NFS daemons are started at boot on the standby
>   node, or should they only be started in the failover script? What
>   about stale files and active connections on the clients?

>sometimes stale mounts recover, sometimes not, sometimes clients even need reboots
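
The failover sequence described in the question above (promote the HAST resources, import the pool, and only then start NFS) could be sketched roughly as follows. This is a minimal sketch, not a tested production script: the function names, the DRY_RUN switch and the 30 second timeout are assumptions made for illustration, while the resource and pool names (disk0, disk1, zhast) are taken from the thread. With DRY_RUN left at its default of 1 the commands are only printed.

```shell
#!/bin/sh
# Hypothetical failover sketch; names disk0, disk1 and zhast follow the thread.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
RESOURCES="disk0 disk1"
POOL="zhast"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Wait up to 30 seconds (assumed value) for a provider under /dev/hast.
wait_for_provider() {
    if [ "$DRY_RUN" = "1" ]; then
        return 0
    fi
    i=0
    while [ "$i" -lt 30 ]; do
        [ -c "/dev/hast/$1" ] && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

become_primary() {
    for res in $RESOURCES; do
        run hastctl role primary "$res" || return 1
        wait_for_provider "$res" || return 1
    done
    # -f is needed because a crashed MASTER never exported the pool.
    run zpool import -f "$POOL" || return 1
    # NFS is started only here, after the pool has been imported.
    run service mountd onestart || return 1
    run service nfsd onestart
}
```

Calling become_primary from whatever hook reacts to the CARP state change would run the steps in order and stop at the first failure. It does nothing about fencing the old MASTER, so it does not address the split-brain question raised further down.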

> - A catastrophic power failure occurs and MASTER and BACKUP are suddenly
>   powered down. Later the power returns, is it possible that some
>   problem occurs (split-brain scenario?) regarding the order in which
>   the

>sure, you need an exact procedure to recover

Happy to be corrected, but last time I looked at this, the NFS filesystem ID was likely to be different on both systems (and cannot be set like on Linux), and so the mounts would be useless on the clients after failover. You'd need to remount the NFS filesystem on the clients.

>   two machines boot up?

>best practice should be to keep everything down after boot

> - Other things I have not thought of?
> 

> Thanks!
> Julien
> 

>imho:

>leave hast where it is, go for zfs replication. will save your butt, sooner or later if you avoid this fragile combination 

Personally I agree. This sort of functionality is incredibly difficult to get right and I wouldn't want to run anything critical relying on a few HAST scripts I'd put together manually.
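
For comparison, the ZFS replication approach suggested above comes down to periodic snapshots shipped to the standby box with zfs send and zfs receive. The following is only a sketch under stated assumptions: the dataset name tank/data, the host name backup, the function names and the DRY_RUN switch are all hypothetical, and the receiving side is assumed to hold a dataset of the same name. With DRY_RUN left at its default of 1 the pipeline is only printed.

```shell
#!/bin/sh
# Hypothetical names: dataset "tank/data", standby host "backup".
DATASET="${DATASET:-tank/data}"
REMOTE="${REMOTE:-backup}"
DRY_RUN="${DRY_RUN:-1}"

# Build the snapshot + send/receive pipeline for one replication step.
# $1 = new snapshot name, $2 = previous snapshot name (optional).
replication_cmd() {
    new="$DATASET@$1"
    if [ -n "${2:-}" ]; then
        # Incremental stream: only blocks changed since the previous snapshot.
        send="zfs send -i $DATASET@$2 $new"
    else
        # First run: full stream.
        send="zfs send $new"
    fi
    echo "zfs snapshot $new && $send | ssh $REMOTE zfs receive -F $DATASET"
}

# Print the pipeline in dry-run mode, otherwise execute it.
replicate() {
    cmd=$(replication_cmd "$1" "${2:-}")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        sh -c "$cmd"
    fi
}
```

Run from cron, for example as "replicate s2 s1", the standby lags behind by at most one interval. Unlike HAST this is asynchronous, so whatever was written since the last snapshot is lost on failover, but each machine keeps a self-contained pool that can be imported on its own.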

Matt