HAST + ZFS + NFS + CARP

Julien Cigar julien at perdition.city
Thu Jun 30 15:04:01 UTC 2016


Hello,

I'm still in the process of setting up a redundant low-cost storage for
our (small, ~30 people) team here.

I have read quite a lot of articles, documentation, etc., and I plan to
use HAST with ZFS for the storage, CARP for the failover and the "good
old NFS" to mount the shares on the clients.
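
The clients would mount the share through the CARP address, so that a
failover stays transparent to them. Something like this in /etc/fstab
on each client (192.168.1.10 being a placeholder for the CARP VIP and
/zhast the exported path):

    # mount through the CARP VIP, never the physical address of a node
    192.168.1.10:/zhast  /mnt/shared  nfs  rw,bg,hard  0  0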

The hardware is two HP ProLiant DL20 boxes, each with 2 dedicated disks
for the shared storage.

Assuming the following configuration (a hast.conf sketch follows the list):
- MASTER is the active node and BACKUP is the standby node
- two disks in each machine: ada0 and ada1
- two interfaces in each machine: em0 and em1
- em0 is the primary interface (with the CARP setup)
- em1 is dedicated to the HAST traffic (crossover cable)
- FreeBSD is properly installed on each machine
- a HAST resource "disk0" for ada0p2
- a HAST resource "disk1" for ada1p2
- the pool is created on MASTER with
  "zpool create zhast mirror /dev/hast/disk0 /dev/hast/disk1"
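
Roughly, I have the following /etc/hast.conf in mind, identical on both
nodes (assuming the hostnames are literally "master" and "backup", and
that 172.16.0.1/172.16.0.2 are the em1 crossover addresses):

    resource disk0 {
            on master {
                    local /dev/ada0p2
                    remote 172.16.0.2
            }
            on backup {
                    local /dev/ada0p2
                    remote 172.16.0.1
            }
    }

    resource disk1 {
            on master {
                    local /dev/ada1p2
                    remote 172.16.0.2
            }
            on backup {
                    local /dev/ada1p2
                    remote 172.16.0.1
            }
    }

and then, on the active node only:

    hastctl role primary disk0
    hastctl role primary disk1
    zpool create zhast mirror /dev/hast/disk0 /dev/hast/disk1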

A couple of questions I'm still wondering about:
- If a disk dies on the MASTER, I guess that zpool will not see it and
  will transparently use the one on BACKUP through the HAST resource.
  Is that a problem? Could it lead to some corruption? At this stage
  common sense says to replace the disk quickly, but imagine the worst
  case scenario: ada1 on MASTER dies, zpool does not see it and
  transparently uses the one from the BACKUP node (through the "disk1"
  HAST resource); later ada0 on MASTER dies too, zpool again does not
  see it and transparently uses the one from the BACKUP node (through
  the "disk0" HAST resource). At this point the two disks on MASTER are
  broken but the pool is still considered healthy... What if we then
  unplug the em1 crossover cable between the nodes? Storage is down.
  (A monitoring sketch for this case is the first one after the list.)
- Under heavy I/O the MASTER box suddenly dies (for whatever reason);
  thanks to CARP the BACKUP node will switch from standby to active and
  execute the failover script, which does a "hastctl role primary" for
  the resources and a zpool import. I wonder if there are situations
  where the pool could not be imported (= data corruption)? For
  example, what if the pool has not been exported on the MASTER before
  it dies? (My draft failover script is the second sketch below.)
- Is it a problem if the NFS daemons are started at boot on the standby
  node, or should they only be started by the failover script (third
  sketch below)? What about stale file handles and active connections
  on the clients?
- A catastrophic power failure occurs and MASTER and BACKUP are
  suddenly powered down. When the power returns, could some problem
  (a split-brain scenario?) arise from the order in which the two
  machines boot up? (See the fourth sketch below.)
- Anything else I have not thought of?
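
Here are the sketches mentioned above (hostnames, addresses and paths
are all placeholders).

First, since zpool will not notice a dead local disk, I was thinking of
polling hastctl from cron instead; the exact strings in the hastctl
status output should be double-checked on a running system:

    #!/bin/sh
    # hast_check.sh -- warn when a HAST resource is no longer
    # "complete" (verify the actual hastctl output format first)
    for res in disk0 disk1; do
            if ! hastctl status "$res" | grep -q complete; then
                    echo "HAST resource $res degraded on $(hostname)" |
                        mail -s "HAST warning" root
            fi
    done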
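
Second, the failover itself would be triggered by devd on the CARP
state change, along the lines of the Handbook examples:

    # /etc/devd.conf addition on both nodes
    notify 30 {
            match "system" "CARP";
            match "subsystem" "[0-9]+@em0";
            match "type" "MASTER";
            action "/usr/local/sbin/failover.sh";
    };

    #!/bin/sh
    # /usr/local/sbin/failover.sh -- promote this node
    hastctl role primary disk0
    hastctl role primary disk1
    # a dead MASTER will never have exported the pool, hence -f:
    zpool import -f zhast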
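
Third, my current feeling for the NFS daemons is to keep them disabled
at boot on both nodes and only start them from the failover script,
once the pool is imported:

    # /etc/rc.conf on both nodes: do not serve NFS at boot
    nfs_server_enable="NO"
    rpcbind_enable="YES"

    # appended to failover.sh, after the zpool import
    # (/etc/exports must list the pool on both nodes):
    service mountd onestart
    service nfsd onestart

With hard NFS mounts the clients should, in theory, simply block and
retry until the CARP address answers again.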
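
Fourth, regarding split-brain: as far as I understand, if both nodes
come up believing they are primary, HAST refuses to synchronize, and
the recovery documented in the Handbook is to pick a loser and discard
its local changes, i.e. on the node whose data we sacrifice:

    hastctl role init disk0
    hastctl create disk0
    hastctl role secondary disk0
    # and the same for disk1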

Thanks!
Julien

-- 
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11  6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.