zpool raidz2 stopped working after failure of one drive

Gary Palmer gpalmer at freebsd.org
Sun Nov 20 15:02:37 UTC 2016


On Sat, Nov 19, 2016 at 09:15:54PM +0100, Marek Salwerowicz wrote:
> Hi all,
> 
> I run the following server:
> 
> - Supermicro 6047R-E1R36L
> - 96 GB RAM
> - 1x INTEL CPU E5-2640 v2 @ 2.00GHz
> - FreeBSD 10.3-RELEASE-p11
> 
> Drive for OS:
> - HW RAID1: 2x KINGSTON SV300S37A120G
> 
> zpool:
> - 18x WD RED 4TB @ raidz2
> - log: mirrored Intel 730 SSD
> - cache: single Intel 730 SSD
> 
> 
> Today, after one drive's failure, the whole vdev was removed from the 
> zpool (basically the zpool was down; zpool / zfs commands were not 
> responding):
> 

[snip]

> There was no option other than hard-rebooting the server.
> The SMART value "Raw_Read_Error_Rate" for the failed drive has increased 
> from 0 to 1. I am about to replace it - it is still under warranty.
> 
> I have now disabled the failing drive in the zpool and it works fine (of 
> course, in a DEGRADED state until I replace the drive).
> 
> However, I am concerned by the fact that one drive's failure has 
> completely blocked the zpool.
> Is this normal behaviour for zpools?


What is the output of

zpool get failmode <poolname>

By default the failmode property is "wait", which makes the pool block
all I/O until the failed device recovers or is administratively handled;
I suspect that is what caused your issues.  See the zpool(8) man page
for more.
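As a rough illustration (substitute your own pool name for "tank", which
is just a placeholder here):

    # check the current setting
    zpool get failmode tank

    # "wait" (the default) blocks all I/O until the device condition is
    # resolved; "continue" returns EIO to new write requests instead of
    # blocking; "panic" crashes the host.
    zpool set failmode=continue tank

Whether "continue" is appropriate depends on whether your applications
handle EIO more gracefully than a hung pool.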

> Also, is there auto hot-sparing in ZFS already? If I had a hot spare 
> drive in my zpool, would the failed drive be replaced automatically?

zfsd in 11.0 and later is the current path to hot spare management
in FreeBSD.  FreeBSD 10.x does not have the ability to automatically use
hot spares to replace failing drives.
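As a rough sketch on 11.0, assuming a pool named "tank" and a spare disk
at /dev/da20 (both placeholders):

    # add a hot spare to the pool
    zpool add tank spare /dev/da20

    # enable and start the ZFS fault management daemon, which will
    # activate the spare automatically when a drive faults
    sysrc zfsd_enable="YES"
    service zfsd start

On 10.x you would still have to run "zpool replace" by hand after a
failure.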

Regards

Gary
