ahci.ko / geom_mirror / zfs hangs up system when one of HDDs faults.

Alexander Motin mav at FreeBSD.org
Fri Jul 22 16:22:54 UTC 2011


Lev Serebryakov wrote:
>   I have had two identical livelocks when an HDD failed on an
> 8.2-STABLE system with two SATA HDDs with gmirror and ZFS on them.
> 
>   It is a Hetzner-based server, so the only access I have is the LARA
> console, but the symptoms are identical in both cases: the HDD goes bad,
> ahci.ko complains about timeouts, and after that the server stops
> responding to high-level access attempts (ssh/HTTP/SMTP), but can still
> be pinged at both its IPv4 and IPv6 addresses.
> 
>  The HDDs are identical, and they are split into several (BSD) partitions.
> Some partitions are mirrored with geom_mirror, and one pair of
> partitions is added to a (mirrored) ZFS pool like this (I provide output
> from the rebooted one-HDD-only system, but I think it is clear how it
> looks when both HDDs are OK):
> 
>  A screenshot of the LARA console in such a case is attached.

The kernel messages look as if the controller or the device is stuck,
unable to complete some command, and cannot recover from that condition
even after a device hard reset. I don't see what the driver can do about
it, except being more aggressive in dropping a faulty device after several
consecutive timeouts. If that is not the desired way out, start by updating
the card BIOS and the device firmware.
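
To make the "drop after several consecutive timeouts" idea concrete, here is
a minimal sketch of such a policy. It is only an illustration and is not
code from ahci.ko: the names DeviceState and record_completion and the
threshold MAX_CONSECUTIVE_TIMEOUTS are hypothetical, and a real driver
would also have to handle resets and in-flight commands.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration; not a real ahci(4) tunable.
MAX_CONSECUTIVE_TIMEOUTS = 3


@dataclass
class DeviceState:
    """Per-device bookkeeping for the drop policy (hypothetical)."""
    name: str
    consecutive_timeouts: int = 0
    dropped: bool = False


def record_completion(dev, timed_out, limit=MAX_CONSECUTIVE_TIMEOUTS):
    """Update dev after one command; return True if the device is dropped."""
    if dev.dropped:
        # Once dropped, the device stays dropped.
        return True
    if timed_out:
        dev.consecutive_timeouts += 1
        if dev.consecutive_timeouts >= limit:
            dev.dropped = True
    else:
        # Any successful command breaks the run of timeouts.
        dev.consecutive_timeouts = 0
    return dev.dropped


# Usage: one success in the middle resets the counter, so the device is
# dropped only after three timeouts in a row.
disk = DeviceState("ada1")
outcomes = [True, False, True, True, True]
results = [record_completion(disk, t) for t in outcomes]
```

A lower limit drops a dying disk sooner, so the mirror can carry on with the
remaining one, at the price of also dropping a healthy disk after a short
burst of transient timeouts.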

-- 
Alexander Motin

