[Bug 204233] 'Livelocked' Geom mirror when one PATA provider experienced write timeouts
bugzilla-noreply at freebsd.org
Mon Nov 2 23:12:54 UTC 2015
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204233
Bug ID: 204233
Summary: 'Livelocked' Geom mirror when one PATA provider
experienced write timeouts
Product: Base System
Version: 10.2-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: antiduh at csh.rit.edu
I had a geom mirror made of two older 160 GB Western Digital PATA HDDs, ada0
and ada1, assembled as /dev/mirror/gm0.
One of the drives (ada0) failed in such a way that it responded to writes with
only a timeout:
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 0f a3 64 40 11 00 00 00 20 00
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): CAM status: Command timeout
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): Retrying command
> Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 e1 07 6d 40 12 00 00 00 20 00
> Oct 30 12:38:21 angst (ada0:ata0:0:0:0): CAM status: Command timeout
> Oct 30 12:38:21 angst (ada0:ata0:0:0:0): Retrying command
(The drive was old, so I suspect a hardware failure.)
Unfortunately, this slowed most of the OS to a crawl for many hours: gmirror
was blocking all I/O to the gm0 provider while it waited for each write to
time out, which took many minutes. However, the system wasn't completely dead;
queued block read requests seemed to go through whenever they could slip in
between the blocking writes, once a write timeout had elapsed.
Eventually, I was able to log into the system and manually remove the dead disk
from the mirror, at which point the system came back to life.
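As a rough check on how long each stalled write held things up, the gap
between consecutive "Command timeout" lines in the log excerpt above can be
measured. The following is only a minimal sketch in Python: the helper name
timeout_gaps, the regular expression, and the year (syslog lines carry none,
so 2015 is assumed from the report date) are illustrative assumptions, not
part of any FreeBSD tool.

```python
import re
from datetime import datetime

# Sample lines copied from the log excerpt quoted in this report.
LOG = """\
Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 0f a3 64 40 11 00 00 00 20 00
Oct 30 12:34:20 angst (ada0:ata0:0:0:0): CAM status: Command timeout
Oct 30 12:34:20 angst (ada0:ata0:0:0:0): Retrying command
Oct 30 12:34:20 angst (ada0:ata0:0:0:0): WRITE_DMA48. ACB: 35 00 e1 07 6d 40 12 00 00 00 20 00
Oct 30 12:38:21 angst (ada0:ata0:0:0:0): CAM status: Command timeout
Oct 30 12:38:21 angst (ada0:ata0:0:0:0): Retrying command
"""

# Hypothetical pattern for the syslog-style lines above: timestamp, host,
# "(device:bus:...)" and the CAM timeout message.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ "
    r"\((?P<dev>\w+):[^)]*\): CAM status: Command timeout$"
)


def timeout_gaps(log_text, year=2015):
    """Return {device: [seconds between consecutive timeout lines]}."""
    last_seen = {}
    gaps = {}
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if not match:
            continue
        # Syslog timestamps omit the year, so the caller supplies one.
        ts = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
        dev = match["dev"]
        if dev in last_seen:
            delta = (ts - last_seen[dev]).total_seconds()
            gaps.setdefault(dev, []).append(delta)
        last_seen[dev] = ts
    return gaps


if __name__ == "__main__":
    for dev, seconds in timeout_gaps(LOG).items():
        print(dev, seconds)
```

For the two timeouts quoted above this gives a single gap of 241 seconds for
ada0, i.e. roughly four minutes per retried write.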
Why didn't gmirror drop the disk automatically?
--
You are receiving this mail because:
You are the assignee for the bug.