[Bug 229745] ahcich: CAM status: Command timeout

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 31 Jan 2024 22:39:40 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745

Kevin Zheng <kevinz5000@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kevinz5000@gmail.com

--- Comment #73 from Kevin Zheng <kevinz5000@gmail.com> ---
Hi there, sorry to resurrect a long-closed thread. I ran into a similar problem:

ahcich3: Timeout on slot 7 port 0
ahcich3: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd c0 serr 00000000 cmd 0000c717
(ada1:ahcich3:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich3:0:0:0): CAM status: Command timeout
(ada1:ahcich3:0:0:0): Retrying command, 0 more tries remain
ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich3: Timeout on slot 8 port 0
ahcich3: is 00000000 cs 00000000 ss 00000000 rs 00000100 tfd 150 serr 00000000 cmd 0000c817
(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Retrying command, 0 more tries remain
ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
...
GEOM_ELI: Device ada1p4.eli destroyed.
GEOM_ELI: Detached ada1p4.eli on last close.
(ada1:ahcich3:0:0:0): Periph destroyed
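The sketch below is a rough aid for reading the register dump above. It is not FreeBSD code. It assumes the AHCI PxTFD layout, with the ATA status byte in bits 7:0 and the ATA error byte in bits 15:8, and the standard ATA status bit names. It also assumes that `cs` (the port's Command Issue register) and `rs` (the driver's running-slot mask) are bitmasks with one bit per command slot. The helper names `decode_tfd` and `pending_slots` are hypothetical.

```python
# Minimal sketch: decode the "tfd" value and slot bitmasks printed in ahcich
# timeout messages. Assumes AHCI PxTFD layout: bits 7:0 = ATA status,
# bits 15:8 = ATA error. Helper names are hypothetical.

ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # bit 4: obsolete "seek complete" (device-specific in newer specs)
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error; details are in the error byte
}


def decode_tfd(tfd: int) -> dict:
    """Split a PxTFD value into status byte, error byte and named status flags."""
    status = tfd & 0xFF
    error = (tfd >> 8) & 0xFF
    flags = [name for bit, name in ATA_STATUS_BITS.items() if status & bit]
    return {"status": status, "error": error, "flags": flags}


def pending_slots(mask: int) -> list:
    """Return the command slot numbers whose bit is set in a 32-bit slot mask."""
    return [slot for slot in range(32) if mask & (1 << slot)]


if __name__ == "__main__":
    for raw in ("c0", "150", "00000080"):
        print(raw, decode_tfd(int(raw, 16)))
    print(pending_slots(0x80), pending_slots(0x100))  # [7] [8]
```

Read this way, `tfd c0` shows BSY and DRDY set while the FLUSHCACHE48 on slot 7 is outstanding (`cs 00000080`). The repeated `tfd = 00000080` after each reset shows BSY still set, which matches the "device not ready" messages.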

I'm running 14.0-RELEASE-p3 GENERIC amd64 on a Dell consumer-grade motherboard.
The HDD that is timing out is a Seagate Barracuda 7200.12.

I also suspect a hardware problem. I'm reporting this anyway because the disk is
part of a ZFS mirror on top of GELI providers, and I expect FreeBSD not to block
all disk I/O while it waits on the one disk that is timing out. Instead, the
whole system appears to hang (probably on I/O) until the failing device is
eventually detached.

Perhaps this timeout should be made much shorter? Or should a timeout on a single
SATA device be kept from stalling all I/O to the zpool? (I'm not even sure which
subsystem is stalling here: ZFS, GELI, or CAM?)
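To get a feel for how much a shorter timeout could help, here is a back-of-envelope model. It is a hypothetical simplification and not FreeBSD's actual error-recovery path. The model assumes that each attempt waits the full command timeout and is then followed by a channel reset that waits up to the 31000 ms shown in the log. It also assumes retry_count + 1 attempts in total. The inputs correspond to the ada(4) tunables `kern.cam.ada.default_timeout` and `kern.cam.ada.retry_count`. The values 30 s and 4 are assumed defaults. The function name is hypothetical, and the ATA_IDENTIFY probe retries seen in the log are ignored.

```python
# Back-of-envelope sketch (hypothetical model, not FreeBSD's code path):
# estimate how long I/O to one unresponsive disk may stall before CAM gives up.


def worst_case_stall_s(timeout_s: float, retry_count: int, reset_wait_s: float) -> float:
    """Each of the retry_count + 1 attempts waits the full command timeout,
    then the channel reset waits up to reset_wait_s for the device."""
    attempts = retry_count + 1
    return attempts * (timeout_s + reset_wait_s)


if __name__ == "__main__":
    # Assumed: 30 s timeout, 4 retries; 31 s reset wait taken from the log.
    print(worst_case_stall_s(30, 4, 31))  # 305
    # A hypothetical shorter timeout with a single retry.
    print(worst_case_stall_s(5, 1, 31))   # 72
```

Under this model the reset wait is a fixed term that does not shrink with the command timeout. Lowering the timeout and retry count shortens the stall considerably but does not eliminate it.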

Thanks for your consideration.
