ata timeouts under load
Sean C. Farley
scf at FreeBSD.org
Mon Sep 14 16:51:42 UTC 2009
On Mon, 14 Sep 2009, Mike Tancsa wrote:
> At 11:21 AM 9/14/2009, Miroslav Lachman wrote:
>
>> I have very similar problem with one disk in gmirror, but it is on 7.2
>> not current.
>
>> Sep 14 04:48:29 jimi kernel: ad6: timeout waiting to issue command
>> Sep 14 04:48:29 jimi kernel: ad6: error issuing FLUSHCACHE command
>> Sep 14 04:48:29 jimi kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=447001516
>> Sep 14 04:48:29 jimi kernel: ad6: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=447001516
>
> Are you sure this is not just a bad cable ? I have had similar symptoms
> which was a result of a bad cable. If possible, swap the cable between
> the 2 disks and see if it follows the cable.
I also have the same (or a similar) problem with 7.2 and earlier. I have
replaced both the cable and the drive. Replacing the drive changed the
reported LBA, but otherwise the LBA is always the same. Extended offline
SMART self-tests complete without errors.
Timeout message:
kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=43471743
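To check that the same LBA keeps timing out, the timeout messages could be tallied per device and LBA. This is a minimal sketch that assumes the message format shown above (and in the quoted ad6 log). `timeout_lbas` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Matches ata(4) timeout lines in the format quoted above, e.g.
# "ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=43471743".
# FAILURE lines and other messages are ignored.
TIMEOUT_RE = re.compile(r"(ad\d+): TIMEOUT - (\S+) retrying .*\bLBA=(\d+)")


def timeout_lbas(lines):
    """Count how often each (device, LBA) pair appears in timeout messages."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(3)))] += 1
    return counts
```

For example, `timeout_lbas(open("/var/log/messages"))` (assuming kernel messages are logged there) returns a `Counter` keyed by `(device, LBA)`. A single entry per device means the same block keeps timing out.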
I do use this in /boot/loader.conf to help (I hope) prevent the timeout
from breaking the mirror:
kern.geom.mirror.timeout=45
Reading that region with dd does not reproduce the timeout, but that may be
because of this error, which I only just noticed in the SMART error log:
Error 9 occurred at disk power-on lifetime: 13578 hours (565 days + 18 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 59 11 8e 53 97 e2 Error: UNC 17 sectors at LBA = 0x0297538e = 43471758
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 02 20 7f 53 97 e2 97 00:04:48.074 READ DMA
c8 02 20 5f 53 97 e2 97 00:04:48.062 READ DMA
c8 02 20 3f 53 97 e2 97 00:04:48.050 READ DMA
c8 02 04 43 6e c5 e2 c5 00:04:48.029 READ DMA
c8 02 20 ff d6 8b e2 8b 00:04:48.016 READ DMA
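Decoding the register columns shows how the uncorrectable sector relates to the timeout above. This is a sketch that assumes the usual 28-bit LBA layout of the ATA task-file registers: SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27. The helper name `lba28` is only for illustration.

```python
def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from ATA task-file registers.

    The high nibble of DH carries mode flags (0xe2 has the LBA bit set),
    so only its low nibble contributes address bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Error registers from the log: SN=8e CL=53 CH=97 DH=e2
err_lba = lba28(0x8E, 0x53, 0x97, 0xE2)

# Most recent READ DMA before the error: SC=20 SN=7f CL=53 CH=97 DH=e2
cmd_lba = lba28(0x7F, 0x53, 0x97, 0xE2)
cmd_count = 0x20  # SC column: 32 sectors requested

offset = err_lba - cmd_lba      # sectors into the read where the error hit
remaining = cmd_count - offset  # sectors from the bad one to the end of the read

print(err_lba, cmd_lba, offset, remaining)  # 43471758 43471743 15 17
```

The last READ DMA starts at LBA 43471743, the same LBA as in the ad0 timeout message. The uncorrectable sector 43471758 lies 15 sectors into that 32-sector read, which leaves 17 sectors. That matches the "17 sectors" in the error line, assuming that count is the untransferred remainder.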
Does this error mean that the drive has remapped the block? I thought
remapping only occurs when the block is written to, though; is that right?
Is there a safe way to write to a specific block? Would it be safe to read
a block with dd and write it back? Of course, the drive would not be in the
mirror at the time.
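Here is a minimal sketch of the read-then-write-back idea. It uses Python's `os.pread`/`os.pwrite` in place of two dd invocations and assumes 512-byte sectors and a device or image path you supply. `rewrite_sector` is a hypothetical helper. It only shows the mechanics and does not settle whether doing this is safe on a given drive.

```python
import os

SECTOR = 512  # assumed logical sector size


def rewrite_sector(path: str, lba: int, sector_size: int = SECTOR) -> bytes:
    """Read one sector at `lba` and write the identical bytes back.

    If the sector cannot be read, os.pread raises OSError, or a short
    read is detected below, and nothing is written.
    """
    offset = lba * sector_size
    fd = os.open(path, os.O_RDWR)
    try:
        data = os.pread(fd, sector_size, offset)
        if len(data) != sector_size:
            raise OSError(f"short read at LBA {lba}: {len(data)} bytes")
        os.pwrite(fd, data, offset)
        os.fsync(fd)  # flush the write out of the OS buffer cache
        return data
    finally:
        os.close(fd)
```

Because the write only happens after a successful read, this approach cannot rewrite a sector that the drive still refuses to read.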
Sean
--
scf at FreeBSD.org
More information about the freebsd-current mailing list