RELENG_7: zfs mirror causes ata timeout

Stephen M. Rumble stephen.rumble at utoronto.ca
Tue Jan 8 14:42:03 PST 2008


Hi all,

I'm having a bit of trouble with a new machine running the latest
RELENG_7 code. I have two 500GB WD Caviar GP disks on a mini-ITX
GM965-based board (MSI "fuzzy") running amd64 with 4GB of RAM. The
disks are:

ad4: 476940MB <WDC WD5000AACS-00ZUB0 01.01B01> at ata2-master SATA150
ad6: 476940MB <WDC WD5000AACS-00ZUB0 01.01B01> at ata3-master SATA150

Each disk appears to work great on its own, with either UFS or ZFS, on
separate filesystems/pools. However, soon after I create a ZFS mirror
across the two, I run into the following sort of trouble:

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: FAILURE - READ_DMA timed out LBA=xxxxxxxx

Usually these continue ad infinitum. Sometimes the machine recovers,
only to fail again soon after. The errors aren't trivial to reproduce:
they seem to happen at random, especially when the system is under low
utilisation. Sometimes, however, they occur immediately upon boot.
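
Because the failures are intermittent, it may help to tally them per
device and per command over time. The following is a minimal sketch,
assuming the kernel messages have been saved as text with one message
per line (for example from dmesg). The function name tally_ata_errors
and the regular expression are illustrative, based only on the message
shapes quoted above, and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Pattern derived from the messages quoted in this post, e.g.
#   "ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly"
#   "ad6: FAILURE - READ_DMA timed out LBA=xxxxxxxx"
# Other ata message shapes are not handled (assumption).
ATA_MSG = re.compile(
    r"^(?P<dev>ad\d+): (?P<level>WARNING|FAILURE) - (?P<cmd>.+?) "
    r"(?:taskqueue timeout|timed out)"
)


def tally_ata_errors(log_text):
    """Count (device, level, command) triples found in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_MSG.match(line.strip())
        if match:
            key = (match.group("dev"), match.group("level"), match.group("cmd"))
            counts[key] += 1
    return counts
```

Calling tally_ata_errors() on saved log text returns a Counter whose
most_common() method lists the most frequent device/command pairs first,
which makes it easier to see whether the errors really stay on one disk.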

I've tried different power supplies and cables. I've enabled and  
disabled spread spectrum clocking and tried both SATA300 and SATA150  
rates. I've also tried switching drives between ports so that what was  
ad4 is ad6 and what was ad6 is ad4. The problems persist, but seem to  
follow the same drive (ad6 originally, then ad4 when swapped). This
points to a drive problem, yet that drive works great on its own, even
when both disks are exercised simultaneously. SMART reports no problems,
and ZFS reports no issues when ad6 is used on its own outside of a ZFS
mirror. I'm stumped. Any ideas?

The only interesting bit of evidence I could find is that when these  
errors do occur, smartctl reports an increase in the Start_Stop_Count  
field on ad6. ad4, which appears to work fine, doesn't demonstrate  
this and has a much lower value.
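
To check whether Start_Stop_Count really climbs each time the errors
appear, the attribute can be read before and after an incident and the
two values compared. Below is a minimal sketch, assuming the text comes
from `smartctl -A` and uses the usual ten-column attribute table (ID#,
ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED,
RAW_VALUE). The helper names raw_attribute and attribute_delta are
hypothetical.

```python
def raw_attribute(smartctl_output, name):
    """Return the integer RAW_VALUE of a named SMART attribute, or None.

    Assumes `smartctl -A` style rows with ten whitespace-separated columns
    and a plain integer in the RAW_VALUE column for the requested attribute.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the name.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            return int(fields[9])
    return None


def attribute_delta(before, after, name="Start_Stop_Count"):
    """Return how much an attribute's raw value grew between two captures.

    Returns None if the attribute is missing from either capture.
    """
    old = raw_attribute(before, name)
    new = raw_attribute(after, name)
    if old is None or new is None:
        return None
    return new - old
```

The two captures would be the saved output of `smartctl -A /dev/ad6`
taken before and after a burst of timeouts; a positive delta on ad6 with
a zero delta on ad4 over the same period would confirm the observation.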

I've tried disabling ACPI, but then the kernel cannot find the
controller (ICH8M). I'm using AHCI, but compatibility mode doesn't
appear to alter the behaviour. I don't know if it's important, but I'm
not using ZFS on the whole drive, just ad{4,6}s1d.

Any help would be appreciated.

Thanks,
Steve

P.S. Please cc me on replies as I'm not subscribed.



More information about the freebsd-stable mailing list