problems with sata disks (taskqueue timeout)

Wes Morgan morganw at chemikals.org
Mon Jan 19 15:30:37 PST 2009


On Mon, 19 Jan 2009, Marc UBM wrote:

>
> Hiho! :-)
>
> Occasionally, especially when uploading a large number of files, the
> (brand-new, tested) sata disks in my fileserver spit out some of these
> errors:

I've found that those kinds of errors are very, very controller-dependent.
Case in point: a 4-disk raidz on an ASUS board with a VIA SATA
controller. The drives were originally attached to a HighPoint RocketRAID
controller; then the data was moved off and the drives were attached to
the VIA controller. As soon as the raidz was created and data was being
copied back to the array, the taskqueue errors appeared. So, back to the
HighPoint controller. I then swapped out the board for another ASUS, this
time with the Q35 / ICH9 controller. Not a single problem whatsoever.
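
One way to make a controller comparison like this less anecdotal is to
tally the ATA messages per device before and after a swap. What follows
is only a minimal Python sketch, not something taken from this thread:
the function name tally_ata_errors and the three-way classification
(taskqueue timeout / ICRC / other) are assumptions for illustration, and
the regular expression assumes the single-line syslog form
"kernel: adN: SEVERITY - detail" seen in the quoted log.

```python
import re
from collections import Counter

# Assumed message shape, based on the quoted log, e.g.
#   "... kernel: ad10: WARNING - SET_MULTI taskqueue timeout - ..."
ATA_MSG = re.compile(r"kernel: (ad\d+): (WARNING|TIMEOUT|FAILURE) - (.*)")


def tally_ata_errors(lines):
    """Count ATA kernel messages per (device, severity, kind).

    `lines` is any iterable of syslog lines; lines that do not match
    the assumed pattern (for example the ZFS messages) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = ATA_MSG.search(line)
        if match is None:
            continue
        device, severity, detail = match.groups()
        # Hypothetical, coarse classification of the message text.
        if "taskqueue timeout" in detail:
            kind = "taskqueue timeout"
        elif "ICRC" in detail:
            kind = "ICRC"
        else:
            kind = "other"
        counts[(device, severity, kind)] += 1
    return counts
```

Feeding it the lines of the system log (for instance an open file
object for /var/log/messages, if that is where kernel messages end up
on the machine) gives one counter per device that can be compared
between controllers.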


>
> -----------------------
>
> Jan 19 19:51:14 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=882778752
>
> Jan 19 19:51:23 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
>
> Jan 19 19:51:27 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
>
> Jan 19 19:51:31 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
>
> Jan 19 19:51:35 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
>
> Jan 19 19:51:35 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=882778752
>
> Jan 19 19:51:35 hamstor kernel: ad10: FAILURE - WRITE_DMA48 status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=882778752
>
> Jan 19 19:51:35 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm path=/dev/ad10 offset=451982655488 size=131072 error=5
>
> Jan 19 19:51:41 hamstor kernel: ad10: FAILURE - SET_MULTI status=51<READY,DSC,ERROR> error=4<ABORTED>
>
> Jan 19 19:51:41 hamstor kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=882779008
>
> Jan 19 19:51:41 hamstor kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=882779008
>
> Jan 19 19:51:50 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
>
> Jan 19 19:51:54 hamstor kernel: ad10: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
>
> Jan 19 19:51:58 hamstor kernel: ad10: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
>
> Jan 19 19:52:02 hamstor kernel: ad10: WARNING - SET_MULTI taskqueue timeout - completing request directly
>
> Jan 19 19:52:02 hamstor kernel: ad10: FAILURE - WRITE_DMA48 timed out LBA=882779008
>
> Jan 19 19:52:02 hamstor root: ZFS: vdev I/O failure, zpool=gedaerm path=/dev/ad10 offset=451982786560 size=131072 error=5
>
> -----------------------
>
> I've fiddled with the cables, which seemed to help, but I've been
> unable to completely eliminate the errors. The disks are two Western
> Digital MyBook Home Edition drives (1 TB each), connected to a
> Promise TX4 SATA controller:
>
> atapci0 at pci0:1:6:0:  class=0x018000 card=0x3d17105a chip=0x3d17105a rev=0x02 hdr=0x00
>    vendor     = 'Promise Technology Inc'
>    device     = 'PDC40718-GP SATA 300 TX4 Controller'
>    class      = mass storage
>
> They're connected via 50 cm eSATA cables.
>
> I've googled on the net and found some vague hints about problems with
> the Promise TX4, but nothing concrete.
>
> What I've found is
>
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
>
> basically telling me "these things happen, deal with it" :-)
>
> The problem is that I cannot reproduce these errors reliably; the only
> thing I notice is that they *seem* to happen more often if a lot of
> large files are copied in succession.
>
> Can anybody tell me if upgrading to 7.2 or -current will help?
>
> I'm currently running
>
> 7.0-STABLE-200804 FreeBSD 7.0-STABLE-200804 #0: Wed Dec 10 15:29:03 CET
> 2008   ***@host:/usr/obj/usr/src/sys/GENERIC  amd64
>
> The next step I'll try is upgrading to RELENG_7 to see if that helps.
>
>
> Greetings,
> Marc


More information about the freebsd-stable mailing list