Patch RFC: Promise SATA300 TX4 hardware bug workaround.

Søren Schmidt sos at deepcore.dk
Fri Nov 2 16:57:51 PDT 2007


Arno J. Klaassen wrote:
> Definitely an improvement, but not sufficient (for my setup):
>
> amd64-releng_6 on an ASUS A8V UP (the box ran rock-stable
> for years on i386-releng_5 with the same hardware, apart from the TX4 and
> drives)
>
> from dmesg :
>
> atapci0: <Promise PDC40718 SATA300 controller> port 0xe000-0xe07f,0xd800-0xd8ff mem 0xfbb00000-0xfbb00fff,0xfba00000-0xfba1ffff irq 18 at device 13.0 on pci0
> ata2: <ATA channel 0> on atapci0
> ata3: <ATA channel 1> on atapci0
> ata4: <ATA channel 2> on atapci0
> ata5: <ATA channel 3> on atapci0
> atapci1: <VIA 6420 SATA150 controller> port 0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff irq 20 at device 15.0 on pci0
> ata6: <ATA channel 0> on atapci1
> ata7: <ATA channel 1> on atapci1
> atapci2: <VIA 8237 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
> ata0: <ATA channel 0> on atapci2
> ata1: <ATA channel 1> on atapci2
>
> [ ... ]
>
> ad0: 38166MB <Seagate ST3402111A 3.AAJ> at ata0-master UDMA100
> ad6: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata3-master SATA300
> ad12: 305245MB <WDC WD3200JD-22KLB0 08.05J08> at ata6-master SATA150
>
> booting from ad0 and simple gconcat over ad6 and ad12.
>
> Improvement : I now can fsck /dev/concat/data without
> ad6 being detached
>
> Persistent problem: when I rsync an NFS-mounted disk to /dev/concat/data,
> I get the following after a few gigabytes of data have been transferred:
>
> Nov  2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268435392
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435392
> Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=268435392
> Nov  2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
> Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=268435648
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=268435648
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=268435648
> Nov  2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
>
> ...
>
> I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> if that makes a difference)
>
One thing to try is to drop any GEOM RAID; if RAID is needed, use ataraid
instead.

I'm shuffling boards and controllers here to try to reproduce this. So far
no luck: it "just works(tm)". It seems to depend quite heavily on the
"right" combination of possibly marginal HW...
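
One detail in the quoted log may be worth checking while reproducing: the
failing LBAs sit right around 2^28 (268435456), the first sector that a
28-bit LBA can no longer address. The sketch below only redoes that
arithmetic from the g_vfs_done() offsets and lengths in the log. It is an
observation about the numbers, not a diagnosis. The helper names, the
512-byte sector size, and the assumption that concat offsets map 1:1 onto
ad6 are illustrative and are not taken from the ata driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 512-byte sectors, and concat offsets map 1:1 onto ad6. */
#define SECTOR_SIZE  UINT64_C(512)
/* 2^28 = 268435456: the first LBA that does not fit in 28 bits. */
#define LBA28_LIMIT  (UINT64_C(1) << 28)

/* Hypothetical helper: first sector touched by a byte offset. */
static uint64_t
offset_to_lba(uint64_t offset)
{
	return (offset / SECTOR_SIZE);
}

/*
 * Hypothetical helper: returns 1 if the request [lba, lba + count)
 * starts below the 28-bit limit but extends past it, 0 otherwise.
 */
static int
crosses_lba28(uint64_t lba, uint64_t count)
{
	return (lba < LBA28_LIMIT && lba + count > LBA28_LIMIT);
}

/* Hypothetical helper: print the arithmetic for one logged request. */
static void
describe_request(uint64_t offset, uint64_t length)
{
	uint64_t lba = offset_to_lba(offset);
	uint64_t count = length / SECTOR_SIZE;

	printf("offset=%ju length=%ju -> LBA=%ju count=%ju crosses=%d\n",
	    (uintmax_t)offset, (uintmax_t)length,
	    (uintmax_t)lba, (uintmax_t)count, crosses_lba28(lba, count));
}
```

With the values from the log, describe_request(137438920704, 131072) gives
LBA 268435392 with 256 sectors, which starts 64 sectors below 2^28 and ends
192 sectors above it. describe_request(137439051776, 131072) gives LBA
268435648, which lies entirely above the limit and matches the WRITE_DMA48
line.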

-Søren



