Need help with SATA disk timing out in 8.1 Beta

Jerry Bell jerry at nrdx.com
Fri Jun 18 05:11:37 UTC 2010


I am having all sorts of problems with the drives in a new server.
I have a 450GB SATA drive that holds my root partition; it works great, 
with no issues.
I have a second, 1TB drive that has been nothing but trouble.  When 
writing to this disk, I occasionally see errors like this:

Jun 17 07:40:36 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=1564898207
Jun 17 07:40:36 www3 kernel: ad8: FAILURE - WRITE_DMA48 
status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1564898207
Jun 17 07:57:12 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=1565052351
Jun 17 07:57:12 www3 kernel: ad8: FAILURE - WRITE_DMA48 
status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1565052351
Jun 17 09:45:12 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=1565983775
Jun 17 09:45:12 www3 kernel: ad8: FAILURE - WRITE_DMA48 
status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1565983775
Jun 17 09:50:24 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=1566082719
Jun 17 09:50:24 www3 kernel: ad8: FAILURE - WRITE_DMA48 
status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1566082719
Jun 17 10:01:25 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=1566358623
Jun 17 10:01:25 www3 kernel: ad8: FAILURE - WRITE_DMA48 
status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1566358623
Jun 17 10:02:59 www3 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error 
(retrying request) LBA=1566387807
Jun 17 10:02:59 www3 kernel: ad8: FAILURE - WRITE_DMA48 
status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=1566387807
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=43231
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=57567
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=773471
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=786271
Jun 17 10:18:59 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=810079
Jun 17 10:19:00 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=76767
Jun 17 10:19:00 www3 kernel: ad8: WARNING - WRITE_DMA UDMA ICRC error 
(retrying request) LBA=784479

Last week, I asked the datacenter to provide me with a new 1TB drive, 
and they did.  It formatted fine, with no errors.  I copied files to it, 
ran bonnie, etc., and saw no signs of any DMA issues until this morning, 
when the errors started again.

If I run a tool like bonnie, I can very easily reproduce the errors.
After some research, I found that these errors are often indicative of 
SATA cable problems.
The datacenter replaced the cable, and the problem continued.
The datacenter moved the SATA cable to another SATA port, and the problem 
continued.
The datacenter then added a BRAND NEW 1TB drive (the system now has 3 
drives), and I am unable to format that drive because of these errors:
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=602334847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=427014463
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=427014463
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=15425407
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=471408895
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=471408895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=91422655
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=203161183
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) 
LBA=1211817727
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=1211817727
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=37998847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=309632575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=309632575
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=59067391
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=497744575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=497744575
ad10: FAILURE - WRITE_MUL status=51<READY,DSC,ERROR> 
error=84<ICRC,ABORTED> LBA=1128895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=13920511
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=547029919
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=547029919
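
To compare the drives, messages like the ones above can be tallied per 
device, severity, and command.  Below is a minimal Python sketch; the 
function name tally_ata_events and the regular expression are illustrative 
assumptions, not part of FreeBSD or any existing tool.  It only looks at 
the first line of each message, so wrapped continuation lines are ignored.

```python
import re
from collections import Counter

# Matches the start of an error line such as "ad8: WARNING - WRITE_DMA48 ...".
# The pattern is an assumption based only on the log excerpts in this message.
EVENT_RE = re.compile(r"\b(ad\d+): (WARNING|FAILURE|TIMEOUT) - (\w+)")


def tally_ata_events(log_text):
    """Count (device, severity, command) triples found in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the ad10 excerpt, including one wrapped message.
sample = """\
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR>
error=10<NID_NOT_FOUND> LBA=602334847
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
"""

for (device, severity, command), n in sorted(tally_ata_events(sample).items()):
    print(device, severity, command, n)
```

Feeding it the full /var/log/messages text instead of the sample would 
show whether ad8 and ad10 fail in the same pattern.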

So, the problem has occurred on 3 different drives.
SATA ports and cables do not appear to impact the problem.
The primary 450GB drive does not have any problems.
I have used atacontrol to lower the speed all the way down to UDMA 33, 
with the same result.

I am at the end of my ability to troubleshoot this.  Could this be a 
problem with FreeBSD 8.1 beta and not the drives after all?
I have seen a reference to a patch for previous versions that increases 
the DMA timeout to 10 or 15 seconds, which fixes such problems, but I am 
not certain that it would fix my particular issue.


Here is the dmesg output:
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-PRERELEASE #1: Thu Jun 10 23:52:29 UTC 2010
     jerry at www3.stelesys.com:/usr/obj/usr/src/sys/JERRY amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz (2674.98-MHz 
K8-class CPU)
   Origin = "GenuineIntel"  Id = 0x106e5  Family = 6  Model = 1e  
Stepping = 5
   
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   
Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
   AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
   AMD Features2=0x1<LAHF>
   TSC: P-state invariant
real memory  = 6442450944 (6144 MB)
avail memory = 6138769408 (5854 MB)
ACPI APIC Table: <020910 APIC2308>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
  cpu2 (AP): APIC ID:  2
  cpu3 (AP): APIC ID:  3
  cpu4 (AP): APIC ID:  4
  cpu5 (AP): APIC ID:  5
  cpu6 (AP): APIC ID:  6
  cpu7 (AP): APIC ID:  7
ioapic0: Changing APIC ID to 8
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <020910 RSDT2308> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, bdf00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
ACPI Warning: Incorrect checksum in table [OEMB] - 0x86, should be 0x85 
(20100331/tbutils-354)
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on 
acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 3.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 
0xfa000000-0xfaffffff,0xd0000000-0xdfffffff,0xf9000000-0xf9ffffff irq 16 
at device 0.0 on pci1
pci0: <base peripheral> at device 8.0 (no driver attached)
pci0: <base peripheral> at device 8.1 (no driver attached)
pci0: <base peripheral> at device 8.2 (no driver attached)
pci0: <base peripheral> at device 8.3 (no driver attached)
pci0: <base peripheral> at device 16.0 (no driver attached)
pci0: <base peripheral> at device 16.1 (no driver attached)
pci0: <simple comms> at device 22.0 (no driver attached)
ehci0: <Intel PCH USB 2.0 controller USB-B> mem 0xf8ffe000-0xf8ffe3ff 
irq 16 at device 26.0 on pci0
ehci0: [ITHREAD]
usbus0: EHCI version 1.0
usbus0: <Intel PCH USB 2.0 controller USB-B> on ehci0
pci0: <multimedia, HDA> at device 27.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci6: <ACPI PCI bus> on pcib2
atapci0: <JMicron JMB368 UDMA133 controller> port 
0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0xe400-0xe40f 
irq 16 at device 0.0 on pci6
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 17 at device 28.4 on pci0
pci3: <ACPI PCI bus> on pcib5
re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 
0xd800-0xd8ff mem 0xf7fff000-0xf7ffffff,0xf7ff8000-0xf7ffbfff irq 16 at 
device 0.0 on pci3
re0: Using 1 MSI messages
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
rgephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
re0: Ethernet address: e0:cb:4e:ed:05:a0
re0: [FILTER]
pcib6: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
pci2: <ACPI PCI bus> on pcib6
ehci1: <Intel PCH USB 2.0 controller USB-A> mem 0xf8ffd000-0xf8ffd3ff 
irq 23 at device 29.0 on pci0
ehci1: [ITHREAD]
usbus1: EHCI version 1.0
usbus1: <Intel PCH USB 2.0 controller USB-A> on ehci1
pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib7
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel PCH SATA300 controller> port 
0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xb480-0xb483,0xb400-0xb40f,0xb080-0xb08f 
irq 21 at device 31.2 on pci0
atapci1: [ITHREAD]
ata3: <ATA channel 0> on atapci1
ata3: [ITHREAD]
ata4: <ATA channel 1> on atapci1
ata4: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
atapci2: <Intel PCH SATA300 controller> port 
0xcc00-0xcc07,0xc880-0xc883,0xc800-0xc807,0xc480-0xc483,0xc400-0xc40f,0xc080-0xc08f 
irq 21 at device 31.5 on pci0
atapci2: [ITHREAD]
ata5: <ATA channel 0> on atapci2
ata5: [ITHREAD]
ata6: <ATA channel 1> on atapci2
ata6: [ITHREAD]
acpi_button0: <Power Button> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
orm0: <ISA Option ROM> at iomem 0xce800-0xcf7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
est4: <Enhanced SpeedStep Frequency Control> on cpu4
p4tcc4: <CPU Frequency Thermal Control> on cpu4
est5: <Enhanced SpeedStep Frequency Control> on cpu5
p4tcc5: <CPU Frequency Thermal Control> on cpu5
est6: <Enhanced SpeedStep Frequency Control> on cpu6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
est7: <Enhanced SpeedStep Frequency Control> on cpu7
p4tcc7: <CPU Frequency Thermal Control> on cpu7
Timecounters tick every 1.000 msec
IP Filter: v4.1.28 initialized.  Default = pass all, Logging = enabled
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ad7: 476940MB <Seagate ST3500418AS CC38> at ata3-slave UDMA100 SATA 3Gb/s
ugen0.1: <Intel> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
GEOM: ad7s1: geometry does not match label (255h,63s != 16h,63s).
ad9: 953869MB <WDC WD10EALS-00Z8A0 05.01D05> at ata4-slave UDMA100 SATA 
3Gb/s
ad10: 953869MB <WDC WD10EALS-00Z8A0 05.01D05> at ata5-master UDMA100 
SATA 3Gb/s
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
Root mount waiting for: usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen0.2: <vendor 0x8087> at usbus0
uhub2: <vendor 0x8087 product 0x0020, class 9/0, rev 2.00/0.00, addr 2> 
on usbus0
ugen1.2: <vendor 0x8087> at usbus1
uhub3: <vendor 0x8087 product 0x0020, class 9/0, rev 2.00/0.00, addr 2> 
on usbus1
uhub2: 6 ports with 6 removable, self powered
Root mount waiting for: usbus1
uhub3: 8 ports with 8 removable, self powered
Trying to mount root from ufs:/dev/ad7s1a
re0: link state changed to UP
ugen1.3: <Peppercon AG> at usbus1
ukbd0: <Peppercon AG Multidevice, class 0/0, rev 2.00/0.01, addr 3> on 
usbus1
kbd2 at ukbd0
ums0: <Peppercon AG Multidevice, class 0/0, rev 2.00/0.01, addr 3> on usbus1
ums0: 3 buttons and [Z] coordinates ID=0
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=168172351
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=602334847
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=602334847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=427014463
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=427014463
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=15425407
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=471408895
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=471408895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=91422655
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=203161183
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) 
LBA=1211817727
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=1211817727
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=37998847
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=309632575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=309632575
ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=24831007
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=59067391
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=497744575
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=497744575
ad10: FAILURE - WRITE_MUL status=51<READY,DSC,ERROR> 
error=84<ICRC,ABORTED> LBA=1128895
ad10: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=13920511
ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=547029919
ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> 
error=10<NID_NOT_FOUND> LBA=547029919

Please help.

Thank you,

Jerry


