Martin Horneffer maho at nic.dtag.de
Wed Mar 22 13:06:09 UTC 2006


I have a problem, probably with the SiI 3512 SATA150 controller in a
dual-Opteron IBM eServer 326:

Every once in a while the kernel issues a message like:

  ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=150190687

The system waits a few seconds and continues to work normally.

It typically occurs several times a day, most likely depending on the
load on the (SATA-connected) hard drive.
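
To put a number on "several times a day", a small script along these
lines could tally the messages from the kernel log. This is only a
sketch: the regular expression is derived from the single message
format shown above, and the helper names (parse_timeouts,
count_by_device) are made up for illustration and are not part of any
FreeBSD tool.

```python
import re
from collections import Counter

# Pattern built from the one message format quoted in this mail, e.g.
#   ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=150190687
# Other ATA error formats are not covered (assumption).
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+) retrying "
    r"\((?P<retries>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def parse_timeouts(lines):
    """Return one dict per ATA timeout message found in the given lines."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            events.append({
                "dev": match.group("dev"),
                "cmd": match.group("cmd"),
                "retries": int(match.group("retries")),
                "lba": int(match.group("lba")),
            })
    return events


def count_by_device(events):
    """Count timeout events per device name (ad4, ad6, ...)."""
    return Counter(event["dev"] for event in events)


if __name__ == "__main__":
    import sys
    # Reads log text from stdin, e.g. piped from dmesg.
    for dev, n in sorted(count_by_device(parse_timeouts(sys.stdin)).items()):
        print(dev, n)
```

Piping the output of dmesg (or a saved kernel log) into it would give
a per-drive count, which should show whether one disk or channel is
hit more often than the other.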

We have two machines of the same hardware configuration, both with two
hard drives (identical type). The problem is the same with all four
drives on both machines. Thus I assume it's more likely a driver
problem than a bad SATA cable.

We are currently using one of the machines with FreeBSD 5-stable
(RELENG_5) and the other with some Linux. While Linux didn't have a
problem with the hardware, FreeBSD did. We tried 5.4-Release and
6.0-Release, each on both i386 and amd64. We found that only
5.4-Release on amd64 was able to install, albeit with some
warnings. The other versions failed to install at all.

After successful installation we noticed two problems:

 - After a couple of hours of uptime, top stopped reporting CPU
 utilization numbers (all 0). This went away after changing the
 timecounter hardware from ACPI-fast to i8254
 (kern.timecounter.hardware=i8254 in /etc/sysctl.conf).

 - The "TIMEOUT - WRITE_DMA" messages occur from time to time, each
 time stalling the system for a few seconds (probably every process
 trying to access the hard drive).

So far I haven't managed to solve the latter.

I upgraded to 5-stable (RELENG_5) with cvsup (most recently today),
but the problem is still the same.

Apart from the occasional hiccups, the machine runs fine.

The SATA controller reports as "SiI 3512A SATALink BIOS Version
4.3.47" during BIOS startup.

I'll attach the last dmesg output.

Any suggestions?

Best regards, Martin

Dr. Martin Horneffer -- maho at nic.dtag.de
Deutsche Telekom AG
T-Com Technology Engineering
Internet Backbone Architecture

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.5-PRERELEASE #1: Mon Mar 20 16:24:38 CET 2006
    root at xxxx.NIC.DTAG.DE:/usr/obj/usr/src/sys/XXXX
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 248 (2193.17-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf5a  Stepping = 10
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory  = 2146893824 (2047 MB)
avail memory = 2063441920 (1967 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
acpi0: <PTLTD 	 XSDT> on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
unknown: I/O range not supported
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: <ACPI CPU> on acpi0
powernow0: <Cool`n'Quiet K8> on cpu0
device_attach: powernow0 attach returned 6
cpu1: <ACPI CPU> on acpi0
powernow1: <Cool`n'Quiet K8> on cpu1
device_attach: powernow1 attach returned 6
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0x8080-0x80ff,0x8000-0x807f,0xcf8-0xcff iomem 0xd8000-0xdbfff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ohci0: <OHCI (generic) USB controller> mem 0xfc100000-0xfc100fff irq 19 at device 0.0 on pci1
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xfc101000-0xfc101fff irq 19 at device 0.1 on pci1
usb1: OHCI version 1.0, legacy support
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pci1: <display, VGA> at device 5.0 (no driver attached)
atapci0: <SiI 3512 SATA150 controller> port 0x2400-0x240f,0x2410-0x2413,0x2418-0x241f,0x2414-0x2417,0x2420-0x2427 mem 0xfc103000-0xfc1031ff irq 17 at device 6.0 on pci1
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <AMD 8111 UDMA133 controller> port 0x1020-0x102f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
pci0: <bridge> at device 7.3 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
pci2: <ACPI PCI bus> on pcib2
bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2003> mem 0xfe000000-0xfe00ffff,0xfe010000-0xfe01ffff irq 24 at device 1.0 on pci2
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:11:25:1e:23:a4
bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2003> mem 0xfe020000-0xfe02ffff,0xfe030000-0xfe03ffff irq 25 at device 1.1 on pci2
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:11:25:1e:23:a5
pci0: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pci0: <base peripheral, interrupt controller> at device 11.1 (no driver attached)
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
orm0: <ISA Option ROMs> at iomem 0xcb000-0xcf7ff,0xc9800-0xcafff,0xc8000-0xc97ff,0xc0000-0xc7fff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <CD-224E/2.9B> at ata1-master PIO4
ad4: 76324MB <ST380013AS/3.45> [155072/16/63] at ata2-master SATA150
ad6: 76324MB <ST380013AS/3.25> [155072/16/63] at ata3-master SATA150
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/ad4s1a
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8319

