ATA DMA Issues Resurfaced (READ_DMA TIMEOUT/FAILURE)

Kendall Gifford zettabyte at gmail.com
Thu Apr 21 11:30:35 PDT 2005


Howdy. I'm not sure whether hardware or stable is the best list for
this, but here is my problem. Any info, recommendations, or help will
be greatly appreciated.

I've got a server running 5-STABLE (updated/built Jan. 22, 2005). It
has been running this kernel, a 5.3-RELEASE kernel, and other 5.x
branch versions for the last ten or so months now. Before that, it
was running 4.9-RELEASE.

About ten months ago, when I switched from the 4.x branch to the 5.x
branch, I immediately began experiencing WRITE_DMA ICRC errors during
disk activity at seemingly random times. At that time I posted the
following message to this list and to questions:

http://groups-beta.google.com/group/mailing.freebsd.questions/browse_thread/thread/17fe5871d823f380/a16568320427152e?rnum=2#a16568320427152e

The gist of that message, and of my current experience, is that my
hardware (drives, cables, motherboard controllers, etc.) is definitely
fine, and that I've noticed others posting various, possibly related
issues both before and since I posted the above message. I basically
ended up working around the problem by running atacontrol in a
/usr/local/etc/rc.d/ script that set my drives to PIO4 mode. I then
mostly forgot about the problem as everything has since worked
fine--that is, until just recently.
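
For readers unfamiliar with this kind of workaround, here is a minimal
sketch of what such a script might look like. It is an illustration
only, not the script actually in use on this system. The file name is
made up, and the "atacontrol mode <channel> <master> <slave>" form is
the 5.x syntax as I understand it from atacontrol(8), so check the
manual page on your release. The sketch also sets both master and slave
to PIO4, which is an assumption and may not be what you want for a
CD-RW on the slave position.

```shell
#!/bin/sh
# Hypothetical /usr/local/etc/rc.d/000.atapio.sh (the name is an assumption).
# Forces the devices on the listed ATA channels into PIO4 mode.

# Overridable so the script can be dry-run, e.g. ATACONTROL=echo.
ATACONTROL="${ATACONTROL:-/sbin/atacontrol}"

# Channel numbers to change; 0 and 1 correspond to ata0 and ata1.
CHANNELS="${CHANNELS:-0 1}"

set_pio4() {
    for ch in $CHANNELS; do
        # Assumed 5.x syntax: atacontrol mode <channel> <master> <slave>
        "$ATACONTROL" mode "$ch" PIO4 PIO4 || return 1
    done
}

case "${1:-}" in
start)
    set_pio4
    ;;
stop)
    # Nothing to undo at shutdown.
    ;;
*)
    echo "usage: $0 {start|stop}" >&2
    ;;
esac
```

Running it as "ATACONTROL=echo sh 000.atapio.sh start" prints the
commands instead of executing them, which is a cheap way to check the
channel list before trusting it at boot.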

About a week ago (around April 14, 2005), after updating some ports
and configurations, I decided to reboot (quite extraneous, I know, but
it is reassuring to verify that all scripts/configs are set up the way
I want). Just as my system began starting local services, and just
after it ran my custom /usr/local/etc/rc.d atacontrol script, I got the
following error messages:

<Screenshot>

Master = PIO4
Slave  = UDMA33
Master = PIO4
Slave  = BIOSPIO
ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=146793208
ad0: FAILURE - READ_DMA timed out
GEOM_VINUM: subdisk raid.p0.s0 is down
GEOM_VINUM: plex raid.p0 is down
Starting mysql.


Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc04ba88f
stack pointer = 0x10:0xd321dc6c
frame pointer = 0x10:0xd321dc98
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL0, pres 1, def32 1, gran1
processor eflags = interrupt enabled, resume, IOPL= 0
current process  = 4 (g_down)
trap number = 12
panic: page fault
Uptime: 28s

</End Screenshot>

This is the first time in ten months I've had issues switching to PIO4
mode during local service startup. I really am not quite sure what
happened.

Anyhow, I've since rebooted into single-user mode, brought my
gvinum-mirror plex back up, and done the usual things to bring my
system up manually. But on one attempt I foolishly forgot to run
atacontrol on my drives before trying to bring my gvinum plex back up.
As it was restoring in the background, I remembered and unthinkingly
ran atacontrol, and again succeeded in bringing my system down in much
the same manner as shown above (only this time with WRITE_DMA errors
instead of READ_DMA errors).

Anyhow, based on this experience, my two guesses as to the cause of my
booting problem are that disk activity from starting the system is
causing problems before my disks can be put fully in PIO4 mode (so the
timing is critical), or that the state of things at the moment
atacontrol is executed causes problems. As you can see, I have no idea
what the real problem is, and I wonder if any more info on these
ATA/DMA problems is available. I also wonder if I'd be better off
moving to 4.11 until the root cause of these problems is found.

Any help or information anyone?

System Info:


<Kernel Config>
machine		i386
cpu		I686_CPU
device		npx
device		isa
device		pci
device		agp
options		VESA
ident		KERNEL
maxusers	100
options		SCHED_4BSD
options		COMPAT_43
options		COMPAT_FREEBSD4
options		SYSVSHM
options		SYSVSEM
options		SYSVMSG
options		KTRACE
options		INVARIANT_SUPPORT
options		INET
device		ether
device		loop
device		bpf
device		tun
options		IPFIREWALL
options		IPFIREWALL_VERBOSE
options		IPFIREWALL_VERBOSE_LIMIT=1000
options		IPDIVERT
options		FFS
options		NFSCLIENT
options		NFSSERVER
options		CD9660
options		FDESCFS
options		MSDOSFS
options		NTFS
options		NULLFS
options		PROCFS
options		PSEUDOFS
options		UDF
options		SOFTUPDATES
options		UFS_EXTATTR
options		UFS_EXTATTR_AUTOSTART
options		UFS_ACL
options		GEOM_BSD
options		GEOM_CONCAT
options		GEOM_GPT
options		GEOM_LABEL
options		GEOM_MBR
options		GEOM_MIRROR
options		GEOM_VOL
options		QUOTA
device		md
device		random
device		pty
device		snp
options		_KPOSIX_PRIORITY_SCHEDULING
device		atkbdc
device		atkbd
device		psm
device		vga
device		splash
device		sc
options		MAXCONS=16
options		SC_HISTORY_SIZE=2000
options		SC_TWOBUTTON_MOUSE
options		SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
options		SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
device		ata
device		atadisk
device		ataraid
device		atapicd
device		atapifd
device		atapist
options 	ATA_STATIC_ID
device		fdc
device		sio
device		ppc
device		ppbus
device		lpt
device		ppi
device		pmtimer
device		mem
device		apic
device		io
device		miibus
device		vr
device		uhci
device		ohci
device		usb
device		ucom
device		ugen
device		uhid
device		ukbd
device		ulpt
device		ums
device		uscanner
</End Kernel Config>


<Device Hints>
hint.atkbdc.0.at="isa"
hint.atkbdc.0.port="0x060"
hint.atkbd.0.at="atkbdc"
hint.atkbd.0.irq="1"
hint.atkbd.0.flags="0x1"
hint.psm.0.at="atkbdc"
hint.psm.0.irq="12"
hint.vga.0.at="isa"
hint.sc.0.at="isa"
hint.sc.0.flags="0x100"
hint.fdc.0.at="isa"
hint.fdc.0.port="0x3f0"
hint.fdc.0.irq="6"
hint.fdc.0.drq="2"
hint.fd.0.at="fdc0"
hint.fd.0.drive="0"
hint.fd.1.at="fdc0"
hint.fd.1.drive="1"
hint.sio.0.at="isa"
hint.sio.0.port="0x3f8"
hint.sio.0.flags="0x10"
hint.sio.0.irq="4"
hint.sio.1.at="isa"
hint.sio.1.port="0x2f8"
hint.sio.1.irq="3"
hint.ppc.0.at="isa"
hint.ppc.0.irq="7"
</End Device Hints>


<Dmesg>
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #0: Sat Jan 22 19:54:10 MST 2005
    root at name.domain.tld:/usr/obj/usr/src/sys/KERNEL
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Duron(tm) processor (1297.79-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x671  Stepping = 1
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0400000<AMIE,DSP,3DNow!>
real memory  = 536870912 (512 MB)
avail memory = 519913472 (495 MB)
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 9 Entries> on motherboard
pci0: <PCI bus> on pcib0
agp0: <VIA Generic host to PCI bridge> mem 0xe0000000-0xe7ffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci0: <display, VGA> at device 8.0 (no driver attached)
uhci0: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 11 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 3 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <VIA 83C572 USB controller> port 0xd800-0xd81f irq 10 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <VIA 83C572 USB controller> on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 16.3 (no driver attached)
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 8235 UDMA133 controller> port 0xdc00-0xdc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0: <multimedia, audio> at device 17.5 (no driver attached)
vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe800-0xe8ff mem 0xed001000-0xed0010ff irq 11 at device 18.0 on pci0
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:0d:87:00:bf:1d
cpu0 on motherboard
orm0: <ISA Option ROMs> at iomem 0xc8000-0xcffff,0xc0000-0xc7fff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
fdc0: <Enhanced floppy controller> at port 0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: <PNP0303> can't assign resources (port)
unknown: <PNP0c02> can't assign resources (memory)
unknown: <PNP0f13> can't assign resources (irq)
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0700> can't assign resources (port)
unknown: <PNP0401> can't assign resources (port)
Timecounter "TSC" frequency 1297789521 Hz quality 800
Timecounters tick every 10.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding disabled, default to deny, logging limited to 1000 packets/entry by default
ad0: 117246MB <Maxtor 6Y120P0/YAR41VW0> [238216/16/63] at ata0-master UDMA133
acd0: CDRW <LITE-ON LTR-48246S/SS08> at ata0-slave UDMA33
ad2: 117246MB <Maxtor 6Y120P0/YAR41VW0> [238216/16/63] at ata1-master UDMA133
Mounting root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
GEOM_VINUM: subdisk raid.p1.s0 is up
GEOM_VINUM: subdisk raid.p0.s0 is stale
GEOM_VINUM: plex sync raid.p1 -> raid.p0 started
GEOM_VINUM: sd raid.p0.s0 is initializing
GEOM_VINUM: plex raid.p0 is degraded
GEOM_VINUM: plex raid.p0 is up
GEOM_VINUM: plex sync raid.p1 -> raid.p0 finished
</End Dmesg>

--
Kendall Gifford

