ATA Drive Issues

Wil Hatfield - HyperConX wil at hyperconx.com
Fri Mar 31 22:06:17 UTC 2006


What is the problem with 5.4 and ATA drives? I am running the latest release
of FreeBSD, 5.4-RELEASE-p11. I have two basic ATA drives, no RAID and no
SCSI anything. Every now and then, under a bit of load, the hard drive freezes
with either a kernel panic or a Write_DMA error. I have to reboot the
machine and run fsck -y to recover. Sometimes I have to run it twice.

As suggested in several similar posts, I have the following set in my
loader.conf file, without any improvement.

hw.ata.ata_dma=0
hw.ata.atapi_dma=0

Disabling DMA hasn't fixed the problem. Given the number of reports similar
to mine, my guess is that this is not strictly a DMA or drive issue. The DMA
error is just the symptom of a deeper underlying problem, maybe something in
the kernel or drivers. The same issue affects 3 brand new Supermicro machines,
all running nearly the same Western Digital drives: 4 drives are 200GB WDs
and 1 is a 160GB WD, all with brand new cables. Since this is all brand new
equipment, please don't pass this off as a bad cable. It isn't.

As for the drives, I have smarttools running on these systems now. There
are no bad sectors and the drive health is all clean. Absolutely no issues
are reported by smarttools, and none of the attributes have changed at all.

Here is some more info:

--dmesg.today snippet--

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p11 #0: Tue Mar 28 17:18:36 PST 2006
    wilh at hera.xxxxxxxxx.net:/usr/obj/usr/src/sys/CUSTOM-KERNEL
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 2146893824 (2047 MB)
avail memory = 2099638272 (2002 MB)
ACPI APIC Table: <PTLTD          APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
ioapic3 <Version 2.0> irqs 72-95 on motherboard
ioapic4 <Version 2.0> irqs 96-119 on motherboard
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <PTLTD   RSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pci0: <base peripheral> at device 1.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 3.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pci2: <base peripheral, interrupt controller> at device 0.1 (no driver
attached)
pcib4: <ACPI PCI-PCI bridge> at device 0.2 on pci2
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port
0x2000-0x203f mem 0xdd200000-0xdd21ffff irq 54 at device 2.0 on pci4
em0: Ethernet address: 00:30:48:2c:c3:80
em0:  Speed:N/A  Duplex:N/A
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port
0x2040-0x207f mem 0xdd220000-0xdd23ffff irq 55 at device 2.1 on pci4
em1: Ethernet address: 00:30:48:2c:c3:81
em1:  Speed:N/A  Duplex:N/A
pci2: <base peripheral, interrupt controller> at device 0.3 (no driver
attached)
pcib5: <ACPI PCI-PCI bridge> irq 16 at device 4.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
pci5: <base peripheral, interrupt controller> at device 0.1 (no driver
attached)
pcib7: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci7: <ACPI PCI bus> on pcib7
pci5: <base peripheral, interrupt controller> at device 0.3 (no driver
attached)
pcib8: <ACPI PCI-PCI bridge> irq 16 at device 6.0 on pci0
pci8: <ACPI PCI bus> on pcib8
pci0: <serial bus, USB> at device 29.0 (no driver attached)
pci0: <serial bus, USB> at device 29.1 (no driver attached)
pci0: <serial bus, USB> at device 29.2 (no driver attached)
pci0: <serial bus, USB> at device 29.3 (no driver attached)
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib9: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci9: <ACPI PCI bus> on pcib9
pci9: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port
0x14a0-0x14af,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc8000-0xc8fff,0xc0000-0xc7fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
IP Filter: v3.4.35 initialized.  Default = block all, Logging = enabled
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default
to deny, logging unlimited
ad0: 190782MB <WDC WD2000JB-00GVC0/08.02D08> [387621/16/63] at ata0-master PIO4
ad1: 190782MB <WDC WD2000JB-00GVC0/08.02D08> [387621/16/63] at ata0-slave PIO4
acd0: CDROM <CD-224E/1.9A> at ata1-master PIO4
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
Mounting root from ufs:/dev/ad0s1a
em0: Link is up 100 Mbps Full Duplex
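
The ad0, ad1 and acd0 lines above all end in PIO4, which suggests the
loader.conf settings are in effect and the drives are not attaching in a DMA
mode. For anyone wanting to check this across several machines, here is a
minimal Python sketch. It is only an illustration: the helper names
transfer_modes and uses_dma are made up for this example, and the rule
"UDMA or WDMA prefix means DMA" is an assumption about how the mode strings
are spelled, not something taken from the ata driver documentation.

```python
import re

# Device lines from the dmesg snippet above, each on a single line
# as the kernel prints them.
DMESG = """\
ad0: 190782MB <WDC WD2000JB-00GVC0/08.02D08> [387621/16/63] at ata0-master PIO4
ad1: 190782MB <WDC WD2000JB-00GVC0/08.02D08> [387621/16/63] at ata0-slave PIO4
acd0: CDROM <CD-224E/1.9A> at ata1-master PIO4
"""

# Matches e.g. "ad0: ... at ata0-master PIO4" and captures
# the device name, the channel position and the transfer mode.
LINE_RE = re.compile(
    r"^(ac?d\d+):.* at (ata\d+-(?:master|slave))\s+(\S+)\s*$"
)


def transfer_modes(dmesg_text):
    """Return {device: (channel, mode)} for each ATA/ATAPI device line."""
    modes = {}
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            device, channel, mode = m.groups()
            modes[device] = (channel, mode)
    return modes


def uses_dma(mode):
    """Assumption: UDMA*/WDMA* strings mean DMA, anything else means PIO."""
    return mode.upper().startswith(("UDMA", "WDMA"))


if __name__ == "__main__":
    for device, (channel, mode) in sorted(transfer_modes(DMESG).items()):
        kind = "DMA" if uses_dma(mode) else "PIO"
        print(f"{device} on {channel}: {mode} ({kind})")
```

To use it on a live box, replace the DMESG string with the contents of
/var/run/dmesg.boot or the output of dmesg.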


Let me know if anyone wants more info. Any help or insight that anyone can
provide would be great. These machines went into production just recently,
and these issues didn't appear until they were put under some load. So
basically I am now screwed. HELP!

Cheers,

--
Wil Hatfield







More information about the freebsd-questions mailing list