I've been having a strange problem with various builds of FreeBSD, from at
least 5.2-RELEASE up to 5.3-BETA4. Random hard lockups occur when writing to
two separate SATA drives. Sometimes the lockups occur under high I/O, but not
always. Because the lockups are random, I don't have much hard evidence to
provide. How can I go about gathering more information? I've tried enabling
WITNESS and other kernel debugging options, but they produced no extra
debugging data.

The drives aren't configured as RAID -- they are accessed separately and not
configured in any special way. They are two 160 GB Seagate SATA (ST3160023AS)
drives that are accessed via their ar* devices. I've also tried accessing
them directly via their ad* devices, but the lockups still occurred.
smartmontools reports both drives as good on both long and short tests.

Any suggestions would be appreciated.


dmesg output:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-BETA4 #0: Sat Sep 11 13:12:26 EDT 2004
    root at
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.53GHz (2539.10-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7

real memory  = 1073659904 (1023 MB)
avail memory = 1041092608 (992 MB)
ACPI APIC Table: <ASUS   P4PE    >
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS P4PE> on motherboard
acpi0: Overriding SCI Interrupt from IRQ 9 to IRQ 22
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82845G host to AGP bridge> mem 0xf8000000-0xfbffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci0: <Promise PDC20376 SATA150 controller> port 0xa000-0xa07f,0xa400-0xa40f,0xa800-0xa83f mem 0xdd800000-0xdd81ffff,0xde000000-0xde000fff irq 23 at device 4.0 on pci2
atapci0: failed: rid 0x20 is memory, requested 4
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
ata4: channel #2 on atapci0
bge0: <Broadcom BCM5702 Gigabit Ethernet, ASIC rev. 0x1002> mem 0xdd000000-0xdd00ffff irq 20 at device 5.0 on pci2
miibus0: <MII bus> on bge0
brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:e0:18:fe:24:8b
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH4 UDMA100 controller> port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
fdc0: <floppy drive controller> port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: <ISA Option ROM> at iomem 0xc0000-0xcffff on isa0
pmtimer0 on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250 or not responding
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2539104508 Hz quality 800
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 8 steps (100% to 12.5%), currently 100.0%
ad0: 117246MB <Maxtor 6Y120P0/YAR41BW0> [238216/16/63] at ata0-master UDMA100
ad4: 152627MB <ST3160023AS/3.18> [310101/16/63] at ata2-master SATA150
ad6: 152627MB <ST3160023AS/3.18> [310101/16/63] at ata3-master SATA150
ar0: 152627MB <ATA RAID0 array> [19457/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
ar1: 152627MB <ATA RAID0 array> [19457/255/63] status: READY subdisks:
 disk0 READY on ad6 at ata3-master
Mounting root from ufs:/dev/ad0s1a
Accounting enabled

fdisk output for the first drive (the second drive is exactly the same):

******* Working on device /dev/ar0 *******
parameters extracted from in-core disklabel are:
cylinders=19457 heads=255 sectors/track=63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=19457 heads=255 sectors/track=63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 312576642 (152625 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 254/ sector 63
The data for partition 2 is:
The data for partition 3 is:
The data for partition 4 is:
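
As a quick sanity check on the fdisk figures above, here is a minimal Python
sketch (not part of the fdisk output; the variable names are just illustrative)
that recomputes the blocks per cylinder, the sector count implied by the
geometry, and the partition size from the reported values. It assumes that
fdisk's "Meg" means 1024*1024 bytes.

```python
# Figures copied from the fdisk output above.
CYLINDERS, HEADS, SECTORS_PER_TRACK = 19457, 255, 63
SECTOR_SIZE = 512        # "Media sector size is 512"
PART_START = 63          # start sector of partition 1
PART_SIZE = 312576642    # size of partition 1, in sectors

# Sectors in one cylinder, and in the whole reported geometry.
blocks_per_cyl = HEADS * SECTORS_PER_TRACK
total_sectors = CYLINDERS * blocks_per_cyl

# First sector past the end of partition 1.
part_end = PART_START + PART_SIZE

# Partition size in megabytes (assumed to be 1024*1024 bytes each).
part_meg = PART_SIZE * SECTOR_SIZE // (1024 * 1024)

print("blocks per cylinder: ", blocks_per_cyl)
print("sectors in geometry: ", total_sectors)
print("partition end sector:", part_end)
print("partition size (Meg):", part_meg)
```

With these numbers the partition ends exactly where the reported geometry
ends, so the slice layout itself looks consistent.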

smartctl output:

Device Model:     ST3160023AS
Serial Number:    3JS325HD
Firmware Version: 3.18
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Mon Sep 13 23:07:54 2004 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Eric Gatenby - eric at - AIM: egatenby

Doubt of the reality of love ends by making us doubt everything.
 -- Henri-Frédéric Amiel

