Debugging an unknown reboot (disk / io related)

Steven Hartland killing at multiplay.co.uk
Wed Aug 31 17:03:50 GMT 2005


When running a large rsync on one of our machines here, the machine
consistently crashes and reboots, leaving no trace in the logs.
It looks like it could be a driver error, but with no crash log or
panic message to go on I don't know where to start.

The machine is running 5.4-RELEASE-p2 with the latest
driver set, downloaded and compiled locally.

The only errors I have to go on are those displayed in
the ssh session running the rsync:
35111 files to consider
rsync: readdir(games/fps/sof2/server): Input/output error (5)
rsync: readdir(games/fps/soldner): Input/output error (5)
...
...
rsync: mkstemp "/usr/home/ftp/pub/apps/3dmark/win32/.3DMark03.exe.NhcgGA" failed: Input/output error (5)
rsync: connection unexpectedly closed (1667283 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)
rsync: connection unexpectedly closed (1667263 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)
Segmentation fault
root at backup1>


I've tried running with witness enabled, but the machine fails to boot with a
message about hpt_lock. I also originally tried the default hptmv driver, with
no success.

When it crashes it takes the RAID5 array with it, always dropping the same
disk. I've replaced the cable and the disk, and even plugged the disk directly
into the RAID controller on a different channel to eliminate the Supermicro
hotswap bay the disks are mounted in. Nothing changes: the same disk
always gets dropped.

So the question is what can I try to get more info on what's happening?

[dmesg]
Aug 31 17:56:28 backup1 syslogd: kernel boot file is /boot/kernel/kernel
Aug 31 17:56:28 backup1 kernel: Copyright (c) 1992-2005 The FreeBSD Project.
Aug 31 17:56:28 backup1 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug 31 17:56:28 backup1 kernel: The Regents of the University of California. All rights reserved.
Aug 31 17:56:28 backup1 kernel: FreeBSD 5.4-RELEASE-p2 #6: Thu Jun 23 00:23:54 UTC 2005
Aug 31 17:56:28 backup1 kernel: root at backup1:/.usr/i386/src/sys/i386/compile/MPUK_SMP_200HZ
Aug 31 17:56:28 backup1 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Aug 31 17:56:28 backup1 kernel: CPU: AMD Opteron(tm) Processor 244 (1794.41-MHz 686-class CPU)
Aug 31 17:56:28 backup1 kernel: Origin = "AuthenticAMD"  Id = 0xf5a  Stepping = 10
Aug 31 17:56:28 backup1 kernel: Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
Aug 31 17:56:28 backup1 kernel: AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
Aug 31 17:56:28 backup1 kernel: AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
Aug 31 17:56:28 backup1 kernel: real memory  = 2146893824 (2047 MB)
Aug 31 17:56:28 backup1 kernel: avail memory = 2099625984 (2002 MB)
Aug 31 17:56:28 backup1 kernel: ACPI APIC Table: <PTLTD     APIC  >
Aug 31 17:56:28 backup1 kernel: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Aug 31 17:56:28 backup1 kernel: cpu0 (BSP): APIC ID:  0
Aug 31 17:56:28 backup1 kernel: cpu1 (AP): APIC ID:  1
Aug 31 17:56:28 backup1 kernel: MADT: Forcing active-low polarity and level trigger for SCI
Aug 31 17:56:28 backup1 kernel: ioapic0 <Version 1.1> irqs 0-23 on motherboard
Aug 31 17:56:28 backup1 kernel: ioapic1 <Version 1.1> irqs 24-27 on motherboard
Aug 31 17:56:28 backup1 kernel: ioapic2 <Version 1.1> irqs 28-31 on motherboard
Aug 31 17:56:28 backup1 kernel: npx0: <math processor> on motherboard
Aug 31 17:56:28 backup1 kernel: npx0: INT 16 interface
Aug 31 17:56:28 backup1 kernel: acpi0: <PTLTD   XSDT> on motherboard
Aug 31 17:56:28 backup1 kernel: acpi0: Power Button (fixed)
Aug 31 17:56:28 backup1 kernel: acpi0: Sleep Button (fixed)
Aug 31 17:56:28 backup1 kernel: acpi_bus_number: can't get _ADR
Aug 31 17:56:28 backup1 last message repeated 2 times
Aug 31 17:56:28 backup1 kernel: unknown: I/O range not supported
Aug 31 17:56:28 backup1 kernel: unknown: I/O range not supported
Aug 31 17:56:28 backup1 kernel: ACPI-1304: *** Error: Method execution failed [\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xc30937a0), AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: ACPI-0239: *** Error: Method execution failed [\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xc30937a0), AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: can't fetch resources for \_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Aug 31 17:56:28 backup1 kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
Aug 31 17:56:28 backup1 kernel: cpu0: <ACPI CPU> on acpi0
Aug 31 17:56:28 backup1 kernel: cpu1: <ACPI CPU> on acpi0
Aug 31 17:56:28 backup1 kernel: acpi_button0: <Power Button> on acpi0
Aug 31 17:56:28 backup1 kernel: pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
Aug 31 17:56:28 backup1 kernel: pci0: <ACPI PCI bus> on pcib0
Aug 31 17:56:28 backup1 kernel: pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
Aug 31 17:56:28 backup1 kernel: pci1: <ACPI PCI bus> on pcib1
Aug 31 17:56:28 backup1 kernel: pci1: <display, VGA> at device 0.0 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
Aug 31 17:56:28 backup1 kernel: pci2: <ACPI PCI bus> on pcib2
Aug 31 17:56:28 backup1 kernel: bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xe8100000-0xe810ffff irq 19 at device 5.0 on pci2
Aug 31 17:56:28 backup1 kernel: miibus0: <MII bus> on bge0
Aug 31 17:56:28 backup1 kernel: brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
Aug 31 17:56:28 backup1 kernel: brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Aug 31 17:56:28 backup1 kernel: bge0: Ethernet address: 00:0f:ea:7a:50:08
Aug 31 17:56:28 backup1 kernel: atapci0: <SiI 3114 SATA150 controller> port 0x3000-0x300f,0x3010-0x3013,0x3018-0x301f,0x3014-0x3017,0x3020-0x3027 mem 0xe8110000-0xe81103ff irq 18 at device 6.0 on pci2
Aug 31 17:56:28 backup1 kernel: ata2: channel #0 on atapci0
Aug 31 17:56:28 backup1 kernel: ata3: channel #1 on atapci0
Aug 31 17:56:28 backup1 kernel: ata4: channel #2 on atapci0
Aug 31 17:56:28 backup1 kernel: ata5: channel #3 on atapci0
Aug 31 17:56:28 backup1 kernel: isab0: <PCI-ISA bridge> at device 7.0 on pci0
Aug 31 17:56:28 backup1 kernel: isa0: <ISA bus> on isab0
Aug 31 17:56:28 backup1 kernel: atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
Aug 31 17:56:28 backup1 kernel: ata0: channel #0 on atapci1
Aug 31 17:56:28 backup1 kernel: ata1: channel #1 on atapci1
Aug 31 17:56:28 backup1 kernel: pci0: <bridge> at device 7.3 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib3: <ACPI Host-PCI bridge> on acpi0
Aug 31 17:56:28 backup1 kernel: pci8: <ACPI PCI bus> on pcib3
Aug 31 17:56:28 backup1 kernel: pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
Aug 31 17:56:28 backup1 kernel: pci9: <ACPI PCI bus> on pcib4
Aug 31 17:56:28 backup1 kernel: bge1: <Broadcom BCM5703X Gigabit Ethernet, ASIC rev. 0x1100> mem 0xf8100000-0xf810ffff irq 25 at device 1.0 on pci9
Aug 31 17:56:28 backup1 kernel: bge1: Ethernet address: 00:10:18:0d:cc:da
Aug 31 17:56:28 backup1 kernel: pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
Aug 31 17:56:28 backup1 kernel: pci14: <ACPI PCI bus> on pcib5
Aug 31 17:56:28 backup1 kernel: hptmv0: <RocketRAID 182x SATA Controller> mem 0xf8200000-0xf827ffff irq 30 at device 2.0 on pci14
Aug 31 17:56:28 backup1 kernel: RocketRAID 182x SATA Controller driver Version 1.1
Aug 31 17:56:28 backup1 kernel: RR182x [0,0]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,1]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,2]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,4]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,5]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x: RAID5 write-back enabled
Aug 31 17:56:28 backup1 kernel: pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
Aug 31 17:56:28 backup1 kernel: atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
Aug 31 17:56:28 backup1 kernel: atkbd0: <AT Keyboard> irq 1 on atkbdc0
Aug 31 17:56:28 backup1 kernel: fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
Aug 31 17:56:28 backup1 kernel: sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Aug 31 17:56:28 backup1 kernel: sio0: type 16550A
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: orm0: <ISA Option ROMs> at iomem 0xcd000-0xd2fff,0xcb000-0xccfff,0xc0000-0xcafff on isa0
Aug 31 17:56:28 backup1 kernel: sc0: <System console> at flags 0x100 on isa0
Aug 31 17:56:28 backup1 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Aug 31 17:56:28 backup1 kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: Timecounters tick every 5.000 msec
Aug 31 17:56:28 backup1 kernel: da0 at hptmv0 bus 0 target 0 lun 0
Aug 31 17:56:28 backup1 kernel: da0: <RR182x RAID 5 Array 3.00> Fixed Direct Access SCSI-0 device
Aug 31 17:56:28 backup1 kernel: da0: 1526216MB (3125691008 512 byte sectors: 255H 63S/T 194565C)
Aug 31 17:56:28 backup1 kernel: da1 at hptmv0 bus 0 target 1 lun 0
Aug 31 17:56:28 backup1 kernel: da1: <ST340083 2AS 3.03> Fixed Direct Access SCSI-0 device
Aug 31 17:56:28 backup1 kernel: da1: 381554MB (781422757 512 byte sectors: 255H 63S/T 48641C)
Aug 31 17:56:28 backup1 kernel: SMP: AP CPU #1 Launched!
Aug 31 17:56:28 backup1 kernel: Mounting root from ufs:/dev/da0s1d
Aug 31 17:56:28 backup1 kernel: WARNING: / was not properly dismounted
Aug 31 17:56:28 backup1 kernel: WARNING: R/W mount of / denied.  Filesystem is not clean - run fsck
[/dmesg]
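
As a possible starting point for reading the dmesg above, the RR182x lines
can be checked mechanically for channels that did not report starting. The
following is a minimal Python sketch and not part of the original report: the
function names started_channels and channel_gaps are hypothetical, the sample
text is copied from the log above, and a gap in the channel numbering does not
by itself identify the dropped disk (the channel may simply be unpopulated).

```python
import re

# Matches lines such as "RR182x [0,4]: channel started successfully".
CHANNEL_RE = re.compile(r"RR182x \[(\d+),(\d+)\]: channel started successfully")


def started_channels(dmesg_text):
    """Return sorted (controller, channel) pairs that reported a successful start."""
    return sorted({(int(m.group(1)), int(m.group(2)))
                   for m in CHANNEL_RE.finditer(dmesg_text)})


def channel_gaps(channels):
    """Per controller, list channel numbers missing between the lowest and
    highest channel that started. Assumption: channels are numbered
    contiguously, which the log itself does not confirm."""
    by_controller = {}
    for controller, channel in channels:
        by_controller.setdefault(controller, set()).add(channel)
    return {controller: sorted(set(range(min(seen), max(seen) + 1)) - seen)
            for controller, seen in by_controller.items()}


# Sample lines copied from the dmesg in this message.
SAMPLE = """\
Aug 31 17:56:28 backup1 kernel: RR182x [0,0]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,1]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,2]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,4]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,5]: channel started successfully
"""

if __name__ == "__main__":
    channels = started_channels(SAMPLE)
    print("started:", channels)
    print("gaps:", channel_gaps(channels))
```

On the log above this reports channels 0, 1, 2, 4 and 5 as started on
controller 0, with channel 3 as the only gap.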






More information about the freebsd-hackers mailing list