HP DL-360 Kernel Crashes on 6.2-R

Pat Wendorf dungeons at gmail.com
Mon May 7 16:59:49 UTC 2007


Hey all, I'm running FreeBSD 6.2-RELEASE-p4 and I'm having some terrible
stability problems on HP DL360 G4 and G5 hardware.

We use these systems mostly as high-volume postfix mail servers, and under
heavy postfix queue load the server will crash (sometimes within minutes,
sometimes 12-14 hours after being put under load).  The crashes also seem to
leave some degree of file system corruption when the box comes back up.

The first challenge I have is, I don't have local access to the box.  These
boxes are hosted with a managed hosting provider who does not understand
FreeBSD at all (yikes).  The second problem is that this type of crash very
rarely produces a crash dump under /var/crash.  I'll provide the one
crash dump I've managed to get below.

To make things worse, this crash appears on 2 types of hardware (G4 and G5
boxes) and 3 types of RAID controllers.  We're currently deprecating the G4
hardware, so I'll just send the dmesg and crash dump from the G5, but I can
assure you the symptoms are exactly the same on both types of boxes.

I'd also mention that we've been using the exact same G4 hardware with
6.1-RELEASE-p11 in production for almost a year now with zero crashes.  The
crashes in the new setup occur with both SMP and non-SMP kernels.

DMESG
--------------

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p4 #0: Mon May  7 09:33:50 EDT 2007
    root at localhost:/usr/src/sys/amd64/compile/SMP
ACPI APIC Table: <HP     00000083>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU            5150  @ 2.66GHz (2666.78-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,<b9>,CX16,<b14>,<b15>,<b18>>
  AMD Features=0x20000800<SYSCALL,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
real memory  = 2145746944 (2046 MB)
avail memory = 2060341248 (1964 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <HP P58> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci9: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci9
pci10: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci10
pci11: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port
0x5000-0x501f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 16 at
device 0.0 on pci11
em0: Ethernet address: 00:17:08:7e:b6:ac
em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port
0x5020-0x503f mem 0xfdfa0000-0xfdfbffff,0xfdf80000-0xfdf9ffff irq 17 at
device 0.1 on pci11
em1: Ethernet address: 00:17:08:7e:b6:ad
pcib4: <PCI-PCI bridge> at device 1.0 on pci10
pci14: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 2.0 on pci10
pci15: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.3 on pci9
pci16: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci6: <ACPI PCI bus> on pcib7
ciss0: <HP Smart Array P400i> port 0x4000-0x40ff mem
0xfdd00000-0xfddfffff,0xfdcf0000-0xfdcf0fff irq 16 at device 0.0 on pci6
ciss0: [GIANT-LOCKED]
pcib8: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci19: <ACPI PCI bus> on pcib8
pcib9: <PCI-PCI bridge> at device 5.0 on pci0
pci22: <PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib11
bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B2), v0.9.6> mem
0xf8000000-0xf9ffffff irq 18 at device 0.0 on pci3
bce0: ASIC ID 0x57081020; Revision (B2); PCI-X 64-bit 133MHz
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
1000baseTX-FDX, auto
bce0: Ethernet address: 00:1a:4b:df:bb:0e
pcib12: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib13
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2), v0.9.6> mem
0xfa000000-0xfbffffff irq 19 at device 0.0 on pci5
bce1: ASIC ID 0x57081020; Revision (B2); PCI-X 64-bit 133MHz
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
1000baseTX-FDX, auto
bce1: Ethernet address: 00:1a:4b:df:bb:06
uhci0: <UHCI (generic) USB controller> port 0x1000-0x101f irq 16 at device
29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x1020-0x103f irq 17 at device
29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x1040-0x105f irq 18 at device
29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x1060-0x107f irq 19 at device
29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xf7df0000-0xf7df03ff irq 16
at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: waiting for BIOS to give up control
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <EHCI (generic) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib14: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib14
pci1: <display, VGA> at device 3.0 (no driver attached)
pci1: <base peripheral> at device 4.0 (no driver attached)
pci1: <base peripheral> at device 4.2 (no driver attached)
uhci4: <UHCI (generic) USB controller> port 0x3800-0x381f irq 22 at device
4.4 on pci1
uhci4: [GIANT-LOCKED]
usb5: <UHCI (generic) USB controller> on uhci4
usb5: USB revision 1.0
uhub5: (0x103c) UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub5: 2 ports with 2 removable, self powered
pci1: <serial bus> at device 4.6 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f irq 17 at device 31.1 on
pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xe6000-0xe7fff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: HP Virtual Keyboard, rev 1.10/0.01, addr 2, iclass 3/1
kbd2 at ukbd0
ums0: HP Virtual Keyboard, rev 1.10/0.01, addr 2, iclass 3/1
ums0: 3 buttons.
uhub6: HP Virtual Hub, class 9/0, rev 1.10/0.01, addr 3
uhub6: 7 ports with 7 removable, self powered
Timecounters tick every 1.000 msec
acd0: CDRW <DW-224E-R/C.AC> at ata0-master UDMA33
SMP: AP CPU #1 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da1 at ciss0 bus 0 target 1 lun 0
da1: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device
da1: 135.168MB/s transfers
da1: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /data was not properly dismounted
/data: mount pending error: blocks 52932 files 4411
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
netsmb_dev: loaded
ukbd1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
kbd3 at ukbd1
ums1: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
ums1: 5 buttons and Z dir.
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: link state changed to DOWN
bce0: link state changed to UP
ukbd1: at uhub1 port 1 (addr 2) disconnected
ukbd1: detached
ums1: at uhub1 port 1 (addr 2) disconnected
ums1: detached


CRASH DUMP
-----------------------

This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xd4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff8041dae4
stack pointer           = 0x10:0xffffffffb1c73b10
frame pointer           = 0x10:0x4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 9 (thread taskq)
trap number             = 12
panic: page fault
Uptime: 1d14h56m53s
Dumping 2045 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 2046MB (523608 pages) 2030 2014 1998 1982 1966 1950 1934 1918
1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678
1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438
1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198
1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942
926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638
622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334
318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0  doadump () at pcpu.h:172
172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));

(kgdb) list *0xffffffff8041dae4
0xffffffff8041dae4 is in turnstile_setowner (/usr/src/sys/kern/subr_turnstile.c:433).
428
429             mtx_assert(&td_contested_lock, MA_OWNED);
430             MPASS(owner->td_proc->p_magic == P_MAGIC);
431             MPASS(ts->ts_owner == NULL);
432             ts->ts_owner = owner;
433             LIST_INSERT_HEAD(&owner->td_contested, ts, ts_link);
434     }
435
436     /*
437      * Malloc a turnstile for a new thread, initialize it and return it.

(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x0000000000000004 in ?? ()
#2  0xffffffff803f60d3 in boot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff803f66d6 in panic (fmt=0xffffff007b883980 "\b*\213{") at
/usr/src/sys/kern/kern_shutdown.c:565
#4  0xffffffff806106f2 in trap_fatal (frame=0xffffff007b883980,
eva=18446742976270641672) at /usr/src/sys/amd64/amd64/trap.c:660
#5  0xffffffff80610c16 in trap (frame=
      {tf_rdi = -1097864339072, tf_rsi = 4, tf_rdx = -1097439102592, tf_rcx
= 3221225730, tf_r8 = -1097439102528, tf_r9 = -1097864339072, tf_rax = 2,
tf_rbx = -1097439102592, tf_rbp = 4, tf_r10 = -1097864339072, tf_r11 =
-1097439102592, tf_r12 = -1097439102592, tf_r13 = -1097864339072, tf_r14 =
-2138051040, tf_r15 = -1098519824376, tf_trapno = 12, tf_addr = 212,
tf_flags = -4295930868989109831, tf_err = 0, tf_rip = -2143167772, tf_cs =
8, tf_rflags = 65543, tf_rsp = -1312343256, tf_ss = 16})
    at /usr/src/sys/amd64/amd64/trap.c:238
#6  0xffffffff805fe2fb in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:168
#7  0xffffffff8041dae4 in turnstile_setowner (ts=0xffffff00622fa180,
owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:432
#8  0xffffffff8041e0eb in turnstile_wait (lock=0xffffff003b1db808,
owner=0x4) at /usr/src/sys/kern/subr_turnstile.c:591
#9  0xffffffff803ec139 in _mtx_lock_sleep (m=0xffffff003b1db808,
tid=18446742976270449024, opts=2072525184, file=0xc0000102 <Address
0xc0000102 out of bounds>, line=2072525248)
    at /usr/src/sys/kern/kern_mutex.c:579
#10 0xffffffff80449193 in unp_gc (arg=0xffffff00622fa180, pending=4) at
/usr/src/sys/kern/uipc_usrreq.c:1714
#11 0xffffffff8041bfdd in taskqueue_run (queue=0xffffff0000bf8500) at
/usr/src/sys/kern/subr_taskqueue.c:257
#12 0xffffffff8041cbc5 in taskqueue_thread_loop (arg=0xffffff00622fa180) at
/usr/src/sys/kern/subr_taskqueue.c:376
#13 0xffffffff803dbf03 in fork_exit (callout=0xffffffff8041cb40
<taskqueue_thread_loop>, arg=0xffffffff808fdbb0, frame=0xffffffffb1c73c50)
at /usr/src/sys/kern/kern_fork.c:821
#14 0xffffffff805fe65e in fork_trampoline () at
/usr/src/sys/amd64/amd64/exception.S:394
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()
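
A note for anyone reading the trap frame in frame #5: kgdb prints the saved
registers as signed decimals, which makes them hard to compare with the hex
addresses in the panic message.  The following is a minimal Python sketch (not
part of the kgdb session above; the helper name to_hex64 is made up for
illustration) that masks a value to 64 bits so it can be read as an address.
Applied to tf_rip and tf_addr from the trace, it gives the instruction pointer
and fault virtual address reported in the panic output.

```python
def to_hex64(value: int) -> str:
    """Render a signed 64-bit integer as the unsigned hex value it encodes."""
    # Masking with 2**64 - 1 reinterprets a negative two's-complement value
    # as its unsigned 64-bit equivalent; non-negative values are unchanged.
    return hex(value & 0xFFFFFFFFFFFFFFFF)


# Values copied from the trap frame in frame #5 of the backtrace.
tf_rip = -2143167772
tf_addr = 212

print(to_hex64(tf_rip))   # matches "instruction pointer = 0x8:0xffffffff8041dae4"
print(to_hex64(tf_addr))  # matches "fault virtual address = 0xd4"
```

The same conversion works for the other tf_* fields if they need to be
compared against addresses elsewhere in the backtrace.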

