Repeatable crash with 5.4-p1-RELEASE and SMP

Palle Girgensohn girgen at pingpong.net
Sat Jun 4 01:48:08 GMT 2005


Hi!

This is very similar to Brendan White's problem just reported here. My guess 
is that it is the very same problem. I've reported it on several 
occasions before (although I use amd64, so my postings went to 
amd64 at freebsd.org).

My system is also a Dell 2850 with dual CPUs and 3GB RAM, running amd64 
FreeBSD 5.4-p1. It is quite stable (but slow) when running without SMP. With 
SMP enabled, it crashes within a few hours. The load is high, around 4. See 
my postings on amd64@ for many more details.

Anyway, I have managed to get an automatic reboot and a core dump. Giant 
leap for mankind :-) . It looks partly overwritten, though. 
According to the Developer's Handbook, the core should be saved *before* 
the swap partition is added to the system. I can easily verify that this 
is not the case: the swap is "mounted" first. I once again raise the 
question of whether PR conf/73834 shouldn't be addressed. Or perhaps my core 
dump is quite normal? It doesn't look like it. In rc.conf, I have:

# kernel crash dumps
dumpdev="/dev/amrd0s2b"
dumpdir="/misc/crash"
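
One way to check the ordering claim above is to look at where the dump-saving script falls relative to the swap script in the output of `rcorder /etc/rc.d/*`. The following is a minimal sketch in Python, not something from the original report. The script names "savecore" and "swap1" are assumptions about the rc.d layout and may differ on a given system, and the sample listing is made up for illustration.

```python
# Sketch: does crash-dump recovery run before swap activation?
# ASSUMPTION: the rc.d scripts are called "savecore" and "swap1";
# adjust the names to whatever `rcorder /etc/rc.d/*` prints locally.

def first_index(order, name):
    """Position of the first path whose basename equals `name`, else None."""
    for i, path in enumerate(order):
        if path.rsplit("/", 1)[-1] == name:
            return i
    return None


def savecore_runs_before_swap(rcorder_output, savecore="savecore", swap="swap1"):
    """True if savecore is ordered before swap, False if after,
    None if either script is missing from the listing."""
    order = [line.strip() for line in rcorder_output.splitlines() if line.strip()]
    s = first_index(order, savecore)
    w = first_index(order, swap)
    if s is None or w is None:
        return None
    return s < w


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    sample = "/etc/rc.d/swap1\n/etc/rc.d/fsck\n/etc/rc.d/root\n/etc/rc.d/savecore\n"
    print(savecore_runs_before_swap(sample))
```

On a live system the real listing could be fed in via `subprocess.run` instead of the sample string. A result of False would match the observation that swap is enabled first.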


Here's the dump. If there is anything else I should extract, please just ask.

# kgdb kernel.debug /misc/crash/vmcore.11
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
#0  doadump () at pcpu.h:167
167             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:167
#1  0x0000000000000000 in ?? ()
#2  0xffffffff80341267 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#3  0xffffffff80341ac6 in panic (fmt=0xffffff007b76d000 " «x{") at /usr/src/sys/kern/kern_shutdown.c:566
#4  0xffffffff804f0f52 in trap_fatal (frame=0xc, eva=18446742976269307904)
    at /usr/src/sys/amd64/amd64/trap.c:639
#5  0xffffffff804f11ef in trap_pfault (frame=0xffffffffb1d229b0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:562
#6  0xffffffff804f1457 in trap (frame=
      {tf_rdi = -1097427517200, tf_rsi = -1097440243712, tf_rdx = 1056,
       tf_rcx = 0, tf_r8 = 0, tf_r9 = 0, tf_rax = 1056, tf_rbx = 0,
       tf_rbp = -1098069766144, tf_r10 = 4503599627366400, tf_r11 = 3392,
       tf_r12 = 4, tf_r13 = 4, tf_r14 = -1099313881192,
       tf_r15 = -1097364452848, tf_trapno = 12, tf_addr = 136,
       tf_flags = -1099313881192, tf_err = 0, tf_rip = -2144020582,
       tf_cs = 8, tf_rflags = 66050, tf_rsp = -1311626640, tf_ss = 0})
    at /usr/src/sys/amd64/amd64/trap.c:341
#7  0xffffffff804deb0b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:171
#8  0xffffff007c3900f0 in ?? ()
#9  0xffffff007b76d000 in ?? ()
#10 0x0000000000000420 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000420 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0xffffff0055f11000 in ?? ()
#17 0x000ffffffffff000 in ?? ()
#18 0x0000000000000d40 in ?? ()
#19 0x0000000000000004 in ?? ()
#20 0x0000000000000004 in ?? ()
#21 0xffffff000bc95f98 in ?? ()
#22 0xffffff007ffb4a10 in ?? ()
#23 0x000000000000000c in ?? ()
#24 0x0000000000000088 in ?? ()
#25 0xffffff000bc95f98 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0xffffffff8034d79a in thread_fini (mem=0x0, size=0) at /usr/src/sys/kern/kern_thread.c:271
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000001 in ?? ()
#30 0xffffff007ffb4a00 in ?? ()
#31 0xffffff0055f11f98 in ?? ()
#32 0xffffffff804d46ff in zone_drain (zone=0x8) at /usr/src/sys/vm/uma_core.c:749
#33 0xffffffff804d22b6 in zone_foreach (zfunc=0xffffffff804d4530 <zone_drain>)
    at /usr/src/sys/vm/uma_core.c:1494
#34 0xffffffff804d5ec9 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:2623
#35 0xffffffff804cfcac in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:674
#36 0xffffffff8032805c in fork_exit (callout=0xffffffff804cf6b0 <vm_pageout>, arg=0x0,
    frame=0xffffffffb1d22c50) at /usr/src/sys/kern/kern_fork.c:791
#37 0xffffffff804ded0e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:296
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000001 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000000 in ?? ()
#55 0x0000000000000000 in ?? ()
#56 0x0000000000000000 in ?? ()
#57 0x0000000000000000 in ?? ()
#58 0x0000000000000000 in ?? ()
#59 0x0000000000000000 in ?? ()
#60 0x0000000000000000 in ?? ()
#61 0x0000000000000000 in ?? ()
#62 0x0000000000000000 in ?? ()
#63 0x0000000000000000 in ?? ()
#64 0x0000000000000000 in ?? ()
#65 0x0000000000000000 in ?? ()
#66 0x0000000000000000 in ?? ()
#67 0x0000000000000000 in ?? ()
#68 0x0000000000000000 in ?? ()
#69 0x0000000000000000 in ?? ()
#70 0x000000000095d000 in ?? ()
#71 0xffffffffb1d229b0 in ?? ()
#72 0x0000000000000104 in ?? ()
#73 0x0000000000000000 in ?? ()
#74 0xffffff007b78aba0 in ?? ()
#75 0xffffff007b7af280 in ?? ()
#76 0xffffffffb1d226e8 in ?? ()
#77 0xffffff007b76d000 in ?? ()
#78 0xffffffff80355d5c in sched_switch (td=0x0, newtd=0x0, flags=1) at /usr/src/sys/kern/sched_4bsd.c:881
#79 0x0000000000000000 in ?? ()
#80 0x0000000000000000 in ?? ()
#81 0x0000000000000000 in ?? ()
#82 0x0000000000000000 in ?? ()
#83 0x0000000000000000 in ?? ()
#84 0x0000000000000000 in ?? ()
#85 0x0000000000000000 in ?? ()
#86 0x0000000000000000 in ?? ()
#87 0x0000000000000000 in ?? ()
#88 0x0000000000000000 in ?? ()
#89 0x0000000000000000 in ?? ()
#90 0x0000000000000000 in ?? ()
#91 0x0000000000000000 in ?? ()
#92 0x0000000000000000 in ?? ()
#93 0x0000000000000000 in ?? ()
#94 0x0000000000000000 in ?? ()
#95 0x0000000000000000 in ?? ()
#96 0x0000000000000000 in ?? ()
#97 0x0000000000000000 in ?? ()
#98 0x0000000000000000 in ?? ()
#99 0x0000000000000000 in ?? ()
#100 0x0000000000000000 in ?? ()
#101 0x0000000000000000 in ?? ()
#102 0x0000000000000000 in ?? ()
#103 0x0000000000000000 in ?? ()
#104 0x0000000000000000 in ?? ()
#105 0x0000000000000000 in ?? ()
#106 0x0000000000000000 in ?? ()
#107 0x0000000000000000 in ?? ()
#108 0x0000000000000000 in ?? ()
#109 0x0000000000000000 in ?? ()
#110 0x0000000000000000 in ?? ()
#111 0x0000000000000000 in ?? ()
#112 0x0000000000000000 in ?? ()
#113 0x0000000000000000 in ?? ()
#114 0x0000000000000000 in ?? ()
#115 0x0000000000000000 in ?? ()
#116 0x0000000000000000 in ?? ()
#117 0x0000000000000000 in ?? ()
#118 0x0000000000000000 in ?? ()
#119 0x0000000000000000 in ?? ()
#120 0x0000000000000000 in ?? ()
#121 0x0000000000000000 in ?? ()
#122 0x0000000000000000 in ?? ()
#123 0x0000000000000000 in ?? ()
#124 0x0000000000000000 in ?? ()
#125 0x0000000000000000 in ?? ()
#126 0x0000000000000000 in ?? ()
#127 0x0000000000000000 in ?? ()
#128 0x0000000000000000 in ?? ()
#129 0x0000000000000000 in ?? ()
#130 0x0000000000000000 in ?? ()
#131 0x0000000000000000 in ?? ()
#132 0x0000000000000000 in ?? ()
#133 0x0000000000000000 in ?? ()
#134 0x0000000000000000 in ?? ()
#135 0x0000000000000000 in ?? ()
#136 0x0000000000000000 in ?? ()
#137 0x0000000000000000 in ?? ()
#138 0x0000000000000000 in ?? ()
#139 0x0000000000000000 in ?? ()
#140 0x0000000000000000 in ?? ()
#141 0x0000000000000000 in ?? ()
#142 0x0000000000000000 in ?? ()
#143 0x0000000000000000 in ?? ()
#144 0x0000000000000000 in ?? ()
#145 0x0000000000000000 in ?? ()
#146 0x0000000000000000 in ?? ()
#147 0x0000000000000000 in ?? ()
#148 0x0000000000000000 in ?? ()
#149 0x0000000000000000 in ?? ()
#150 0x0000000000000000 in ?? ()
Cannot access memory at address 0xffffffffb1d23000
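
A note on reading the trap frame in frame #6: kgdb prints the saved registers as signed decimals, which hides the addresses. The short Python sketch below is not part of the original report. It converts a few of the posted values to unsigned 64-bit hex. It suggests that tf_rip corresponds to the address kgdb lists for thread_fini in frame #27, and that the faulting address tf_addr is 0x88, very close to NULL. tf_trapno = 12 is the page-fault trap number on amd64, which agrees with trap_pfault in frame #5. This is only a reading of the numbers above, not a diagnosis.

```python
# Sketch: convert the signed decimals kgdb printed into 64-bit hex addresses.

MASK64 = (1 << 64) - 1


def as_hex64(value):
    """Render a signed 64-bit integer as zero-padded unsigned hex."""
    return "0x%016x" % (value & MASK64)


# Values copied verbatim from the trap frame in frame #6 of the backtrace.
trap_frame = {
    "tf_trapno": 12,
    "tf_addr": 136,
    "tf_rip": -2144020582,
    "tf_rsp": -1311626640,
}

if __name__ == "__main__":
    for name, value in trap_frame.items():
        print("%-10s %s" % (name, as_hex64(value)))
    # tf_rip  -> 0xffffffff8034d79a
    # tf_addr -> 0x0000000000000088
```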


$ dmesg
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p1 #9: Fri Jun  3 22:26:49 CEST 2005
    girgen at melon.pingpong.net:/usr/obj/usr/src/sys/MELON
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.01-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2>,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
real memory  = 2147221504 (2047 MB)
avail memory = 2061885440 (1966 MB)
ACPI APIC Table: <DELL   PE BKC  >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  6
ioapic0: Changing APIC ID to 7
ioapic1: Changing APIC ID to 8
ioapic1: WARNING: intbase 32 != expected base 24
ioapic2: Changing APIC ID to 9
ioapic2: WARNING: intbase 64 != expected base 56
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 32-55 on motherboard
ioapic2 <Version 2.0> irqs 64-87 on motherboard
acpi0: <DELL PE BKC> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
amr0: <LSILogic MegaRAID 1.51> mem 0xdfdc0000-0xdfdfffff,0xd80f0000-0xd80fffff irq 46 at device 14.0 on pci2
amr0: <LSILogic PERC 4e/Di> Firmware 516A, BIOS H418, 256MB RAM
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xecc0-0xecff mem 0xdfae0000-0xdfafffff irq 64 at device 7.0 on pci6
em0: Ethernet address: 00:11:43:37:a4:9e
em0:  Speed:N/A  Duplex:N/A
pcib7: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci7: <ACPI PCI bus> on pcib7
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xdcc0-0xdcff mem 0xdf8e0000-0xdf8fffff irq 65 at device 8.0 on pci7
em1: Ethernet address: 00:11:43:37:a4:9f
em1:  Speed:N/A  Duplex:N/A
pcib8: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci8: <ACPI PCI bus> on pcib8
pci0: <serial bus, USB> at device 29.0 (no driver attached)
pci0: <serial bus, USB> at device 29.1 (no driver attached)
pci0: <serial bus, USB> at device 29.2 (no driver attached)
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib9: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci9: <ACPI PCI bus> on pcib9
pci9: <display, VGA> at device 13.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port 0xfc00-0xfc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: <ISA Option ROMs> at iomem 0xec000-0xeffff,0xce800-0xcf7ff,0xcb000-0xcbfff,0xc0000-0xcafff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <TEAC CD-ROM CD-224E/K.9A> at ata0-master PIO4
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 139760MB (286228480 sectors) RAID 5 (optimal)
ses0 at amr0 bus 0 target 6 lun 0
ses0: <PE/PV 1x6 SCSI BP 1.0> Fixed Processor SCSI-2 device
ses0: SAF-TE Compliant Device
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/amrd0s2a
WARNING: / was not properly dismounted
WARNING: /misc was not properly dismounted
/misc: mount pending error: blocks 7368 files 5
WARNING: /usr was not properly dismounted
WARNING: /usr/local was not properly dismounted
/usr/local: mount pending error: blocks 204 files 1
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 1344 files 86
WARNING: /var/spool/imap was not properly dismounted
em1: Link is up 100 Mbps Half Duplex
em0: Link is up 1000 Mbps Full Duplex



There is nothing at all in /etc/make.conf.

The kernel is GENERIC with SMP added. I removed USB since I got an interrupt 
storm and don't need it, and I also removed FireWire. Diff against GENERIC:

$ diff -u GENERIC MELON
--- GENERIC     Tue Apr 12 15:57:01 2005
+++ MELON       Fri Jun  3 20:13:03 2005
@@ -20,7 +20,9 @@

 machine                amd64
 cpu            HAMMER
-ident          GENERIC
+ident          MELON
+
+makeoptions     DEBUG=-g

 # To statically compile in device wiring instead of /boot/device.hints
 #hints         "GENERIC.hints"         # Default places to look for devices.
@@ -64,10 +66,10 @@

 # Enabling NO_MIXED_MODE gives a performance improvement on some motherboards
 # but does not work with some boards (mostly nVidia chipset based).
-#options       NO_MIXED_MODE   # Don't penalize working chipsets
+options        NO_MIXED_MODE   # Don't penalize working chipsets

 # Linux 32-bit ABI support
-options        LINPROCFS               # Cannot be a module yet.
+#options       LINPROCFS               # Cannot be a module yet.

 # Bus support.  Do not remove isa, even if you have no isa slots
 device         acpi
@@ -234,29 +236,23 @@
 # Note that 'bpf' is required for DHCP.
 device         bpf             # Berkeley packet filter

-# USB support
-device         uhci            # UHCI PCI->USB interface
-device         ohci            # OHCI PCI->USB interface
-#device                ehci            # EHCI PCI->USB interface (USB 2.0)
-device         usb             # USB Bus (required)
-#device                udbp            # USB Double Bulk Pipe devices
-device         ugen            # Generic
-device         uhid            # "Human Interface Devices"
-device         ukbd            # Keyboard
-device         ulpt            # Printer
-device         umass           # Disks/Mass storage - Requires scbus and da
-device         ums             # Mouse
-device         urio            # Diamond Rio 500 MP3 player
-device         uscanner        # Scanners
-# USB Ethernet, requires mii
-device         aue             # ADMtek USB Ethernet
-device         axe             # ASIX Electronics USB Ethernet
-device         cdce            # Generic USB over Ethernet
-device         cue             # CATC USB Ethernet
-device         kue             # Kawasaki LSI USB Ethernet
-device         rue             # RealTek RTL8150 USB Ethernet
-
-# FireWire support
-device         firewire        # FireWire bus code
-device         sbp             # SCSI over FireWire (Requires scbus and da)
-device         fwe             # Ethernet over FireWire (non-standard!)
+# SMP
+options                SMP
+
+# SysV stuff
+# This provides support for System V shared memory.
+#
+options                SYSVSHM
+options                SYSVSEM
+options                SYSVMSG
+options                SHMMAXPGS=65536
+options                SEMMNI=40
+options                SEMMNS=240
+options                SEMUME=40
+options                SEMMNU=120
+
+# Debug stuff, temporary
+options                KDB
+options                KDB_TRACE
+options                KDB_UNATTENDED
