kern/140338: FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1 kernel panic with makeworld

Kai Gallasch gallasch at free.de
Fri Nov 6 17:30:02 UTC 2009


>Number:         140338
>Category:       kern
>Synopsis:       FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1 kernel panic with makeworld
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Nov 06 17:30:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Kai Gallasch
>Release:        8.0 RC2 amd64
>Organization:
>Environment:
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-RC2 #0: Tue Nov  3 20:24:06 CET 2009
    root at sonnenkraft.free.de:/usr/obj/usr/src/sys/GENERIC amd64
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Quad-Core AMD Opteron(tm) Processor 2352 (2100.09-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  TSC: P-state invariant
real memory  = 21474836480 (20480 MB)
avail memory = 20701110272 (19742 MB)
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
kbd1 at kbdmux0
acpi0: <HP ProLiant> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x920-0x923 on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x1000-0x10ff mem 0xe8000000-0xefffffff,0xf7ff0000-0xf7ffffff irq 44 at device 3.0 on pci0
pci0: <base peripheral> at device 4.0 (no driver attached)
pci0: <base peripheral> at device 4.2 (no driver attached)
uhci0: <UHCI (generic) USB controller> port 0x1800-0x181f irq 45 at device 4.4 on pci0
uhci0: [ITHREAD]
usbus0: <UHCI (generic) USB controller> on uhci0
pci0: <serial bus> at device 4.6 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 13.0 on pci1
pci2: <ACPI PCI bus> on pcib2
atapci0: <ServerWorks HT1000 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f at device 6.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 6.2 on pci0
isa0: <ISA bus> on isab0
ohci0: <OHCI (generic) USB controller> port 0x1c00-0x1cff mem 0xf7ee0000-0xf7ee0fff irq 5 at device 7.0 on pci0
ohci0: [ITHREAD]
usbus1: <OHCI (generic) USB controller> on ohci0
ohci1: <OHCI (generic) USB controller> port 0x3000-0x30ff mem 0xf7ed0000-0xf7ed0fff irq 5 at device 7.1 on pci0
ohci1: [ITHREAD]
usbus2: <OHCI (generic) USB controller> on ohci1
ehci0: <EHCI (generic) USB 2.0 controller> port 0x3400-0x34ff mem 0xf7ec0000-0xf7ec0fff irq 5 at device 7.2 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: <EHCI (generic) USB 2.0 controller> on ehci0
pcib3: <ACPI PCI-PCI bridge> irq 42 at device 15.0 on pci0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 38 at device 16.0 on pci0
pci8: <ACPI PCI bus> on pcib4
pcib5: <PCI-PCI bridge> irq 39 at device 17.0 on pci0
pci14: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 40 at device 18.0 on pci0
pci11: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 41 at device 19.0 on pci0
pci3: <ACPI PCI bus> on pcib7
pcib8: <PCI-PCI bridge> at device 0.0 on pci3
pci4: <PCI bus> on pcib8
bce0: <HP NC373i Multifunction Gigabit Server Adapter (B2)> mem 0xf8000000-0xf9ffffff irq 41 at device 0.0 on pci4
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce0: Ethernet address: 00:1b:78:38:dd:02
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C (1.9.6); Flags (MSI|MFW); MFW ()
pcib9: <ACPI Host-PCI bridge> on acpi0
pci64: <ACPI PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> irq 36 at device 15.0 on pci64
pci67: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> irq 32 at device 16.0 on pci64
pci70: <ACPI PCI bus> on pcib11
ciss0: <HP Smart Array P400> port 0x4000-0x40ff mem 0xfdf00000-0xfdffffff,0xfdef0000-0xfdef0fff irq 32 at device 0.0 on pci70
ciss0: PERFORMANT Transport
ciss0: [ITHREAD]
pcib12: <PCI-PCI bridge> irq 33 at device 17.0 on pci64
pci73: <PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> irq 34 at device 18.0 on pci64
pci65: <ACPI PCI bus> on pcib13
pcib14: <PCI-PCI bridge> at device 0.0 on pci65
pci66: <PCI bus> on pcib14
bce1: <HP NC373i Multifunction Gigabit Server Adapter (B2)> mem 0xfa000000-0xfbffffff irq 34 at device 0.0 on pci66
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce1: Ethernet address: 00:1b:78:38:dd:00
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C (1.9.6); Flags (MSI|MFW); MFW ()
pcib15: <PCI-PCI bridge> irq 35 at device 19.0 on pci64
pci74: <PCI bus> on pcib15
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (9600,n,8,1)
cpu0: <ACPI CPU> on acpi0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcefff,0xe5000-0xe6fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
uart1: <Non-standard ns8250 class UART with FIFOs> at port 0x2f8-0x2ff irq 3 on isa0
uart1: [FILTER]
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
acd0: CDRW <TSSTcorpCDW/DVD TS-L462D/HG01> at ata0-master UDMA33
ugen0.1: <(0x103c)> at usbus0
uhub0: <(0x103c) UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <(0x1166)> at usbus1
uhub1: <(0x1166) OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <(0x1166)> at usbus2
uhub2: <(0x1166) OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <(0x1166)> at usbus3
uhub3: <(0x1166) EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
ugen0.2: <HP> at usbus0
ukbd0: <Virtual Keyboard> on usbus0
kbd2 at ukbd0
ums0: <Virtual Mouse> on usbus0
ums0: 3 buttons and [XY] coordinates ID=0
uhub3: 4 ports with 4 removable, self powered
ugen0.3: <HP> at usbus0
uhub4: <Virtual Hub> on usbus0
ugen3.2: <vendor 0x04b4> at usbus3
uhub5: <vendor 0x04b4 product 0x6560, class 9/0, rev 2.00/0.0b, addr 2> on usbus3
uhub5: 4 ports with 4 removable, self powered
uhub4: 7 ports with 7 removable, self powered
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-5 device 
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 36863MB (75496320 512 byte sectors: 255H 32S/T 9252C)
da1 at ciss0 bus 0 target 1 lun 0
da1: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-5 device 
da1: 135.168MB/s transfers
da1: Command Queueing enabled
da1: 243098MB (497866080 512 byte sectors: 255H 32S/T 61013C)
da2 at ciss0 bus 0 target 2 lun 0
da2: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da2: 135.168MB/s transfers
da2: Command Queueing enabled
da2: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da3 at ciss0 bus 0 target 3 lun 0
da3: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da3: 135.168MB/s transfers
da3: Command Queueing enabled
da3: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da4 at ciss0 bus 0 target 4 lun 0
da4: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da4: 135.168MB/s transfers
da4: Command Queueing enabled
da4: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da5 at ciss0 bus 0 target 5 lun 0
da5: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da5: 135.168MB/s transfers
da5: Command Queueing enabled
da5: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #2 Launched!
WARNING: WITNESS option enabled, expect reduced performance.
GEOM: da0: partition 3 does not start on a track boundary.
GEOM: da0: partition 3 does not end on a track boundary.
GEOM: da0: partition 2 does not start on a track boundary.
GEOM: da0: partition 2 does not end on a track boundary.
GEOM: da0: partition 1 does not start on a track boundary.
GEOM: da0: partition 1 does not end on a track boundary.
GEOM: da0s1: geometry does not match label (255h,63s != 255h,32s).
GEOM: da0s2: geometry does not match label (255h,63s != 255h,32s).
GEOM: da0s3: geometry does not match label (255h,63s != 255h,32s).
Trying to mount root from ufs:/dev/da0s1a
ZFS filesystem version 13
ZFS storage pool version 13
bce0: link state changed to UP

>Description:
I installed 8.0RC2-amd64 on an 8-core Opteron server a few days ago, when 8.0 RC2 came out.

When I tried to run make buildworld or make buildkernel, the server
rebooted without leaving any message in the logs. The same happened
when building bigger ports (for example ruby18 or perl58).

After this I installed 7.2-STABLE on this same server and did a "make
buildworld" and "make buildkernel" which completed without any problem.

Then I installed 8.0-BETA4, which also crashes when doing makeworld.

Finally I reinstalled 8.0RC2-amd64 on the server and built an 8.0RC2 debug kernel for this crashing server on another AMD server.

I also:

- ran several passes with diagnostic software from the server manufacturer
- reset BIOS settings to default
- upgraded BIOS to newest release
- booted server from 2 year old backup BIOS
- took out the only pair of RAM modules that was different from the
rest of the modules
- ran memtest86 on the server (no problems found)

The server kept on crashing under load, when running buildworld.
Although dumpdev + dumpdir were correctly defined, the server just rebooted without writing a crashdump!

- In about 80% of the cases, running makeworld crashes the server
without a crashdump being written to dumpdir. The server just reboots.

- In about 20% of the cases, makeworld gets stuck in a non-terminating
process that uses 100% CPU. This process cannot be killed. When
makeworld is restarted, the server reboots again.

- It makes no difference whether makeworld runs with -j1 or -j8; the result is the same.

Finally, I followed a hint I got on the freebsd-current list, set vm.pmap.pg_ps_enabled=0 in /boot/loader.conf and rebooted. The problem was gone!

After a successful buildworld and buildkernel I rebooted the server
again with vm.pmap.pg_ps_enabled=0 commented out, and the problem
was back. Then I set vm.pmap.pg_ps_enabled=0 in loader.conf again,
rebooted and ran make buildworld: no problem.

This seems to be deterministic. With vm.pmap.pg_ps_enabled=1 the server
crashes without being able to write crashdumps to dumpdev (at least on
this specific HP ProLiant DL385G2 server with 20G RAM).




>How-To-Repeat:
Install FreeBSD 8.0 RC2 amd64 + Sources, do a makeworld.
>Fix:
Workaround: Set vm.pmap.pg_ps_enabled=0 in /boot/loader.conf and reboot.
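
As a sketch, the workaround amounts to a single line in /boot/loader.conf. The quoting follows the usual loader.conf convention and is an assumption, not copied from the affected server:

```
# /boot/loader.conf
# Disable large page (superpage) mappings in the pmap.
vm.pmap.pg_ps_enabled="0"
```

After the reboot, running "sysctl vm.pmap.pg_ps_enabled" should report 0 if the tunable was picked up.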

>Release-Note:
>Audit-Trail:
>Unformatted:

