FreeBSD 5.3-[RELEASE-p1|STABLE] SMP crashes
Oliver Hartmann
ohartman at uni-mainz.de
Sat Nov 20 13:59:21 PST 2004
Dear Sirs.
First, please do not reply to this address; your reply will never reach me. Please contact me at ohartman at web.de. I cannot post to this list via web.de because several web.de hosts are excluded as spam sources.
As I have reported several times in the past, I still have massive problems with SMP enabled on a FreeBSD 5.3-RELEASE-p1 __and__ a FreeBSD 5.3-STABLE box. The crash is always of the same type: I can 'watch' the machine freeze, and in some lucky moments I am able to switch to the console before the box dies for good and see which error message comes up.
This machine has an ASUS CUR-DLS mainboard using the RCC ServerWorks chipset, version 3, for Pentium 3 CPUs. At the moment I use two Intel 1 GHz CPUs of the same stepping; prior to this error report I used two 866 MHz CPUs of different steppings, but it seems to make no difference.
I also tried a lot of kernel options, especially those that are supposed to be critical (that is, I switched them off), and I used a GENERIC kernel for a while, but it makes no difference. The crash occurs while using a graphical console: Xorg X11 (version 4.7.0 as compiled from the ports) with fvwm2 (development version, but the crash also occurs with windowmaker, so the window manager does not seem to be the issue). I also tried to fix the problem by using the built-in fxp NIC instead of the 64-bit Intel Gbit LAN adapter (em0), but the result is always the same.
I append the output of mptable -verbose -dmesg for your information, along with the error message I receive.
Most of the time when the crash occurred I was putting a heavy graphical load on the machine (working on several TIFF files of 200 MB each, or with Mozilla/Firefox), but this may simply trigger the problem or make it appear sooner.
Sometimes I cannot get 'systat -vmstat 1' output; calling vmstat in systat results in 'Alternate system clock has died. Reverting to ''pigs'' ...'. This happens very often with SMP, but not with UP.
I will add that the UP system (SMP disabled with kern.smp.disabled='1' in loader.conf) was up for nearly 13 days under the same conditions under which the SMP box crashes after several minutes to several hours.
This is the last console error I received:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address = 0x1c
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc062ac76
stack pointer = 0x10:0x4e2d7ac
frame pointer = 0x10:0xe4e2d7c4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 44 (swi5: clock sio)
[thread 100042]
Stopped at vref +0x16: lock cmpxchgl %edx, 0x1c(%edx)
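The arithmetic behind this trap message can be sketched as follows. The faulting instruction addresses memory as displacement(%register), and the reported fault address should equal the register value plus the displacement. This is a minimal sketch using only the two lines copied from the message above (which was typed off the console, so the register name is taken as printed); the helper names are hypothetical. A result of 0 would suggest that vref was handed a NULL pointer, though that reading is an assumption and not something the trap states directly.

```python
import re

# Values copied from the trap message above.
FAULT_VA = "fault virtual address = 0x1c"
STOPPED_AT = "Stopped at vref +0x16: lock cmpxchgl %edx, 0x1c(%edx)"


def fault_address(line):
    """Return the faulting virtual address as an integer."""
    return int(line.split("=", 1)[1].strip(), 16)


def memory_operand(line):
    """Return (displacement, base register) of the disp(%reg) operand."""
    m = re.search(r"(0x[0-9a-fA-F]+)\(%(\w+)\)", line)
    return int(m.group(1), 16), m.group(2)


def implied_base(fault_line, insn_line):
    """Base register value implied by: fault address = base + displacement."""
    disp, reg = memory_operand(insn_line)
    return reg, fault_address(fault_line) - disp


reg, base = implied_base(FAULT_VA, STOPPED_AT)
print(f"%{reg} = {base:#x}")
```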
I am neither a technical expert nor a kernel programmer. I tried to figure out which function contains that address with the recommended command nm -n kernel | grep c062ac76, and it results in 'T vref'.
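A plain grep only matches a symbol that starts exactly at the given address, so a more general lookup picks the closest symbol at or below the instruction pointer. The following is a minimal sketch of that lookup over text in the `nm -n` format. The vref address is derived from the trap (0xc062ac76 - 0x16 = 0xc062ac60); the two neighbouring symbols and the function names parse_nm and resolve are hypothetical placeholders, not taken from the real kernel.

```python
import bisect

# Sample lines in the format printed by `nm -n kernel` (address, type, name).
# Only the vref address is derived from the trap (0xc062ac76 - 0x16);
# the two neighbours are hypothetical placeholders.
SAMPLE_NM = """\
c062ab00 T example_before
c062ac60 T vref
c062acd0 T example_after
"""


def parse_nm(text):
    """Parse nm output into a list of (address, name) sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip lines without an address (undefined symbols)
            symbols.append((int(parts[0], 16), parts[2]))
    return sorted(symbols)


def resolve(symbols, addr):
    """Return (name, offset) of the closest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below the first symbol
    start, name = symbols[i]
    return name, addr - start


name, offset = resolve(parse_nm(SAMPLE_NM), 0xC062AC76)
print(f"{name}+{offset:#x}")
```

To use this against a real kernel, the sample text would be replaced by the actual output of nm -n run on the kernel file that was booted when the trap occurred.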
What is 'swi5: clock sio'? Is this problem hardware related? Why does it occur only with SMP? Others do not seem to have problems with 5.3 and SMP, so maybe this is specific to my setup because of the RCC-based mainboard I use (in the past I had a lot of problems with a TYAN 2500 mobo, also based on a ServerWorks chipset, in conjunction with FreeBSD 4/5).
This is my mptable output:
===============================================================================
MPTable, version 2.0.15
looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009f000
searching CMOS 'top of mem' @ 0x0009ec00 (635K)
searching default 'top of mem' @ 0x0009fc00 (639K)
searching BIOS @ 0x000f0000
MP FPS found in BIOS @ physical addr: 0x000f5270
-------------------------------------------------------------------------------
MP Floating Pointer Structure:
location: BIOS
physical address: 0x000f5270
signature: '_MP_'
length: 16 bytes
version: 1.4
checksum: 0xe3
mode: Virtual Wire
-------------------------------------------------------------------------------
MP Config Table Header:
physical address: 0x000f4e60
signature: 'PCMP'
base table length: 276
version: 1.4
checksum: 0x0d
OEM ID: 'OEM00000'
Product ID: 'PROD00000000'
OEM table pointer: 0x00000000
OEM table size: 0
entry count: 26
local APIC address: 0xfee00000
extended table length: 124
extended table checksum: 198
-------------------------------------------------------------------------------
MP Config Base Table Entries:
--
Processors:  APIC ID  Version  State        Family  Model  Step  Flags
             3        0x11     BSP, usable  6       8      6     0x387fbff
             0        0x11     AP, usable   6       8      6     0x387fbff
--
Bus:         Bus ID  Type
             0       PCI
             1       PCI
             2       ISA
--
I/O APICs:   APIC ID  Version  State   Address
             2        0x11     usable  0xfec00000
             3        0x11     usable  0xfec01000
--
I/O Ints:    Type    Polarity   Trigger   Bus ID  IRQ   APIC ID  PIN#
             ExtINT  conforms   conforms  2       0     2        0
             INT     conforms   conforms  2       1     2        1
             INT     conforms   conforms  2       0     2        2
             INT     conforms   conforms  2       3     2        3
             INT     conforms   conforms  2       4     2        4
             INT     conforms   conforms  2       6     2        6
             INT     conforms   conforms  2       7     2        7
             INT     conforms   conforms  2       8     2        8
             INT     conforms   conforms  2       12    2        12
             INT     conforms   conforms  2       13    2        13
             INT     conforms   conforms  2       14    2        14
             INT     conforms   conforms  2       15    2        15
             INT     active-lo  level     0       15:A  3        14
             INT     active-lo  level     2       9     2        9
             INT     active-lo  level     1       3:A   3        6
             INT     active-lo  level     1       5:A   3        8
             INT     active-lo  level     1       5:B   3        9
--
Local Ints:  Type    Polarity   Trigger  Bus ID  IRQ  APIC ID  PIN#
             ExtINT  active-hi  edge     2       0    255      0
             NMI     active-hi  edge     2       0    255      1
-------------------------------------------------------------------------------
MP Config Extended Table Entries:
--
System Address Space
bus ID: 0 address type: I/O address
address base: 0x0
address range: 0x10000
--
System Address Space
bus ID: 0 address type: memory address
address base: 0x40000000
address range: 0xbebe0000
--
System Address Space
bus ID: 0 address type: prefetch address
address base: 0xfebe0000
address range: 0xe9420000
--
System Address Space
bus ID: 0 address type: memory address
address base: 0xe8000000
address range: 0x18000000
--
System Address Space
bus ID: 0 address type: memory address
address base: 0xa0000
address range: 0x20000
--
Bus Heirarchy
bus ID: 2 bus info: 0x01 parent bus ID: 0
--
Compatibility Bus Address
bus ID: 0 address modifier: add
predefined range: 0x00000000
--
Compatibility Bus Address
bus ID: 0 address modifier: add
predefined range: 0x00000001
-------------------------------------------------------------------------------
dmesg output:
WARNING: /compat was not properly dismounted
WARNING: /homes was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /usr/data was not properly dismounted
WARNING: /usr/local was not properly dismounted
WARNING: /usr/obj was not properly dismounted
/usr/obj: mount pending error: blocks 21296 files 928
/usr/obj: superblock summary recomputed
WARNING: /usr/scratch was not properly dismounted
WARNING: /usr/src was not properly dismounted
WARNING: /var was not properly dismounted
pflog0: promiscuous mode enabled
em0: Link is up 100 Mbps Full Duplex
em0: promiscuous mode enabled
em0: promiscuous mode disabled
===================================================================