FreeBSD 5.3-[RELEASE-p1|STABLE] SMP crashes

Oliver Hartmann ohartman at uni-mainz.de
Sat Nov 20 13:59:21 PST 2004


Dear Sirs.
First, please do not reply to this address; your reply will never reach me. Please contact me at ohartman at web.de. I cannot post to this newsgroup via web.de because several web.de hosts are excluded as spam sources.

As I have reported several times in the past, I still have massive problems with SMP enabled on a FreeBSD 5.3-RELEASE-p1 __and__ a FreeBSD 5.3-STABLE box. The crash is always of the same type: I can 'watch' the machine freeze, and in some lucky moments I am able to switch to the console before the box finally dies and see which error message comes up.

This machine has an ASUS CUR-DLS mainboard, which uses the RCC ServerWorks chipset, version 3, for Pentium 3 CPUs. At the moment I use two Intel 1 GHz CPUs of the same stepping; prior to this error report I used two 866 MHz CPUs of different steppings, but it seems to make no difference.

I also tried a lot of kernel options, especially those which are supposed to be critical (meaning: I switched them off), and I used a GENERIC kernel for a while, but it makes no difference. The crash occurs while using a graphical console, Xorg X11 (version 4.7.0 as compiled from the ports), and fvwm2 (development version, but the crash also occurs with windowmaker, so the window manager does not seem to be the issue). I also tried to fix the problem by using the built-in fxp NIC instead of the 64-bit Intel Gbit LAN adapter (em0), but the result is always the same.

I will append the mptable -verbose -dmesg output for your information, together with the error message I receive.
Most of the time when the crash occurred I was generating a lot of graphical load (working on several TIFF files of 200 MB each, or with Mozilla/Firefox), but this may simply trigger the problem or make it appear sooner.

Sometimes I cannot get a 'systat -vmstat 1' output; calling vmstat in systat results in 'Alternate system clock has died. Reverting to ''pigs'' ...'. This happens very often with SMP, but not with UP.

I will add that the UP system (SMP disabled by kern.smp.disabled='1' in loader.conf) was up for nearly 13 days under the same conditions, whereas the SMP box crashes after several minutes to several hours.

This is the last console error I received:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address  = 0x1c
fault code  =  supervisor write, page not present
instruction pointer  =  0x8:0xc062ac76
stack pointer  =  0x10:0x4e2d7ac
frame pointer  =  0x10:0xe4e2d7c4
code segment  = base 0x0, limit 0xfffff, type 0x1b
              = DPL 0, pres 1, def32 1, gran 1
processor eflags  = interrupt enabled, resume, IOPL = 0
current process = 44 (swi5: clock sio)
[thread 100042]
Stopped at      vref +0x16: lock cmpxchgl %edx, 0x1c(%edx)
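One way to read the numbers in the trap message, offered only as a sketch: the faulting instruction addresses memory at displacement 0x1c from a base register, and the reported fault address is also 0x1c. If the effective address is simply base plus displacement, the base register would have been zero, which is what a NULL pointer dereference looks like. The same arithmetic gives the start of vref from the instruction pointer and the +0x16 offset. The helper names below are hypothetical, and this is arithmetic on the printed values, not a diagnosis.

```python
# Values copied from the trap message above.
FAULT_ADDRESS = 0x1C        # "fault virtual address"
DISPLACEMENT = 0x1C         # the 0x1c in the operand 0x1c(%edx)
INSTRUCTION_POINTER = 0xC062AC76
OFFSET_IN_FUNCTION = 0x16   # from "vref +0x16"

MASK32 = 0xFFFFFFFF         # the box is a 32-bit i386 system


def base_register(fault_address, displacement):
    """Assuming effective address = base + displacement, recover the base."""
    return (fault_address - displacement) & MASK32


def function_start(instruction_pointer, offset):
    """Start address of the function the debugger reported."""
    return (instruction_pointer - offset) & MASK32


base = base_register(FAULT_ADDRESS, DISPLACEMENT)
start = function_start(INSTRUCTION_POINTER, OFFSET_IN_FUNCTION)

print("base register value:", hex(base))
print("vref would start at:", hex(start))
```

A base of zero would be consistent with vref being handed a NULL pointer, but the trap message alone does not say who passed it or why.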

I am neither a technical expert nor a kernel programmer. I tried to figure out which function was executing at that address via the recommended nm -n kernel|grep c062ac76
and it results in 'T vref'.
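The lookup above can be sketched in a more general form. An instruction pointer usually falls inside a function rather than on its first byte, so instead of grepping for the exact address one can take the address-sorted output of nm -n and pick the last symbol at or below the address. In the sample table below, only the vref address is derived from the trap message (0xc062ac76 minus 0x16); the neighbouring symbols and the function names parse_nm and resolve are hypothetical, for illustration only.

```python
import bisect


def parse_nm(text):
    """Parse `nm -n` style lines such as "c062ac60 T vref" into sorted pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips blank lines and undefined symbols (no address)
        addr, _kind, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # first field was not a hex address
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the last symbol at or below address, else None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, address - base


# Hypothetical excerpt; only the vref line is derived from the trap message.
SAMPLE = """\
c062ac00 T hypothetical_prev
c062ac60 T vref
c062aca0 T hypothetical_next
"""

result = resolve(parse_nm(SAMPLE), 0xC062AC76)
if result is not None:
    name, offset = result
    print(f"{name}+{offset:#x}")
```

On a real system the input would be the full nm -n output of the running kernel image rather than this three-line sample.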

What is 'swi5: clock sio'? Is this problem hardware related? Why only in SMP? Others seem not to have problems with 5.3 and SMP, maybe this is very specific to me due to the RCC based mainboard I use (in the past I had a lot of problems with a TYAN 2500 mobo also based on ServerWorks chipset in conjunction with FreeBSD 4/5). 

This is my mptable output:

===============================================================================

MPTable, version 2.0.15

 looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009f000
 searching CMOS 'top of mem' @ 0x0009ec00 (635K)
 searching default 'top of mem' @ 0x0009fc00 (639K)
 searching BIOS @ 0x000f0000

 MP FPS found in BIOS @ physical addr: 0x000f5270

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000f5270
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.4
  checksum:                     0xe3
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x000f4e60
  signature:                    'PCMP'
  base table length:            276
  version:                      1.4
  checksum:                     0x0d
  OEM ID:                       'OEM00000'
  Product ID:                   'PROD00000000'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  26
  local APIC address:           0xfee00000
  extended table length:        124
  extended table checksum:      198

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 3       0x11    BSP, usable     6       8       6       0x387fbff
                 0       0x11    AP, usable      6       8       6       0x387fbff
--
Bus:            Bus ID  Type
                 0       PCI   
                 1       PCI   
                 2       ISA   
--
I/O APICs:      APIC ID Version State           Address
                 2       0x11    usable          0xfec00000
                 3       0x11    usable          0xfec01000
--
I/O Ints:       Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT   conforms    conforms        2     0          2    0
                INT      conforms    conforms        2     1          2    1
                INT      conforms    conforms        2     0          2    2
                INT      conforms    conforms        2     3          2    3
                INT      conforms    conforms        2     4          2    4
                INT      conforms    conforms        2     6          2    6
                INT      conforms    conforms        2     7          2    7
                INT      conforms    conforms        2     8          2    8
                INT      conforms    conforms        2    12          2   12
                INT      conforms    conforms        2    13          2   13
                INT      conforms    conforms        2    14          2   14
                INT      conforms    conforms        2    15          2   15
                INT     active-lo       level        0  15:A          3   14
                INT     active-lo       level        2     9          2    9
                INT     active-lo       level        1   3:A          3    6
                INT     active-lo       level        1   5:A          3    8
                INT     active-lo       level        1   5:B          3    9
--
Local Ints:     Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT  active-hi        edge        2     0        255    0
                NMI     active-hi        edge        2     0        255    1

-------------------------------------------------------------------------------

MP Config Extended Table Entries:

--
System Address Space
 bus ID: 0 address type: I/O address
 address base: 0x0
 address range: 0x10000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0x40000000
 address range: 0xbebe0000
--
System Address Space
 bus ID: 0 address type: prefetch address
 address base: 0xfebe0000
 address range: 0xe9420000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xe8000000
 address range: 0x18000000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xa0000
 address range: 0x20000
--
Bus Heirarchy
 bus ID: 2 bus info: 0x01 parent bus ID: 0
--
Compatibility Bus Address
 bus ID: 0 address modifier: add
 predefined range: 0x00000000
--
Compatibility Bus Address
 bus ID: 0 address modifier: add
 predefined range: 0x00000001

-------------------------------------------------------------------------------

dmesg output:

WARNING: /compat was not properly dismounted
WARNING: /homes was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /usr/data was not properly dismounted
WARNING: /usr/local was not properly dismounted
WARNING: /usr/obj was not properly dismounted
/usr/obj: mount pending error: blocks 21296 files 928
/usr/obj: superblock summary recomputed
WARNING: /usr/scratch was not properly dismounted
WARNING: /usr/src was not properly dismounted
WARNING: /var was not properly dismounted
pflog0: promiscuous mode enabled
em0: Link is up 100 Mbps Full Duplex
em0: promiscuous mode enabled
em0: promiscuous mode disabled

===================================================================



