5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro

Hartmann, O. ohartman at klima.physik.uni-mainz.de
Wed Aug 13 01:59:22 PDT 2003


Dear Sirs.

This seems to me a never-ending story. We run a box with a TYAN Thunder
2500 dual SMP mainboard, 2 GB of Tyan-certified ECC memory, an AMI
Enterprise 1600 RAID adapter and an additional Intel PRO/1000 server-type
(64-bit) Gbit LAN NIC. With FreeBSD 4.8 this was stable, but achieving
that state was really hard! It is a story similar to what happened when
we moved this machine to FreeBSD 5.1-RELEASE-p2.

It seems to depend heavily on which PCI slot each card is installed in,
so I will report that here as well.

Phenomenon:

After the machine has been running for a while, the SMP kernel reboots
spontaneously. This happens under heavy I/O, when compiling or, in the
morning, when our department gets up and our staff connect to the Samba
server.

Depending on which devices are switched on or off in the BIOS, the kernel
freezes at the stage when the amr0 RAID has been recognized. I can avoid
this by enabling the built-in NIC (fxp0). I can provoke it by putting the
em0 NIC into another slot, for instance the one remaining 64-bit/66 MHz
slot (which should be a separate bus).

This 'game' was identical to the one I had with FreeBSD 4.X through 4.8.
There I found that putting an additional NIC into PCI slot no. 2 (counted
from the AGP slot) cleared things up, but using both NICs together
(either the additional fxp0 or the new em0) leaves the system completely
unstable.

In FreeBSD 5.1-RELEASE-p2 and especially in FreeBSD 5.1-CURRENT this
'gambling' seems to reach its climax. My kernel is built with
SCHED_4BSD because a kernel with SCHED_ULE and ADAPTIVE_MUTEXES crashes
immediately in the same way as described (running for a while, then
dumping core or freezing at the stage after the amr0 RAID shows up in
the kernel boot messages; see the dmesg output below).

I'm not a hardware expert, but all this weird stuff looks to me like
an IRQ routing problem. I fiddled around with many hand-assigned IRQ
configurations, but nothing helped. Either the Intel PRO/1000 or the AMI
RAID is causing problems in the TYAN Thunder 2500 SMP environment.
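
To check for shared interrupts, the "irq N" assignments in the dmesg
output can be tabulated per device. The following is only a minimal
sketch in Python and not part of the setup described here: the function
names irq_table and shared_irqs are made up for illustration, the regular
expression assumes nothing more than the "name: <description> ... irq N
... on bus" line shape seen in the dmesg output, and the sample lines are
copied from that output.

```python
import re
from collections import defaultdict

# Matches device attach lines such as
#   "amr0: <LSILogic MegaRAID> mem ... irq 22 at device 0.0 on pci4"
# Lines without a "<description>" (e.g. "sio0: configured irq 4 not in
# bitmap ...") are deliberately skipped.
ATTACH = re.compile(r"^(\w+): <.*> .*?\birq (\d+)\b.* on (\w+)$")


def irq_table(dmesg_text):
    """Map each IRQ number to the list of (device, parent bus) using it."""
    table = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = ATTACH.match(line.strip())
        if match:
            dev, irq, bus = match.groups()
            table[int(irq)].append((dev, bus))
    return dict(table)


def shared_irqs(table):
    """Return only the IRQs that more than one device is attached to."""
    return {irq: devs for irq, devs in table.items() if len(devs) > 1}


# A few lines copied from the dmesg output of this machine.
SAMPLE = """\
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.5.31> port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
fxp0: <Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet> port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
sio0: configured irq 4 not in bitmap of probed irqs 0
"""

if __name__ == "__main__":
    table = irq_table(SAMPLE)
    for irq, devs in sorted(table.items()):
        print("irq %d: %s" % (irq, ", ".join("%s on %s" % d for d in devs)))
    print("shared:", shared_irqs(table))
```

Fed with a complete dmesg dump instead of SAMPLE, this lists every
device attach line together with its IRQ, which makes conflicting
assignments easier to spot than reading the boot messages by eye.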

We also have an SMP machine with similar hardware, based on an ASUS CV4X-D,
an AMI Elite 1600 RAID controller and the same Intel em0 1 Gbit NIC. Its OS
is FreeBSD 4.8 and this system has never had any problem!

I feel a little helpless at the moment, because I think I have tried every
trick and something seems to be wrong with the combination of TYAN Thunder
2500 and FreeBSD 5.X SMP. It is also very curious that a kernel without
SMP/IO_APIC freezes after booting at the same place (amr0 RAID recognition).

Is there any help out there?

I attach the kernel config file and the dmesg output. Please note: I disabled
both serial ports, the parallel port, sound and USB to free up additional IRQs.
But I have to enable the built-in NIC to get a bootable, though unstable,
FreeBSD 5.1-R box.

====================================
DMESG output
====================================

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
    root at atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 868644793 Hz
CPU: Intel Pentium III (868.64-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 2147483648 (2048 MB)
avail memory = 2085625856 (1989 MB)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  3, version: 0x000f0011, at 0xfec01000
netsmb_dev: loaded
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00fdf00
pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
IOAPIC #1 intpin 13 -> irq 2
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 2 -> irq 17
IOAPIC #1 intpin 7 -> irq 18
pcib1: <PCI-PCI bridge> at device 0.1 on pci0
pci1: <PCI bus> on pcib1
IOAPIC #1 intpin 1 -> irq 19
pci1: <display, VGA> at device 0.0 (no driver attached)
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.5.31> port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
em0:  Speed:1000 Mbps  Duplex:Full
fxp0: <Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet> port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
fxp0: Ethernet address 00:e0:81:00:f0:d7
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <PCI-ISA bridge> port 0x500-0x50f at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <mass storage, ATA> at device 15.1 (no driver attached)
pcib2: <ServerWorks host to PCI bridge> at pcibus 2 on motherboard
pci2: <PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 2.0 on pci2
pci3: <PCI bus> on pcib3
IOAPIC #1 intpin 11 -> irq 20
IOAPIC #1 intpin 8 -> irq 21
pcib4: <PCI-PCI bridge> at device 0.0 on pci3
pci4: <PCI bus> on pcib4
IOAPIC #1 intpin 10 -> irq 22
amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
amr0: <LSILogic MegaRAID Enterprise 1600> Firmware G170, BIOS F316, 64MB RAM
pci3: <mass storage, SCSI> at device 1.0 (no driver attached)
pci3: <mass storage, SCSI> at device 2.0 (no driver attached)
orm0: <Option ROMs> at iomem 0xca000-0xcdfff,0xc0000-0xc9fff on isa0
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250 or not responding
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
ppc0: parallel port not found.
unknown: <PNP0303> can't assign resources (port)
psmcpnp0: irq resource info is missing; assuming irq 12
unknown: <PNP0700> can't assign resources (port)
ppc1: parallel port not found.
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
Timecounters tick every 1.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
DUMMYNET initialized (011031)
Waiting 5 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)

===> freezing here!

sa0 at sym1 bus 0 target 5 lun 0
sa0: <HP C5713A H910> Removable Sequential Access SCSI-2 device
sa0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0 at sym1 bus 0 target 5 lun 1
ch0: <HP C5713A H910> Removable Changer SCSI-2 device
ch0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0: 6 slots, 1 drive, 0 pickers, 0 portals
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/amrd0s1a
cd0 at sym0 bus 0 target 3 lun 0
cd0: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 16)
cd0: Attempt to query device size failed: NOT READY, Medium not present

========================
KERNEL config file
========================

machine		i386
cpu		I686_CPU
ident		ATMOS

options         SMP                     # Symmetric MultiProcessor Kernel
options         APIC_IO                 # Symmetric (APIC) I/O

maxusers	0

hints		"ATMOS.hints"		#Default places to look for devices.


#options		COMPAT_FREEBSD4
options 	SCHED_4BSD		#4BSD scheduler

#options		SCHED_ULE
#options		ADAPTIVE_MUTEXES

#options         PQ_CACHESIZE=256

options         CPU_ENABLE_SSE

options         CLK_USE_TSC_CALIBRATION
#options		HZ=1000

#makeoptions    CONF_CFLAGS=-fno-builtin
#options        MAXDSIZ=(1024UL*1024*1024)
#options        MAXSSIZ=(128UL*1024*1024)
#options        DFLDSIZ=(1024UL*1024*1024)

options         GEOM_AES
options         GEOM_APPLE
options         GEOM_BDE
options         GEOM_BSD
options         GEOM_GPT
options         GEOM_MBR
options         GEOM_PC98
options         GEOM_SUNLABEL
options         GEOM_VOL

options         ROOTDEVNAME=\"ufs:amrd0s1a\"

options 	INET			#InterNETworking
#options 	INET6			#IPv6 communications protocols
options 	FFS			#Berkeley Fast Filesystem
options 	SOFTUPDATES		#Enable FFS soft updates support
options 	UFS_ACL			#Support for access control lists
options 	UFS_DIRHASH		#Improve performance on big directories
options 	NFSCLIENT		#Network Filesystem Client
options 	NFSSERVER		#Network Filesystem Server
options 	MSDOSFS			#MSDOS Filesystem
options 	CD9660			#ISO 9660 Filesystem
options 	PROCFS			#Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		#Pseudo-filesystem framework
options 	COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
options 	SCSI_DELAY=5000		#Delay (in ms) before probing SCSI

options 	SYSVSHM			#SYSV-style shared memory
options 	SYSVMSG			#SYSV-style message queues
options 	SYSVSEM			#SYSV-style semaphores

options         NETSMB
options         NETSMBCRYPTO
options         LIBMCHAIN
options         LIBICONV

#options         WATCHDOG

options         NETGRAPH
#options        NETGRAPH_ASYNC
#options        NETGRAPH_BPF
#options        NETGRAPH_BRIDGE
#options        NETGRAPH_CISCO
#options        NETGRAPH_ECHO
#options        NETGRAPH_ETHER
#options        NETGRAPH_FRAME_RELAY
#options        NETGRAPH_GIF
#options        NETGRAPH_GIF_DEMUX
#options        NETGRAPH_HOLE
#options        NETGRAPH_IFACE
#options        NETGRAPH_IP_INPUT
#options        NETGRAPH_KSOCKET
#options        NETGRAPH_L2TP
#options        NETGRAPH_LMI
#options        NETGRAPH_MPPC_ENCRYPTION
#options        NETGRAPH_ONE2MANY
#options        NETGRAPH_PPP
#options        NETGRAPH_PPPOE
#options        NETGRAPH_PPTPGRE
#options        NETGRAPH_RFC1490
#options        NETGRAPH_SOCKET
#options        NETGRAPH_SPLIT
#options        NETGRAPH_TEE
#options        NETGRAPH_TTY
#options        NETGRAPH_UI
#options        NETGRAPH_VJC

options         MROUTING
options         IPFIREWALL
options         IPFIREWALL_VERBOSE
options         IPFIREWALL_FORWARD
#options        IPFIREWALL_VERBOSE_LIMIT=100
#options        IPFIREWALL_DEFAULT_TO_ACCEPT
#options        IPV6FIREWALL
#options        IPV6FIREWALL_VERBOSE
#options        IPV6FIREWALL_VERBOSE_LIMIT=100
#options        IPV6FIREWALL_DEFAULT_TO_ACCEPT
options         IPDIVERT
#options        IPFILTER
#options        IPFILTER_LOG
#options        IPFILTER_DEFAULT_BLOCK
options         IPSTEALTH

options         RANDOM_IP_ID

options         ACCEPT_FILTER_DATA
#options        ACCEPT_FILTER_HTTP

options         TCP_DROP_SYNFIN
options         DUMMYNET
#options        BRIDGE

options         QUOTA

options 	_KPOSIX_PRIORITY_SCHEDULING
options         P1003_1B_SEMAPHORES

#options        MAC
#options        MAC_BIBA
#options        MAC_BSDEXTENDED
#options        MAC_DEBUG
#options        MAC_IFOFF
#options        MAC_LOMAC
#options        MAC_MLS
#options        MAC_NONE
#options        MAC_PARTITION
#options        MAC_SEEOTHERUIDS
#options        MAC_TEST

options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev

device		isa
#options         AUTO_EOI_1

device		pci

device		agp

# Floppy drives
device		fdc

# SCSI Controllers
device		sym		# NCR/Symbios Logic (newer chipsets + those of `ncr')
#device		ahc

# SCSI peripherals
device		scbus		# SCSI bus (required)
device		ch		# SCSI media changers
device		da		# Direct Access (disks)
device		sa		# Sequential Access (tape etc)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)
device		ses		# SCSI Environmental Services (and SAF-TE)


# RAID controllers
device		amr		# AMI MegaRAID


#options        CHANGER_MIN_BUSY_SECONDS=2
#options        CHANGER_MAX_BUSY_SECONDS=10

#options        SA_IO_TIMEOUT=4
#options        SA_SPACE_TIMEOUT=60
#options        SA_REWIND_TIMEOUT=(2*60)
#options        SA_ERASE_TIMEOUT=(4*60)
#options        SA_1FM_AT_EOD

#options        SCSI_PT_DEFAULT_TIMEOUT=60
options         SES_ENABLE_PASSTHROUGH


# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
options         ATKBD_DFLT_KEYMAP
makeoptions     ATKBD_DFLT_KEYMAP=us.iso

device		psm		# PS/2 mouse

device		vga		# VGA video card driver

device		splash		# Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device          sc
options         MAXCONS=8

#options         SC_ALT_MOUSE_IMAGE
options         SC_DFLT_FONT
makeoptions     SC_DFLT_FONT=cp850

options         SC_DISABLE_DDBKEY
options         SC_DISABLE_REBOOT
options         SC_HISTORY_SIZE=512
#options        SC_MOUSE_CHAR=0x3
options         SC_PIXEL_MODE
options         SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options         SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
options         SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
options         SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
#options        SC_CUT_SPACES2TABS
#options        SC_CUT_SEPCHARS=\"x09\"
#options        SC_TWOBUTTON_MOUSE
#options        SC_NO_CUTPASTE
#options        SC_NO_FONT_LOADING
#options        SC_NO_HISTORY
#options        SC_NO_SYSMOUSE
#options        SC_NO_SUSPEND_VTYSWITCH

device		npx

#device		pmtimer

#device		sio		# 8250, 16[45]50 based serial ports

# Parallel port
#device		ppc
#device		ppbus		# Parallel port bus (required)
#device		lpt		# Printer
#device		plip		# TCP/IP over parallel
#device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da


device		miibus		# MII bus support
device		em
#device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)

device		random		# Entropy device
device		loop		# Network loopback
device		ether		# Ethernet support
#device		tun		# Packet tunnel.
device		pty		# Pseudo-ttys (telnet etc)
#device		gif		# IPv6 and IPv4 tunneling
#device		faith		# IPv6-to-IPv4 relaying (translation)

device		bpf		# Berkeley packet filter


------------------


Thanks a lot for your help,

Oliver
--
Best regards
O. Hartmann

ohartman at mail.physik.uni-mainz.de
------------------------------------------------------------------
System Administration, Institute for Atmospheric Physics (IPA)
------------------------------------------------------------------
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (machine room)
Tel: +496131/3924144 (office)
FAX: +496131/3923532

