(2) 5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro

Colin Faber cfaber at fpsn.net
Wed Aug 20 11:18:36 PDT 2003


Hi,

I've got nearly the same setup in a Dell 1600SC with a gig of RAM and a PERC4/SC (LSI MegaRAID) card.

It has dual 2.4GHz Xeon (P4, HT) CPUs, and I've discovered that I can lock up FreeBSD 5.1-RELEASE-p2 on
command simply by running something that quickly creates and removes a directory, e.g.:

	perl -e 'for(my $i = 0 ; $i < 9999; $i++){ mkdir("abc"); rmdir("abc"); }'
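
For anyone who wants to repeat the test without Perl, here is a rough Python equivalent of the loop above. It is a sketch: the function name mkdir_rmdir_loop is made up, while the directory name abc and the 9999 iterations come from the one-liner. Unlike the Perl version, it stops with an OSError if mkdir or rmdir fails. Run it on a filesystem that lives on the RAID volume. On an affected machine it may trigger the panic shown below.

```python
import os
import tempfile


def mkdir_rmdir_loop(parent, name="abc", iterations=9999):
    """Create and immediately remove parent/name, `iterations` times.

    Mirrors the Perl one-liner, but raises OSError on the first failed
    mkdir/rmdir instead of silently carrying on. Returns the cycle count.
    """
    path = os.path.join(parent, name)
    done = 0
    for _ in range(iterations):
        os.mkdir(path)
        os.rmdir(path)
        done += 1
    return done


if __name__ == "__main__":
    # Hypothetical: pass dir="/some/mount/on/amrd0" to TemporaryDirectory to
    # exercise the RAID volume; without it the default temp directory is used.
    with tempfile.TemporaryDirectory() as scratch:
        print(mkdir_rmdir_loop(scratch), "cycles completed")
```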


Setting machdep.cpu_idle_hlt=0 makes no difference.


Kernel:
	FreeBSD 5.1-RELEASE-p2 FreeBSD 5.1-RELEASE-p2 #0: Mon Aug 11 21:40:47 MDT 2003 i386

Raid:
	amr0: <LSILogic MegaRAID> mem 0xfcd00000-0xfcd0ffff irq 3 at device 2.0 on pci1
	amrd0: <LSILogic MegaRAID logical drive> on amr0
	amrd0: 34556MB (70770688 sectors) RAID 5 (optimal)


I suspect that both your problem and mine are related to the amr driver, and that it may be exposing
some other problem within the kernel's filesystem locking. I don't think (as others have suggested)
that your issue is power-related or caused by the particular combination of hardware you're using,
other than the fact that you've got a MegaRAID card.

The exact crash message I'm seeing is:

panic: lockmgr: locking against myself
cpuid = 0; lapic.id 00000000
boot() called on cpu#0

syncing disks, buffers remaining... panic: ffs_copyonwrite: recursive call
cpuid = 0; lapic.id 00000000
boot() called on cpu#0
Uptime: 58s
pfs_vncache_unload(): 7 entries remaining
amr0: flushing cache...done
Terminate ACPI
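
For context: lockmgr raises "locking against myself" when a thread requests an exclusive lock it already holds and recursion was not permitted for that request. The second panic (ffs_copyonwrite: recursive call) happened while the kernel was syncing disks after the first. The Python class below is only a user-space analogy of that first check, not the kernel's code, and the name OwnerCheckedLock is hypothetical. A plain threading.Lock would simply deadlock on the second acquire. The kernel refuses with a panic, and this sketch raises an exception instead.

```python
import threading


class OwnerCheckedLock:
    """Exclusive lock that remembers which thread holds it.

    User-space analogy of the lockmgr check behind "locking against
    myself"; this is not FreeBSD kernel code.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # threading.get_ident() of the holder, or None

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            # A bare threading.Lock would deadlock here; refuse loudly instead.
            raise RuntimeError("lockmgr: locking against myself")
        self._lock.acquire()
        self._owner = me

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("release of a lock this thread does not hold")
        self._owner = None
        self._lock.release()


if __name__ == "__main__":
    lock = OwnerCheckedLock()
    lock.acquire()
    try:
        lock.acquire()  # same thread asks again
    except RuntimeError as err:
        print(err)  # prints: lockmgr: locking against myself
    finally:
        lock.release()
```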



Hartmann, O. wrote:

> Dear Sirs,
> 
> This seems to be a never-ending story. We run a box with a TYAN Thunder
> 2500 dual SMP mainboard, 2GB of Tyan-certified ECC memory, an AMI Enterprise
> 1600 RAID adapter and an additional Intel 1000/Pro server-type (64-bit)
> GBit LAN NIC. With FreeBSD 4.8 this was stable, but achieving that state
> was really hard! The same story repeated itself when we moved this
> machine to FreeBSD 5.1-RELEASE-p2.
> 
> The behaviour seems to depend heavily on which PCI slots the cards are
> in, so I will report that here as well.
> 
> Phenomenon:
> 
> After the machine has been running for a while, the SMP kernel reboots
> spontaneously. This happens under heavy I/O: while compiling, or in the
> morning when our department arrives and the staff connect to the Samba
> server.
> 
> Depending on which devices are enabled or disabled in the BIOS, the kernel
> freezes at the point where the amr0 RAID is recognized. I can avoid this
> by enabling the built-in NIC (fxp0). I can trigger it by putting the em0
> NIC into another slot, for instance the one remaining 64-bit/66MHz
> slot (which should be on a separate bus).
> 
> This 'game' was identical to the one I had with FreeBSD 4.x through 4.8. I
> found that putting an additional NIC into PCI slot No. 2 (counting
> from the AGP slot) resolved things, but using both NICs together
> (with either an additional fxp0 or the new em0) leaves the system completely
> unstable.
> 
> In FreeBSD 5.1-RELEASE-p2, and especially in FreeBSD 5.1-CURRENT, this
> 'gambling' seems to reach its climax. My kernel is built with
> SCHED_4BSD because with SCHED_ULE and ADAPTIVE_MUTEXES it crashes immediately
> in the same way as described (running for a while, then dumping core, or freezing
> just after amr0 shows up in the kernel boot messages; see the dmesg
> output below).
> 
> I'm not a hardware expert, but all this weird behaviour looks to me like
> an IRQ routing problem. I have tried many hand-assigned IRQ configurations,
> but nothing helped. Either the Intel 1000/Pro or the AMI RAID is causing
> problems in the TYAN Thunder 2500 SMP environment.
> 
> We also have an SMP machine with similar hardware, based on an ASUS CV4X-D,
> with an AMI Elite 1600 RAID controller and the same Intel em0 1GBit NIC. It runs
> FreeBSD 4.8 and has never had any problems!
> 
> I feel a little helpless at the moment, because I think I have tried every trick,
> and something seems to be wrong with the combination of the TYAN Thunder 2500 and
> FreeBSD 5.x SMP. It is also very curious that a kernel without SMP/APIC_IO freezes
> during boot at the same place (amr0 RAID recognition).
> 
> Is there any help out there?
> 
> I have attached the kernel config file and the dmesg output. Please note: I disabled both
> serial ports, the parallel port, sound and USB to free up IRQs. But I have to
> enable the built-in NIC to get a bootable, though unstable, FreeBSD 5.1-R box.
> 
> ====================================
> DMESG output
> ====================================
> 
> Copyright (c) 1992-2003 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
>     root at atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
> Timecounter "i8254"  frequency 1193182 Hz
> Timecounter "TSC"  frequency 868644793 Hz
> CPU: Intel Pentium III (868.64-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
>   Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
> real memory  = 2147483648 (2048 MB)
> avail memory = 2085625856 (1989 MB)
> Programming 16 pins in IOAPIC #0
> IOAPIC #0 intpin 2 -> irq 0
> Programming 16 pins in IOAPIC #1
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee00000
>  cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
>  io0 (APIC): apic id:  2, version: 0x000f0011, at 0xfec00000
>  io1 (APIC): apic id:  3, version: 0x000f0011, at 0xfec01000
> netsmb_dev: loaded
> Pentium Pro MTRR support enabled
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> pcibios: BIOS version 2.10
> Using $PIR table, 12 entries at 0xc00fdf00
> pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
> pci0: <PCI bus> on pcib0
> IOAPIC #1 intpin 13 -> irq 2
> IOAPIC #1 intpin 12 -> irq 16
> IOAPIC #1 intpin 2 -> irq 17
> IOAPIC #1 intpin 7 -> irq 18
> pcib1: <PCI-PCI bridge> at device 0.1 on pci0
> pci1: <PCI bus> on pcib1
> IOAPIC #1 intpin 1 -> irq 19
> pci1: <display, VGA> at device 0.0 (no driver attached)
> sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
> sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
> sym0: open drain IRQ line driver, using on-chip SRAM
> sym0: using LOAD/STORE-based firmware.
> sym0: handling phase mismatch from SCRIPTS.
> sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
> sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
> sym1: open drain IRQ line driver, using on-chip SRAM
> sym1: using LOAD/STORE-based firmware.
> sym1: handling phase mismatch from SCRIPTS.
> em0: <Intel(R) PRO/1000 Network Connection, Version - 1.5.31> port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
> em0:  Speed:1000 Mbps  Duplex:Full
> fxp0: <Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet> port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
> fxp0: Ethernet address 00:e0:81:00:f0:d7
> miibus0: <MII bus> on fxp0
> inphy0: <i82555 10/100 media interface> on miibus0
> inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> isab0: <PCI-ISA bridge> port 0x500-0x50f at device 15.0 on pci0
> isa0: <ISA bus> on isab0
> pci0: <mass storage, ATA> at device 15.1 (no driver attached)
> pcib2: <ServerWorks host to PCI bridge> at pcibus 2 on motherboard
> pci2: <PCI bus> on pcib2
> pcib3: <PCI-PCI bridge> at device 2.0 on pci2
> pci3: <PCI bus> on pcib3
> IOAPIC #1 intpin 11 -> irq 20
> IOAPIC #1 intpin 8 -> irq 21
> pcib4: <PCI-PCI bridge> at device 0.0 on pci3
> pci4: <PCI bus> on pcib4
> IOAPIC #1 intpin 10 -> irq 22
> amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
> amr0: <LSILogic MegaRAID Enterprise 1600> Firmware G170, BIOS F316, 64MB RAM
> pci3: <mass storage, SCSI> at device 1.0 (no driver attached)
> pci3: <mass storage, SCSI> at device 2.0 (no driver attached)
> orm0: <Option ROMs> at iomem 0xca000-0xcdfff,0xc0000-0xc9fff on isa0
> fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: model IntelliMouse, device ID 3
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <8 virtual consoles, flags=0x300>
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 8250 or not responding
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> ppc0: parallel port not found.
> unknown: <PNP0303> can't assign resources (port)
> psmcpnp0: irq resource info is missing; assuming irq 12
> unknown: <PNP0700> can't assign resources (port)
> ppc1: parallel port not found.
> APIC_IO: Testing 8254 interrupt delivery
> APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
> APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
> Timecounters tick every 1.000 msec
> ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
> DUMMYNET initialized (011031)
> Waiting 5 seconds for SCSI devices to settle
> (noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
> (noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)
> 
> ===> freezing here!
> 
> sa0 at sym1 bus 0 target 5 lun 0
> sa0: <HP C5713A H910> Removable Sequential Access SCSI-2 device
> sa0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
> ch0 at sym1 bus 0 target 5 lun 1
> ch0: <HP C5713A H910> Removable Changer SCSI-2 device
> ch0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
> ch0: 6 slots, 1 drive, 0 pickers, 0 portals
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/amrd0s1a
> cd0 at sym0 bus 0 target 3 lun 0
> cd0: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device
> cd0: 20.000MB/s transfers (20.000MHz, offset 16)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> 
> ========================
> KERNEL config file
> ========================
> 
> machine		i386
> cpu		I686_CPU
> ident		ATMOS
> 
> options         SMP                     # Symmetric MultiProcessor Kernel
> options         APIC_IO                 # Symmetric (APIC) I/O
> 
> maxusers	0
> 
> hints		"ATMOS.hints"		#Default places to look for devices.
> 
> 
> #options		COMPAT_FREEBSD4
> options 	SCHED_4BSD		#4BSD scheduler
> 
> #options		SCHED_ULE
> #options		ADAPTIVE_MUTEXES
> 
> #options         PQ_CACHESIZE=256
> 
> options         CPU_ENABLE_SSE
> 
> options         CLK_USE_TSC_CALIBRATION
> #options		HZ=1000
> 
> #makeoptions    CONF_CFLAGS=-fno-builtin
> #options        MAXDSIZ=(1024UL*1024*1024)
> #options        MAXSSIZ=(128UL*1024*1024)
> #options        DFLDSIZ=(1024UL*1024*1024)
> 
> options         GEOM_AES
> options         GEOM_APPLE
> options         GEOM_BDE
> options         GEOM_BSD
> options         GEOM_GPT
> options         GEOM_MBR
> options         GEOM_PC98
> options         GEOM_SUNLABEL
> options         GEOM_VOL
> 
> options         ROOTDEVNAME=\"ufs:amrd0s1a\"
> 
> options 	INET			#InterNETworking
> #options 	INET6			#IPv6 communications protocols
> options 	FFS			#Berkeley Fast Filesystem
> options 	SOFTUPDATES		#Enable FFS soft updates support
> options 	UFS_ACL			#Support for access control lists
> options 	UFS_DIRHASH		#Improve performance on big directories
> options 	NFSCLIENT		#Network Filesystem Client
> options 	NFSSERVER		#Network Filesystem Server
> options 	MSDOSFS			#MSDOS Filesystem
> options 	CD9660			#ISO 9660 Filesystem
> options 	PROCFS			#Process filesystem (requires PSEUDOFS)
> options 	PSEUDOFS		#Pseudo-filesystem framework
> options 	COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
> options 	SCSI_DELAY=5000		#Delay (in ms) before probing SCSI
> 
> options 	SYSVSHM			#SYSV-style shared memory
> options 	SYSVMSG			#SYSV-style message queues
> options 	SYSVSEM			#SYSV-style semaphores
> 
> options         NETSMB
> options         NETSMBCRYPTO
> options         LIBMCHAIN
> options         LIBICONV
> 
> #options         WATCHDOG
> 
> options         NETGRAPH
> #options        NETGRAPH_ASYNC
> #options        NETGRAPH_BPF
> #options        NETGRAPH_BRIDGE
> #options        NETGRAPH_CISCO
> #options        NETGRAPH_ECHO
> #options        NETGRAPH_ETHER
> #options        NETGRAPH_FRAME_RELAY
> #options        NETGRAPH_GIF
> #options        NETGRAPH_GIF_DEMUX
> #options        NETGRAPH_HOLE
> #options        NETGRAPH_IFACE
> #options        NETGRAPH_IP_INPUT
> #options        NETGRAPH_KSOCKET
> #options        NETGRAPH_L2TP
> #options        NETGRAPH_LMI
> #options        NETGRAPH_MPPC_ENCRYPTION
> #options        NETGRAPH_ONE2MANY
> #options        NETGRAPH_PPP
> #options        NETGRAPH_PPPOE
> #options        NETGRAPH_PPTPGRE
> #options        NETGRAPH_RFC1490
> #options        NETGRAPH_SOCKET
> #options        NETGRAPH_SPLIT
> #options        NETGRAPH_TEE
> #options        NETGRAPH_TTY
> #options        NETGRAPH_UI
> #options        NETGRAPH_VJC
> 
> options         MROUTING
> options         IPFIREWALL
> options         IPFIREWALL_VERBOSE
> options         IPFIREWALL_FORWARD
> #options        IPFIREWALL_VERBOSE_LIMIT=100
> #options        IPFIREWALL_DEFAULT_TO_ACCEPT
> #options        IPV6FIREWALL
> #options        IPV6FIREWALL_VERBOSE
> #options        IPV6FIREWALL_VERBOSE_LIMIT=100
> #options        IPV6FIREWALL_DEFAULT_TO_ACCEPT
> options         IPDIVERT
> #options        IPFILTER
> #options        IPFILTER_LOG
> #options        IPFILTER_DEFAULT_BLOCK
> options         IPSTEALTH
> 
> options         RANDOM_IP_ID
> 
> options         ACCEPT_FILTER_DATA
> #options        ACCEPT_FILTER_HTTP
> 
> options         TCP_DROP_SYNFIN
> options         DUMMYNET
> #options        BRIDGE
> 
> options         QUOTA
> 
> options 	_KPOSIX_PRIORITY_SCHEDULING
> options         P1003_1B_SEMAPHORES
> 
> #options        MAC
> #options        MAC_BIBA
> #options        MAC_BSDEXTENDED
> #options        MAC_DEBUG
> #options        MAC_IFOFF
> #options        MAC_LOMAC
> #options        MAC_MLS
> #options        MAC_NONE
> #options        MAC_PARTITION
> #options        MAC_SEEOTHERUIDS
> #options        MAC_TEST
> 
> options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
> 
> device		isa
> #options         AUTO_EOI_1
> 
> device		pci
> 
> device		agp
> 
> # Floppy drives
> device		fdc
> 
> # SCSI Controllers
> device		sym		# NCR/Symbios Logic (newer chipsets + those of `ncr')
> #device		ahc
> 
> # SCSI peripherals
> device		scbus		# SCSI bus (required)
> device		ch		# SCSI media changers
> device		da		# Direct Access (disks)
> device		sa		# Sequential Access (tape etc)
> device		cd		# CD
> device		pass		# Passthrough device (direct SCSI access)
> device		ses		# SCSI Environmental Services (and SAF-TE)
> 
> 
> # RAID controllers
> device		amr		# AMI MegaRAID
> 
> 
> #options        CHANGER_MIN_BUSY_SECONDS=2
> #options        CHANGER_MAX_BUSY_SECONDS=10
> 
> #options        SA_IO_TIMEOUT=4
> #options        SA_SPACE_TIMEOUT=60
> #options        SA_REWIND_TIMEOUT=(2*60)
> #options        SA_ERASE_TIMEOUT=(4*60)
> #options        SA_1FM_AT_EOD
> 
> #options        SCSI_PT_DEFAULT_TIMEOUT=60
> options         SES_ENABLE_PASSTHROUGH
> 
> 
> # atkbdc0 controls both the keyboard and the PS/2 mouse
> device		atkbdc		# AT keyboard controller
> device		atkbd		# AT keyboard
> options         ATKBD_DFLT_KEYMAP
> makeoptions     ATKBD_DFLT_KEYMAP=us.iso
> 
> device		psm		# PS/2 mouse
> 
> device		vga		# VGA video card driver
> 
> device		splash		# Splash screen and screen saver support
> 
> # syscons is the default console driver, resembling an SCO console
> device          sc
> options         MAXCONS=8
> 
> #options         SC_ALT_MOUSE_IMAGE
> options         SC_DFLT_FONT
> makeoptions     SC_DFLT_FONT=cp850
> 
> options         SC_DISABLE_DDBKEY
> options         SC_DISABLE_REBOOT
> options         SC_HISTORY_SIZE=512
> #options        SC_MOUSE_CHAR=0x3
> options         SC_PIXEL_MODE
> options         SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
> options         SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
> options         SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
> options         SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
> #options        SC_CUT_SPACES2TABS
> #options        SC_CUT_SEPCHARS=\"x09\"
> #options        SC_TWOBUTTON_MOUSE
> #options        SC_NO_CUTPASTE
> #options        SC_NO_FONT_LOADING
> #options        SC_NO_HISTORY
> #options        SC_NO_SYSMOUSE
> #options        SC_NO_SUSPEND_VTYSWITCH
> 
> device		npx
> 
> #device		pmtimer
> 
> #device		sio		# 8250, 16[45]50 based serial ports
> 
> # Parallel port
> #device		ppc
> #device		ppbus		# Parallel port bus (required)
> #device		lpt		# Printer
> #device		plip		# TCP/IP over parallel
> #device		ppi		# Parallel port interface device
> #device		vpo		# Requires scbus and da
> 
> 
> device		miibus		# MII bus support
> device		em
> #device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
> 
> device		random		# Entropy device
> device		loop		# Network loopback
> device		ether		# Ethernet support
> #device		tun		# Packet tunnel.
> device		pty		# Pseudo-ttys (telnet etc)
> #device		gif		# IPv6 and IPv4 tunneling
> #device		faith		# IPv6-to-IPv4 relaying (translation)
> 
> device		bpf		# Berkeley packet filter
> 
> 
> ------------------
> 
> 
> Thanks a lot for your help,
> 
> Oliver
> --
> Best regards,
> O. Hartmann
> 
> ohartman at mail.physik.uni-mainz.de
> ------------------------------------------------------------------
> System Administration, Institute for Atmospheric Physics (IPA)
> ------------------------------------------------------------------
> Johannes Gutenberg Universitaet Mainz
> Becherweg 21
> 55099 Mainz
> 
> Tel: +496131/3924662 (machine room)
> Tel: +496131/3924144 (office)
> FAX: +496131/3923532
> _______________________________________________
> freebsd-smp at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
> 
> 
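
The quoted message suspects an IRQ routing problem. A reasonable first check is to tabulate which driver attached at which IRQ in a saved dmesg and flag any IRQ claimed by more than one device name. The script below is a sketch. The regular expression is a heuristic guess at the attach-line format, and the function names irq_map and shared_irqs are made up. A shared IRQ is not a fault in itself. Applied to the dmesg quoted above, the PCI devices (sym0, sym1, em0, fxp0, amr0) each land on a distinct IRQ. The only IRQ it flags is 12, where psm0 and psmcpnp0 both appear.

```python
import re
from collections import defaultdict

# Attach lines look like "em0: <...> ... irq 17 at device 4.0 on pci0" or
# "sio0 at port 0x3f8-0x3ff irq 4 ...". Heuristic, not a documented format.
DEVICE_IRQ = re.compile(r"^([a-z]+\d+)\b.*?\birq (\d+)\b")


def irq_map(dmesg_lines):
    """Return {irq: sorted device names} parsed from dmesg lines.

    Leading "> " quote markers are stripped so mailed dmesg output works too.
    """
    by_irq = defaultdict(set)
    for line in dmesg_lines:
        m = DEVICE_IRQ.match(line.lstrip("> ").rstrip())
        if m:
            by_irq[int(m.group(2))].add(m.group(1))
    return {irq: sorted(devs) for irq, devs in sorted(by_irq.items())}


def shared_irqs(dmesg_lines):
    """Return only the IRQs claimed by more than one device name."""
    return {irq: devs for irq, devs in irq_map(dmesg_lines).items() if len(devs) > 1}


if __name__ == "__main__":
    import os
    import sys

    # FreeBSD keeps the boot messages in /var/run/dmesg.boot.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/run/dmesg.boot"
    if os.path.exists(path):
        with open(path) as fh:
            for irq, devs in irq_map(fh).items():
                flag = "  <-- shared" if len(devs) > 1 else ""
                print(f"irq {irq:2d}: {', '.join(devs)}{flag}")
    else:
        print(f"no dmesg file at {path}; pass one as the first argument")
```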



More information about the freebsd-stable mailing list