FBSD 5.1-RELEASE-p2 crashes/SMP won't work

Hartmann, O. ohartman at klima.physik.uni-mainz.de
Mon Aug 11 02:20:46 PDT 2003


Since we upgraded our SMP server (a TYAN Thunder 2500 based system) from FreeBSD 4.8
to FreeBSD 5.1-RELEASE, the machine has crashed sporadically under heavy load, or it won't
start after recognition of the AMI Enterprise 1600 RAID controller!

The FreeBSD 5.1-RELEASE kernel starts in single-user mode; the 5.1-RELEASE-p2 kernel doesn't!

At this moment, the only working kernel is a 5.1-CURRENT kernel from two weeks ago
(see dmesg output below).

FreeBSD 5.1-RELEASE worked for a while, but once samba was started and the system came
under heavy load, it crashed (I got no error message, sorry).

FreeBSD 5.1-RELEASE-p2 doesn't start at all anymore! The last lines I see while the kernel is
booting are these:

amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)

and it freezes forever.

Sometimes I see messages like these below the last line:

amr0: bad slot 2 completed


amr0: bad slot 15 completed

What does it mean? Is this something like a problem in IRQ routing?

Normally, after the RAID controller has been recognized, a message about the launched second CPU
shows up.
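
To make the comparison between a good and a bad boot easier to describe, here is a
minimal sketch that scans a captured boot log for the two messages discussed above.
It assumes the console output has been saved as text (for example from a serial
console); the function name summarize_boot_log and the result keys are made up for
illustration, and the patterns are copied only from the messages quoted in this post.

```python
import re

# Patterns taken from the kernel messages quoted in this post.
BAD_SLOT = re.compile(r"^amr(\d+): bad slot (\d+) completed$")
AP_LAUNCHED = re.compile(r"^SMP: AP CPU #(\d+) Launched!$")


def summarize_boot_log(text):
    """Collect bad-slot numbers and launched AP CPU numbers from a boot log.

    Hypothetical helper: it only reports what the log contains and does not
    explain why the controller prints these messages.
    """
    bad_slots = []
    launched_cpus = []
    for line in text.splitlines():
        line = line.strip()
        match = BAD_SLOT.match(line)
        if match:
            bad_slots.append(int(match.group(2)))
            continue
        match = AP_LAUNCHED.match(line)
        if match:
            launched_cpus.append(int(match.group(1)))
    return {"bad_slots": bad_slots, "launched_cpus": launched_cpus}


# Lines copied from the dmesg of the working kernel further down.
sample = """\
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)
amr0: bad slot 2 completed
SMP: AP CPU #1 Launched!
"""
summary = summarize_boot_log(sample)
print(summary)
```

A log from a boot that hangs would show an empty launched_cpus list, which matches
the symptom that the second CPU message never appears.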

Running the most recent FreeBSD 5.1-CURRENT is impossible on our machine: it freezes completely after a while
or reboots spontaneously (earlier versions did not!).

Is any help available?

Another curiosity is that kernels built with SCHED_ULE freeze much sooner than those built with
SCHED_4BSD, but SCHED_ULE kernels seem to boot, while SCHED_4BSD kernels sometimes do not.

Thanks a lot for your help.

This is the dmesg output of the currently running, working kernel:

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #2: Fri Jul 25 11:45:43 GMT 2003
    root at atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 868644587 Hz
CPU: Intel Pentium III (868.64-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
real memory  = 2147483648 (2048 MB)
avail memory = 2086006784 (1989 MB)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  3, version: 0x000f0011, at 0xfec01000
netsmb_dev: loaded
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00fdf00
pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
IOAPIC #1 intpin 13 -> irq 2
IOAPIC #1 intpin 12 -> irq 16
pcib1: <PCIBIOS PCI-PCI bridge> at device 0.1 on pci0
pci1: <PCI bus> on pcib1
IOAPIC #1 intpin 1 -> irq 17
pci1: <display, VGA> at device 0.0 (no driver attached)
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
isab0: <PCI-ISA bridge> port 0x500-0x50f at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <mass storage, ATA> at device 15.1 (no driver attached)
pcib2: <ServerWorks host to PCI bridge> at pcibus 2 on motherboard
pci2: <PCI bus> on pcib2
IOAPIC #1 intpin 8 -> irq 18
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.6.6> port 0xf0c0-0xf0ff mem 0xf7ee0000-0xf7efffff irq 18 at device 1.0 on pci2
em0:  Speed:N/A  Duplex:N/A
pcib3: <PCI-PCI bridge> at device 2.0 on pci2
pci3: <PCI bus> on pcib3
IOAPIC #1 intpin 11 -> irq 19
pcib4: <PCI-PCI bridge> at device 0.0 on pci3
pci4: <PCI bus> on pcib4
IOAPIC #1 intpin 10 -> irq 20
amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 20 at device 0.0 on pci4
amr0: <LSILogic MegaRAID Enterprise 1600> Firmware G170, BIOS F316, 64MB RAM
pci3: <mass storage, SCSI> at device 1.0 (no driver attached)
pci3: <mass storage, SCSI> at device 2.0 (no driver attached)
orm0: <Option ROMs> at iomem 0xca000-0xcdfff,0xc0000-0xc9fff on isa0
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
unknown: <PNP0303> can't assign resources (port)
psmcpnp0: irq resource info is missing; assuming irq 12
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0700> can't assign resources (port)
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
Timecounters tick every 1.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
DUMMYNET initialized (011031)
Waiting 5 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)
amr0: bad slot 2 completed
sa0 at sym1 bus 0 target 5 lun 0
sa0: <HP C5713A H910> Removable Sequential Access SCSI-2 device
sa0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0 at sym1 bus 0 target 5 lun 1
ch0: <HP C5713A H910> Removable Changer SCSI-2 device
ch0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0: 6 slots, 1 drive, 0 pickers, 0 portals
SMP: AP CPU #1 Launched!
cd0 at sym0 bus 0 target 3 lun 0
cd0: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 16)
cd0: Attempt to query device size failed: NOT READY, Medium not present
Mounting root from ufs:amr0s1a
setrootbyname failed
ffs_mountroot: can't find rootvp
Root mount failed: 6

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       eg. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot> ufs:amrd0s1a
Mounting root from ufs:amrd0s1a
WARNING: /usr/local was not properly dismounted
WARNING: /usr/obj was not properly dismounted
WARNING: /usr/src was not properly dismounted
WARNING: /var was not properly dismounted
link_elf: symbol swapblist undefined
KLD linprocfs.ko: depends on linux - not available
em0: Link is up 1000 Mbps Full Duplex

O. Hartmann

ohartman at mail.physik.uni-mainz.de
Systemadministration des Institutes fuer Physik der Atmosphaere (IPA)
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (machine room)
Tel: +496131/3924144 (office)
FAX: +496131/3923532
