4.9 SMP Stability?
Rick Updegrove
dislists at updegrove.net
Wed Apr 14 18:20:29 PDT 2004
Kris Kennaway wrote:
> First verify that
Obviously, I am going to have to change one thing at a time and wait for
the crash (and let the disks take the beating), or I will have no way to
know exactly what is happening.
So, I will start at the top and work my way down this list.
> * You have an up-to-date BIOS on the system. A lot of systems have
> buggy BIOSes, and this is frequently the cause of "mysterious crashes"
> especially for advanced features like SMP.
I am running HP BIOS 4.06.33 PL.
At your request I will update to 4.06.43 PL.
I will do this as soon as I send this reply, which has more
questions I need answered. Besides, I need to run the new BIOS with the
4.10-BETA kernel until it crashes to eliminate the BIOS as a suspect, right?
> * You have not fiddled with options in the BIOS. Playing with things
> like memory timing and other BIOS features can cause crashes.
I have changed one setting, which stopped the "locking up with no
rebooting" problem.
See http://lists.freebsd.org/pipermail/freebsd-stable/2003-July/002230.html
I got an off-list reply which suggested I do the following:
I went into the BIOS and selected:
Configuration
-> PCI Slot Devices
-> PCI IRQ Locking
-> Routing Algorithm [Smart]
OK, I changed Routing Algorithm [Smart] to [Fixed], got a scary
warning about data loss etc., but I hit Yes, saved as prompted, and
rebooted.
> * The hardware is all in order, you don't have mismatched components
> like CPUs with different steppings, etc.
This may sound silly, but how do I verify this?
(I have attached the output of dmesg -a at the bottom of this email in case that helps.)
> These three points hold *whether or not an older version of FreeBSD
> works for you*, because different versions of FreeBSD interact in
> different ways with the hardware, and a previously existing problem
> may suddenly leap out at you when you run a different version.
Sorry, but I find the above paragraph confusing, and I don't agree with
what I think it says.
The hardware runs just fine with 4.8-STABLE, so I don't think you can
convince me that my hardware is the cause of this problem.
> * you're not using out-of-date kernel modules, since in general they
> must be rebuilt whenever you update your kernel.
How do I verify this?
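One way to check, sketched under the assumption of the FreeBSD 4.x default layout (kernel at /kernel, modules in /modules): list every module file whose modification time is not newer than the installed kernel. The function name stale_modules is hypothetical, and modules installed within the same second as the kernel may show up as false positives, so the output is a hint rather than proof.

```sh
#!/bin/sh
# Hypothetical helper: list kernel modules that are not newer than the kernel.
# Assumption: FreeBSD 4.x layout, kernel at /kernel and modules in /modules;
# pass other paths as arguments if your layout differs.
stale_modules() {
    kernel=${1:-/kernel}
    moddir=${2:-/modules}
    # "! -newer FILE" matches files whose mtime is not later than FILE's,
    # i.e. modules installed before (or in the same second as) the kernel.
    find "$moddir" -name '*.ko' ! -newer "$kernel" -print
}
```

Running `stale_modules /kernel /modules` after `make installkernel` and getting no output would suggest that every module in /modules was installed together with the current kernel. Modules kept in other directories would need to be checked separately by passing that directory as the second argument.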
The procedure I follow, after once doing:
mkdir /root/kernels
cp GENERIC /root/kernels/MYKERNEL
is:
cp -Rp /etc /etc.old
cd /usr
rm -rf src/*
rm -rf obj/*
cd /usr/src
/usr/local/bin/cvsup -g -L 2 /etc/stable-supfile
cd /usr/src/sys/i386/conf
ln -s /root/kernels/MYKERNEL
/usr/sbin/config MYKERNEL
cd ../../compile/MYKERNEL
make depend
cd /usr/src
make -j4 buildworld
cd /usr/src
make buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
make installworld
cd /dev
/bin/sh MAKEDEV all
cd /usr/src/release/sysinstall
make all install
shutdown -r now
Am I missing anything specific?
If you just point me to the Handbook, I will refer back to my question:
"Am I missing anything specific?"
> You said the machine panicked.
I said the machine reboots without any warning and without leaving
anything useful in any of the logs.
> When you encounter a panic, the useful
> thing to do is to obtain a debugging traceback, as described in the
> developers handbook.
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Your bug report will be more useful the more relevant details you can
> provide about it. For example, provide a copy of boot -v, and details
> of what you are doing to provoke the problem, what you have tried to
> work around it, and any other partial results you might have.
boot -v
-bash: boot: command not found
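For reference, boot -v is a command for the FreeBSD boot loader, not for the shell: it is typed at the loader prompt, which is reached by pressing a key other than Enter during the boot countdown. A hedged alternative, assuming the loader honors a boot_verbose setting in /boot/loader.conf the same way as boot -v, is to request a verbose boot on every start. The helper enable_verbose_boot is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: request a verbose boot on every start.
# Assumption: the loader treats boot_verbose="YES" in loader.conf
# like typing "boot -v" at the loader prompt.
enable_verbose_boot() {
    conf=${1:-/boot/loader.conf}
    # Append the setting only if it is not already present,
    # so running this twice leaves a single line.
    if ! grep -q '^boot_verbose=' "$conf" 2>/dev/null; then
        echo 'boot_verbose="YES"' >> "$conf"
    fi
}
```

After the next reboot, the verbose probe messages should end up in /var/run/dmesg.boot along with the rest of the boot output.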
Again, I am doing nothing to provoke the problem. I check the uptime
from time to time and notice that it has rebooted.
So far I have been unable to obtain any useful information by following
the handbook. However, I think I have made some progress in that area.
#/etc/rc.conf
dumpdev=/dev/amrd0s1b
savecore=YES
dumpdir="/var/crash"
So hopefully, when the machine next crashes after the BIOS update, the
above changes to rc.conf, along with a debugging traceback (if I can
obtain one), will help.
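Because the crash dump goes to the swap partition /dev/amrd0s1b, it may be worth confirming that the partition can hold one. This is a sketch under the assumption that, as on FreeBSD 4.x, the kernel writes a full image of physical memory, so the dump device must be at least as large as RAM (524288K on this machine, per the dmesg below). The function dump_fits is hypothetical; both sizes are in kilobytes, and the dump device size has to be supplied by hand, for example from the 1K-blocks column of swapinfo.

```sh
#!/bin/sh
# Hypothetical check: can the dump device hold a full memory image?
# Assumption: a full dump of physical memory is written (FreeBSD 4.x),
# so the device must be at least as large as RAM. Sizes are in KB.
dump_fits() {
    ram_kb=$1    # e.g. 524288 from "real memory = ... (524288K bytes)"
    dump_kb=$2   # size of the dump device, e.g. from swapinfo
    if [ "$dump_kb" -ge "$ram_kb" ]; then
        echo "ok"
    else
        echo "too small: need ${ram_kb}K, have ${dump_kb}K"
    fi
}
```

For example, `dump_fits 524288 1048576` prints `ok`, while a 262144K partition would be reported as too small for this machine's 512MB of RAM.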
> After all this, there's no guarantee that one of the volunteer
> developers will be able to jump on board to try to solve your problem
> straight away [1]. Debugging this kind of thing typically takes time,
> so if you don't have it to spare then you'll just have to put on a
> happy face and accept that you can't put in the work needed to track
> newer versions of FreeBSD on your machine.
Yep, I know, but I feel like I must try anyway. :)
> Kris
>
> [1] of course, you always have the option to pay an expert to
> investigate the problem.
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.10-BETA #0: Tue Apr 13 21:49:08 PDT 2004
root at govmail.ca.gov:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (499.15-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x673 Stepping = 3
Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory = 536870912 (524288K bytes)
avail memory = 519507968 (507332K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard: 2 CPUs
cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0329000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 14 entries at 0xc00fdee0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX host to PCI bridge (AGP disabled)> on motherboard
IOAPIC #0 intpin 19 -> irq 2
IOAPIC #0 intpin 17 -> irq 16
pci0: <PCI bus> on pcib0
isab0: <Intel 82371AB PCI to ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0xfcd0-0xfcdf at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 4.2 irq 2
Timecounter "PIIX" frequency 3579545 Hz
chip1: <Intel 82371AB Power management controller> port 0x2180-0x218f at device 4.3 on pci0
pcib1: <PCI to PCI bridge (vendor=8086 device=0960)> at device 7.0 on pci0
IOAPIC #0 intpin 16 -> irq 17
pci1: <PCI bus> on pcib1
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfebfe000-0xfebfefff irq 17 at device 4.0 on pci1
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
pci1: <unknown card> (vendor=0x1000, dev=0x000c) at 7.0 irq 18
amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf7ffffff irq 16 at device 7.1 on pci0
amr0: <Integrated HP NetRAID (T5)> Firmware D.02.05, BIOS B.01.04, 16MB RAM
pcib2: <DEC 21152 PCI-PCI bridge> at device 8.0 on pci0
pci2: <PCI bus> on pcib2
fxp0: <Intel 82558 Pro/100 Ethernet> port 0xdce0-0xdcff mem 0xfe900000-0xfe9fffff,0xefffe000-0xefffefff irq 16 at device 2.0 on pci2
fxp0: Ethernet address 00:90:27:b7:09:76
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: <unknown card> (vendor=0x103c, dev=0x10c1) at 11.0
pci0: <Cirrus Logic GD5446 SVGA controller> at 13.0
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xc9000-0xc97ff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
ata0-slave: ATAPI identify retries exceeded
acd0: CDROM <CD-532E-B> at ata0-master PIO4
Waiting 15 seconds for SCSI devices to settle
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 34708MB (71081984 sectors) RAID 5 (optimal)
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/amrd0s1a
dumpon: crash dumps to /dev/amrd0s1b (133, 131073)
swapon: adding /dev/amrd0s1b as swap device
Automatic boot in progress...
/dev/amrd0s1a:
FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/amrd0s1a:
clean, 17512 free
(232 frags, 2160 blocks, 0.4% fragmentation)
/dev/amrd0s1f:
FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/amrd0s1f:
clean, 108490 free
(322 frags, 13521 blocks, 0.2% fragmentation)
/dev/amrd0s1g:
FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/amrd0s1g:
clean, 11804820 free
(392020 frags, 1426600 blocks, 2.4% fragmentation)
/dev/amrd0s1e:
FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/amrd0s1e:
clean, 314499 free
(21563 frags, 36617 blocks, 4.2% fragmentation)
Doing initial network setup:
hostname
.
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 134.186.104.10 netmask 0xffffff00 broadcast 134.186.104.255
ether 00:90:27:b7:09:76
media: Ethernet 100baseTX <full-duplex>
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet 127.0.0.1 netmask 0xff000000
add net default: gateway 134.186.104.62
Additional routing options:
TCP keepalive=YES
.
Routing daemons:
.
Additional daemons:
syslogd
.
Checking for core dump:
savecore: no core dump
Doing additional network setup:
.
Starting final network daemons:
.
ELF ldconfig path: /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting standard daemons:
cron
sshd
.
Initial rc.i386 initialization:
.
Configuring syscons:
blanktime
.
Additional ABI support:
.
Starting local daemons:
starting svscan in /service
[1] 96
.
Local package initialization:
[Wed Apr 14 17:53:50 2004] [warn] Loaded DSO libexec/apache/libphp4.so uses plain Apache 1.3 API, this module might crash under EAPI! (please recompile it with -DEAPI)
apache
Starting clamd
mysqld
(skipping samba.sh, not executable)
Starting spamd
sqwebmaild
svscan
.
Additional TCP options:
.
Wed Apr 14 17:53:52 PDT 2004
Apr 14 17:55:41 govmail sshd[385]: error: PAM: Authentication failure
More information about the freebsd-stable mailing list