6.0 random freezes

Atanas atanas at asd.aplus.net
Mon Dec 12 13:08:27 PST 2005


Hi,

I have 3 machines running 6.0-RELEASE, and recently 2 of them started 
freezing once a day or so. There are no error messages on the console or 
in the system logs.

I put the first one in production about a month ago, and it worked 
flawlessly until it got some load; now it freezes almost every day. 
The second one behaves exactly the same way: it was fine while doing 
nothing (for a couple of weeks) and started freezing once loaded.

The load I'm talking about is less than moderate (less than 2.0 with 
plenty of CPU idle time). The freezes also do not appear to happen at 
peak times (I have rrdtool-based CPU load graphs).

Both machines have (almost) identical motherboards:

Intel SE7520JR2SCSID2 and SE7520JR2ATAD2
2 Intel XeonE 3.2GHz 800MHz CPUs
4GB DDRII400 RegECC RAM

The first one has 8 72GB <SEAGATE ST373207LC 0003> Ultra320 SCSI drives 
attached as plain drives (no raid) to the on-board <LSILogic 1030 Ultra4 
Adapter>.

The second one has 8 500GB <SEAGATE ST3500641AS> SATA2 drives attached 
to a <3ware Model 9550SX-8LP> controller and configured as a RAID5 array.

The motherboards have 2 1000Mbps NICs on board, but due to some (em) 
driver problems, I usually disable these in the BIOS and use a PCI 
Intel 100Mbps (fxp) card instead.

Both machines were running 6.0-RELEASE, i386. For the second one I had 
to update the twa driver manually, as the one shipped with 6.0 didn't 
support the 3ware 9550SX. I see that a new version was recently 
committed to the -STABLE branches.

Here are the diffs against the GENERIC kernel configuration:

< cpu           I486_CPU
< cpu           I586_CPU

< makeoptions   DEBUG=-g                # Build kernel with gdb(1) debug symbols

< options       INET6                   # IPv6 communications protocols
53d47
< options       SCSI_DELAY=5000         # Delay (in ms) before probing SCSI

 > options               QUOTA
 > options               SMP             # Symmetric MultiProcessor Kernel

/boot/loader.conf:

kern.ipc.nmbclusters="65536"

/etc/sysctl.conf:

kern.ipc.somaxconn=1024
net.inet.tcp.recvspace=16384
net.inet.ip.fw.verbose=1
machdep.hyperthreading_allowed=1

Both machines boot with ACPI and hyperthreading enabled.
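
For anyone who wants to compare these tunables between a freezing box 
and a healthy one, here is a minimal sketch of reading sysctl.conf-style 
settings into a dictionary. The helper name parse_sysctl_conf is 
hypothetical, and the sample text is just the four lines quoted above; 
it does not query the running system.

```python
# Hypothetical helper: parse sysctl.conf-style "name=value" lines.
# This only reads text; it does not call sysctl(8) or touch the system.
def parse_sysctl_conf(text):
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


# The settings quoted in this message.
SYSCTL_CONF = """\
kern.ipc.somaxconn=1024
net.inet.tcp.recvspace=16384
net.inet.ip.fw.verbose=1
machdep.hyperthreading_allowed=1
"""

settings = parse_sysctl_conf(SYSCTL_CONF)
for name in sorted(settings):
    print(name, "=", settings[name])
```

Running the same function over the file from the 5.4 box and comparing 
the two dictionaries would show whether any tunable differs.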

First I suspected the hardware, so I replaced the entire box (keeping 
the same drives). No change: it froze again in less than 24 hours.

Then I disabled ACPI (hint.acpi.0.disabled="1") and hyperthreading. 
No change there either.

Then, after reading all the related (I believe) postings here and in 
freebsd-current, I decided to upgrade both boxes to 6.0-STABLE (I saw a 
lot of changes in the source tree), but the freezes continued.

I have another machine with the same hardware components (the SCSI based 
one), but running 5.4-RELEASE. Unlike these two, it's really loaded 
(even got DDoS-ed a while ago) and I had zero problems with it for months.

I remember having similar issues when performing 4GB RAM upgrades on a 
bunch of 4.x based boxes, when I had to set KVA_PAGES to something like 
512. For 5.3+, however, this no longer seems to be an issue.
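
As a rough illustration of what that setting meant, here is a small 
arithmetic sketch. It assumes a non-PAE i386 kernel, where each 
KVA_PAGES unit covers 4 MB of kernel virtual address space; that figure 
is my understanding of the option, not something measured on these 
boxes, and the function name is made up for the example.

```python
# Assumption: non-PAE i386, where one KVA_PAGES unit maps 4 MB of
# kernel virtual address space.
MB_PER_KVA_PAGE = 4


def kva_size_mb(kva_pages):
    """Return the kernel virtual address space size in MB."""
    return kva_pages * MB_PER_KVA_PAGE


for pages in (256, 512):
    print("KVA_PAGES=%d -> %d MB of kernel address space"
          % (pages, kva_size_mb(pages)))
```

Under that assumption, 512 corresponds to 2048 MB of kernel address 
space, twice what 256 would give.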

I would provide more useful feedback if I had some real and relevant 
error messages. Actually I got some unusual errors on only one of the 
affected servers:

Dec 11 02:48:36 xyz kernel: calcru: runtime went backwards from 28636364 usec to 28636021 usec for pid 28588 (httpd)

But it does not seem to be very relevant to the problem, as it did not 
happen anywhere close to the freezes (i.e. it was 26 hours after the 
last crash and 19 hours before the next one).
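
To put a number on the message above, here is a minimal sketch that 
pulls the two runtime values out of such a log line and computes how 
far the clock stepped back. The regular expression and the function 
name parse_calcru are my own and are written against this one message 
only; other kernels may word it differently.

```python
import re

# The log line quoted above, joined onto one line.
LOG_LINE = ("Dec 11 02:48:36 xyz kernel: calcru: runtime went backwards "
            "from 28636364 usec to 28636021 usec for pid 28588 (httpd)")

# Pattern written against this single message (an assumption, not a
# complete grammar for kernel log output).
CALCRU_RE = re.compile(
    r"calcru: runtime went backwards from (\d+) usec to (\d+) usec "
    r"for pid (\d+) \(([^)]+)\)"
)


def parse_calcru(line):
    """Return pid, command and backwards step in usec, or None."""
    match = CALCRU_RE.search(line)
    if match is None:
        return None
    before, after, pid = (int(match.group(i)) for i in (1, 2, 3))
    return {
        "pid": pid,
        "command": match.group(4),
        "backwards_usec": before - after,
    }


event = parse_calcru(LOG_LINE)
print(event)
```

For this message the step is 343 microseconds, for pid 28588 (httpd).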

Now the only reasonable option for me (I mean for production and in 
the relatively short term) seems to be downgrading to 5.4 and waiting 
until 6.x gets more stable.

Two dmesg.boot files attached.

Any comments, suggestions and questions are welcome.

Regards,
Atanas
-------------- next part --------------
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-STABLE #0: Fri Dec  9 14:54:05 PST 2005
    root at xyz:/var/obj/usr/src/sys/XYZ
ACPI APIC Table: <A M I  OEMAPIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3192.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100000<NX,LM>
  Hyperthreading: 2 logical CPUs
real memory  = 3757965312 (3583 MB)
avail memory = 3678597120 (3508 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <A M I OEMRSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
cpu1: <ACPI CPU> on acpi0
acpi_throttle1: <ACPI CPU Throttling> on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
cpu2: <ACPI CPU> on acpi0
acpi_throttle2: <ACPI CPU Throttling> on cpu2
acpi_throttle2: failed to attach P_CNT
device_attach: acpi_throttle2 attach returned 6
cpu3: <ACPI CPU> on acpi0
acpi_throttle3: <ACPI CPU Throttling> on cpu3
acpi_throttle3: failed to attach P_CNT
device_attach: acpi_throttle3 attach returned 6
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pci0: <base peripheral> at device 1.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xdc00-0xdc3f mem 0xfcffe000-0xfcffefff,0xfcfa0000-0xfcfbffff irq 28 at device 2.0 on pci2
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:0e:0c:9c:47:a8
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xdc80-0xdcbf mem 0xfa000000-0xfbffffff,0xfcfff000-0xfcffffff irq 27 at device 3.0 on pci2
twa0: [FAST]
twa0: WARNING: (0x04: 0x0008): Unclean shutdown detected: unit=0
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-8LP, 8 ports, Firmware FE9X 3.02.00.004, BIOS BE9X 3.01.00.024
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pci4: <display, VGA> at device 12.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xca7ff,0xca800-0xcbfff,0xcc000-0xcd7ff,0xd5000-0xdb7ff on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <SR244W/T01A> at ata0-master UDMA33
twa0: ERROR: (0x03: 0x01d0): Invalid field in parameter list: 
da0 at twa0 bus 0 target 0 lun 0
da0: <AMCC 9550SX-8LP DISK 3.02> Fixed Direct Access SCSI-3 device 
da0: 100.000MB/s transfers
da0: 2860962MB (5859250176 512 byte sectors: 255H 63S/T 364721C)
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /home/u1 was not properly dismounted
/home/u1: mount pending error: blocks 888 files 2
WARNING: /home/u2 was not properly dismounted
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 292 files 1
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled
Accounting enabled
-------------- next part --------------
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-STABLE #0: Fri Dec  9 11:52:26 PST 2005
    root at xyz:/var/obj/usr/src/sys/XYZ
ACPI APIC Table: <A M I  OEMAPIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3192.01-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100000<NX,LM>
  Hyperthreading: 2 logical CPUs
real memory  = 3757965312 (3583 MB)
avail memory = 3678597120 (3508 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <A M I OEMRSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
cpu1: <ACPI CPU> on acpi0
acpi_throttle1: <ACPI CPU Throttling> on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
cpu2: <ACPI CPU> on acpi0
acpi_throttle2: <ACPI CPU Throttling> on cpu2
acpi_throttle2: failed to attach P_CNT
device_attach: acpi_throttle2 attach returned 6
cpu3: <ACPI CPU> on acpi0
acpi_throttle3: <ACPI CPU Throttling> on cpu3
acpi_throttle3: failed to attach P_CNT
device_attach: acpi_throttle3 attach returned 6
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pci0: <base peripheral> at device 1.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xd480-0xd4bf mem 0xfcfd7000-0xfcfd7fff,0xfcf80000-0xfcf9ffff irq 27 at device 3.0 on pci2
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:0e:0c:9c:4c:87
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0xd800-0xd8ff mem 0xfcfc0000-0xfcfcffff,0xfcfb0000-0xfcfbffff irq 26 at device 5.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.14.0
mpt0: Unhandled Event Notify Frame. Event 0xa.
mpt0: Capabilities: ( RAID-1E RAID-1 SAFTE )
mpt0: 0 Active Volumes (1 Max)
mpt0: 0 Hidden Drive Members (6 Max)
mpt1: <LSILogic 1030 Ultra4 Adapter> port 0xdc00-0xdcff mem 0xfcff0000-0xfcffffff,0xfcfe0000-0xfcfeffff irq 25 at device 5.1 on pci2
mpt1: [GIANT-LOCKED]
mpt1: MPI Version=1.2.14.0
mpt1: Unhandled Event Notify Frame. Event 0xa.
mpt1: Capabilities: ( RAID-1E RAID-1 SAFTE )
mpt1: 0 Active Volumes (1 Max)
mpt1: 0 Hidden Drive Members (6 Max)
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pci4: <display, VGA> at device 12.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <Intel ICH5 SATA150 controller> port 0xcc80-0xcc87,0xcc00-0xcc03,0xc880-0xc887,0xc800-0xc803,0xc480-0xc48f irq 18 at device 31.2 on pci0
atapci1: failed to enable memory mapping!
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xca7ff,0xca800-0xce7ff on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <SR244W/T01A> at ata0-master UDMA33
ad4: 476940MB <Seagate ST3500641AS 2.AAA> at ata2-master SATA150
Waiting 2 seconds for SCSI devices to settle
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
da1 at mpt0 bus 0 target 1 lun 0
da1: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da0 at mpt0 bus 0 target 0 lun 0
da0: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da3 at mpt0 bus 0 target 3 lun 0
da3: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da3: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da3: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da2 at mpt0 bus 0 target 2 lun 0
da2: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da2: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da5 at mpt0 bus 0 target 5 lun 0
da5: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da5: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da4 at mpt0 bus 0 target 4 lun 0
da4: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da4: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da6 at mpt0 bus 0 target 6 lun 0
da6: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da6: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da6: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da7 at mpt0 bus 0 target 8 lun 0
da7: <SEAGATE ST373207LC 0003> Fixed Direct Access SCSI-3 device 
da7: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da7: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled
Accounting enabled

