6.1 Freezes - Suspect SCSI Issue

Juergen Heberling pjah at hicom.net
Wed Jan 10 04:53:52 UTC 2007


Hi all

Please suggest a way to diagnose this problem:

The system freezes after being up in production, and apparently stable,
for several weeks: no dump, no error message, nothing on the console. So
I suspect hardware.

See the dmesg below, but briefly: it's a Supermicro (X6DA3-G2) with two
Xeon (Nocona) processors and the onboard AIC9410 ("SAS") controller,
which dmesg reports as an Adaptec AIC7902 Ultra320 SCSI adapter
(ahd0/ahd1). Each channel of the SCSI "card" handles 4 drives, and each
drive is mirrored with GEOM to a drive on the other channel (NOT using
the hardware mirroring).

Right after placing the system into production, I had to increase the
number of SCSI tags (via camcontrol) on the devices in one of the
mirrors ("homea"), because performance was so poor that it had already
led to several reboots: gstat showed queue lengths of generally about
150, with spikes to 400. After I set the tags to 32, performance on the
mirror was adequate.
I then tried to increase the tags to 64, but "camcontrol tags da4 -v"
never showed more than 54.

1. I don't understand why the value set with "camcontrol tags da4 -N 64"
does not stick: it later drops and never comes back above 54. And why
shouldn't I try setting the tags even higher, to 128? With 4 drives per
channel that would be 512 tags per channel, which I believe is what each
channel supports. The following shows the initial tag setting and the
subsequent "reduction" (to 50 in this case). All of the commands were
issued within a few minutes.

# camcontrol tags da4 -v -N 64
(pass2:ahd0:0:4:0): tagged openings now 64
(pass2:ahd0:0:4:0): dev_openings  64
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 64
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255
# camcontrol tags da4 -v
(pass2:ahd0:0:4:0): dev_openings  64
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 64
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255
# camcontrol tags da4 -v
(pass2:ahd0:0:4:0): dev_openings  50
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 50
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255
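
To see when the openings drop after a change, it can help to record the
counters from each listing instead of comparing them by eye. Below is a
minimal Python sketch. It assumes the "camcontrol tags -v" output has
the "(path): name value" shape shown above. The names parse_tags and
fits_scb_pool are hypothetical helpers. The budget check also assumes
that the "512 SCBs" dmesg reports for each channel are shared by the 4
drives on that channel and that the driver reserves none of them.

```python
import re

# Matches the "name  value" counter lines printed by "camcontrol tags <dev> -v",
# e.g. "(pass2:ahd0:0:4:0): dev_openings  64". A line such as
# "(pass2:ahd0:0:4:0): tagged openings now 64" does not fit this shape and
# is skipped.
COUNTER_RE = re.compile(r"^\((?P<path>[^)]+)\):\s+(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_tags(output: str) -> dict:
    """Return {counter name: int} for one `camcontrol tags -v` listing."""
    counters = {}
    for line in output.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def fits_scb_pool(tags_per_device: int, devices: int, scbs: int = 512) -> bool:
    """Hypothetical budget check for one channel.

    Assumes the SCBs reported per channel in dmesg are shared by all
    targets on that channel and that the driver reserves none of them.
    """
    return tags_per_device * devices <= scbs


if __name__ == "__main__":
    # The third listing above, after the openings dropped.
    listing = """\
(pass2:ahd0:0:4:0): dev_openings  50
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 50
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255
"""
    counters = parse_tags(listing)
    print(counters["dev_openings"], counters["maxtags"])  # 50 255
    # 4 drives per channel: 64 tags each uses 256 SCBs, 128 each uses all 512.
    print(fits_scb_pool(64, 4), fits_scb_pool(128, 4), fits_scb_pool(129, 4))  # True True False
```

If you feed listings captured every few minutes through parse_tags, you
get a simple log of dev_openings over time. The budget check only
restates the 128 x 4 = 512 arithmetic. It does not account for any limit
imposed by the drives or the driver.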


2. Why don't I see any bus or device error messages (or any indication
of a dump) in the logs, and what can I do to turn such messages on?


Any suggestions would be appreciated.

Juergen

Here is my dmesg, long lines were wrapped:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE #0: Sun May  7 04:42:56 UTC 2006
     root at opus.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0xf4a  Stepping = 10
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
   AMD Features=0x20100000<NX,LM>
   AMD Features2=0x1<LAHF>
   Logical CPUs per core: 2
real memory  = 3489071104 (3327 MB)
avail memory = 3414409216 (3256 MB)
ACPI APIC Table: <PTLTD          APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
  cpu2 (AP): APIC ID:  6
  cpu3 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: <PTLTD   RSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pci0: <base peripheral> at device 1.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 3.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x2400-0x24ff,0x2000-0x20ff mem 0xdd200000-0xdd201fff irq 32 at device 2.0 on pci3
ahd0: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x2c00-0x2cff,0x2800-0x28ff mem 0xdd202000-0xdd203fff irq 33 at device 2.1 on pci3
ahd1: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
pci2: <base peripheral, interrupt controller> at device 0.1 (no driver attached)
pcib4: <ACPI PCI-PCI bridge> at device 0.2 on pci2
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x3000-0x303f mem 0xdd300000-0xdd31ffff irq 54 at device 2.0 on pci4
em0: Ethernet address: 00:30:48:68:84:32
em1: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x3040-0x307f mem 0xdd320000-0xdd33ffff irq 55 at device 2.1 on pci4
em1: Ethernet address: 00:30:48:68:84:33
pci2: <base peripheral, interrupt controller> at device 0.3 (no driver attached)
pcib5: <ACPI PCI-PCI bridge> irq 16 at device 4.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 16 at device 6.0 on pci0
pci6: <ACPI PCI bus> on pcib6
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0x1400-0x141f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0x1420-0x143f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0x1440-0x145f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 82801EB (ICH5) USB controller USB-D> port 0x1460-0x147f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <Intel 82801EB (ICH5) USB controller USB-D> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: <Intel 82801EB/R (ICH5) USB 2.0 controller> mem 0xdd001000-0xdd0013ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <Intel 82801EB/R (ICH5) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib7
pci7: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x14a0-0x14af at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 1 on acpi0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xd2fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: DVDROM <SAMSUNG DVD-ROM SD-616E/F501> at ata0-master UDMA33
Waiting 5 seconds for SCSI devices to settle
da1 at ahd1 bus 0 target 0 lun 0
da1: <SEAGATE ST373207LW 0003> Fixed Direct Access SCSI-3 device
da1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da3 at ahd1 bus 0 target 2 lun 0
da3: <SEAGATE ST373207LW 0003> Fixed Direct Access SCSI-3 device
da3: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da3: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da5 at ahd1 bus 0 target 4 lun 0
da5: <SEAGATE ST373207LW 0003> Fixed Direct Access SCSI-3 device
da5: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da0 at ahd0 bus 0 target 0 lun 0
da0: <SEAGATE ST373207LW 0003> Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da2 at ahd0 bus 0 target 2 lun 0
da2: <SEAGATE ST373207LW 0005> Fixed Direct Access SCSI-3 device
da2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da2: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da4 at ahd0 bus 0 target 4 lun 0
da4: <SEAGATE ST373207LW 0005> Fixed Direct Access SCSI-3 device
da4: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da6 at ahd0 bus 0 target 6 lun 0
da6: <FUJITSU MAP3367NP 0108> Fixed Direct Access SCSI-3 device
da6: 320.000MB/s transfers (160.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da6: 35046MB (71775284 512 byte sectors: 255H 63S/T 4467C)
da7 at ahd1 bus 0 target 6 lun 0
da7: <FUJITSU MAP3367NP 0108> Fixed Direct Access SCSI-3 device
da7: 320.000MB/s transfers (160.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da7: 35046MB (71775284 512 byte sectors: 255H 63S/T 4467C)
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
GEOM_MIRROR: Device gm0 created (id=1170997708).
GEOM_MIRROR: Device gm0: provider da0 detected.
GEOM_MIRROR: Device mail created (id=4084922715).
GEOM_MIRROR: Device mail: provider da2 detected.
GEOM_MIRROR: Device homea created (id=3543800137).
GEOM_MIRROR: Device homea: provider da4 detected.
GEOM_MIRROR: Device homeb created (id=2383534711).
GEOM_MIRROR: Device homeb: provider da6 detected.
GEOM_MIRROR: Device gm0: provider da1 detected.
GEOM_MIRROR: Device gm0: provider da1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0: rebuilding provider da0.
GEOM_MIRROR: Device mail: provider da3 detected.
GEOM_MIRROR: Device mail: provider da3 activated.
GEOM_MIRROR: Device mail: provider mirror/mail launched.
GEOM_MIRROR: Device mail: rebuilding provider da2.
GEOM_MIRROR: Device homea: provider da5 detected.
GEOM_MIRROR: Device homea: provider da5 activated.
GEOM_MIRROR: Device homea: provider mirror/homea launched.
GEOM_MIRROR: Device homea: rebuilding provider da4.
GEOM_MIRROR: Device homeb: provider da7 detected.
GEOM_MIRROR: Device homeb: provider da7 activated.
GEOM_MIRROR: Device homeb: provider mirror/homeb launched.
GEOM_MIRROR: Device homeb: rebuilding provider da6.
Trying to mount root from ufs:/dev/mirror/gm0s1a
...
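
The layout described near the top of the message, with each drive
mirrored to a drive on the other channel, can be checked mechanically
against the dmesg above. Below is a minimal Python sketch. It assumes
the dmesg has been saved as plain text, with each "daN at ahdM" attach
line and each GEOM_MIRROR "provider ... detected" line on a single line.
The names mirror_channels and single_channel_mirrors are hypothetical.

```python
import re

# Hypothetical helper: relate each GEOM mirror to the controller channel of
# its member disks, using two kinds of dmesg lines:
#   "da4 at ahd0 bus 0 target 4 lun 0"
#   "GEOM_MIRROR: Device homea: provider da4 detected."
ATTACH_RE = re.compile(r"^(da\d+) at (ahd\d+) bus ")
MEMBER_RE = re.compile(r"^GEOM_MIRROR: Device (\S+): provider (da\d+) detected\.$")


def mirror_channels(dmesg: str) -> dict:
    """Return {mirror name: {disk: controller}} for every detected member."""
    controller = {}
    mirrors = {}
    for line in dmesg.splitlines():
        line = line.strip()
        attach = ATTACH_RE.match(line)
        if attach:
            controller[attach.group(1)] = attach.group(2)
            continue
        member = MEMBER_RE.match(line)
        if member:
            name, disk = member.groups()
            mirrors.setdefault(name, {})[disk] = controller.get(disk, "unknown")
    return mirrors


def single_channel_mirrors(mirrors: dict) -> list:
    """List mirrors whose members all sit on the same controller channel."""
    return [name for name, members in mirrors.items()
            if len(set(members.values())) < 2]


if __name__ == "__main__":
    sample = """\
da4 at ahd0 bus 0 target 4 lun 0
da5 at ahd1 bus 0 target 4 lun 0
GEOM_MIRROR: Device homea created (id=3543800137).
GEOM_MIRROR: Device homea: provider da4 detected.
GEOM_MIRROR: Device homea: provider da5 detected.
GEOM_MIRROR: Device homea: provider da5 activated.
"""
    layout = mirror_channels(sample)
    print(layout)  # {'homea': {'da4': 'ahd0', 'da5': 'ahd1'}}
    print(single_channel_mirrors(layout))  # []
```

On the dmesg in this message, it should pair each of gm0, mail, homea,
and homeb with one disk on ahd0 and one on ahd1. single_channel_mirrors
should therefore return an empty list.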


More information about the freebsd-questions mailing list