FreeBSD 4.8, ASR2120, SMP, degraded RAID1/mirror => storage failure

rysanek at fccps.cz
Fri Sep 5 00:20:44 PDT 2003


Dear Mr. Long,

firstly, let me thank you for maintaining the Adaptec RAID drivers.

I've got a problem with the Adaptec 2120S in FreeBSD 4.8-RELEASE
and I haven't found any notes about it in the mailing lists.

In SMP mode, upon a RAID array degradation event (a disk is
ripped out), the system locks up almost entirely, stuck at
disk operations.
The same happens upon boot with a rebuilding/degraded array
- building from scratch or rebuilding after a disk failure,
or even just running off a single disk while the other one
is dead (no rebuild going on in the background).

The problem doesn't occur in UP mode (when options SMP and
APIC_IO are off) - that way the host system works happily
just as if there was nothing wrong with the array (except
for a few **Monitor** warnings and the LEDs going disco).
The problem also doesn't occur as long as the RAID is
"optimal".
The problem was only observed and tested in a configuration
with two disks in a mirror (one or two logical "containers"
on them), no hot spare.


My system configuration is:

2x Intel P4 Xeon @ 2.4 GHz, 533MHz FSB
1 GB RAM (dual-channel, 2x 512 MB DIMM DDR266, ECC, REG)
ServerWorks GC-LE chipset, PCI-X 64bit at 133MHz
2x3 SCA backplane with two GEM318 SAF-TE processors
On one channel, there are two pieces of Seagate ST336607LC (36 GB)
(+ 2x onboard BCM570x GbETH, 2x onboard AIC7902,
  onboard ATI RageXL PCI 8MB, etc)

The array on the AAC controller is the only disk drive in the
system -> the machine is booting from it.

To the best of my knowledge, the mechanical and electrical
parts of the U320 system are fine - they've been working for
me in Linux and with other SCSI controllers without problems.
The dual-channel onboard U320 HBA works fine, too.


Attached is a tarball with debugging logs.

There are three directories, containing three different
combinations of debug options (see below items A to C).
Each directory contains six log files: a boot from a clean
array, a disk failure (somewhat improperly simulated by
ripping the SCA enclosure out), and a boot from a degraded
volume - all of that for a UP and SMP kernel. 3*2=6.

I've tried the following different debugging options and levels:
A) full CAM debug and AAC_DEBUG=2
B) AAC_DEBUG=2
C) AAC_DEBUG=4  (after I found in the sources that level 4 exists)

With A), everything worked as described above.
         Just the CAM debugging messages probably cluttered the
         kernel ring buffer to the extent that some of the AAC_DEBUG
         and generic messages are missing in the log, such as those
         announcing the detection of /dev/aacd0 and /dev/aacd1
         (the two RAID volumes/containers)
With B), upon runtime disk failure, the fault occurred even in UP configuration!
         -- while UP kernels without debugging continued to operate,
         and even the UP kernel as per B) continued to run fine
         after reboot, on the failed array.
With C), __SMP__: the machine behaved as expected (dead) upon
         runtime disk failure, but consistently managed to boot with
         the degraded array while it was not rebuilding (=anomaly)
         - then it crashed when I logged in and told it to `reboot`.
         When I plugged the disk drive back and the array started
         rebuilding, the SMP kernel consistently failed to boot.
         __UP__: the machine was consistently failing miserably
         upon array degradation (=anomaly). It did boot fine
         consistently with a degraded array (not rebuilding).
         It failed at boot consistently with a rebuilding array.

So it seems that the serial logging / debugging stuff modifies
timing, and hence the behavior with debugging on is different.
Reminds me of the Heisenbergian uncertainty.
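
To put a rough number on that timing effect, here is a minimal C sketch
of how long a single debug line occupies the serial console. It takes
CONSPEED=115200 from the attached kernel config; the 8N1 framing
(10 bits on the wire per byte) and the idea that the kernel waits for
console output to drain are assumptions, and serial_tx_usec() is a
made-up helper name, not anything from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Bits on the wire per byte, ASSUMING 8N1 framing:
 * 1 start bit + 8 data bits + 1 stop bit. */
#define BITS_PER_BYTE_8N1 10u

/*
 * Hypothetical helper: microseconds needed to push nbytes through a
 * serial line running at the given baud rate (result truncated).
 */
static unsigned long serial_tx_usec(unsigned long baud, size_t nbytes)
{
    assert(baud > 0);
    return (unsigned long)((nbytes * BITS_PER_BYTE_8N1 * 1000000ULL) / baud);
}
```

For example, serial_tx_usec(115200, 80) gives 6944, i.e. roughly 7 ms
for one 80-character line. If the output really is synchronous, a burst
of debug messages per request could plausibly stretch the driver's
timing by tens of milliseconds.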

Still, without debugging, the consistent pattern is:
UP = boots fine from a clean array, survives array degradation
     and boots from a degraded array.
SMP = boots fine from a clean array, does not survive array degradation
      and fails to boot from a degraded array.


While I was trying to find a typical healthy "SCSI request/response"
pattern in the logs, it seemed to me that quite often some of the
debugging messages were missing, and some were clearly cut in
half or so - perhaps I should check my RS232 cabling? Though
I really think that my cabling is all right...

From the debug listings it would seem that the AAC driver
on the host PC gets a zero-padded FIB from the controller,
and then an endless row of interrupts.
This happens immediately after a disk failure or after driver
initialization upon boot.

The following is a piece of pseudo-code for your reference,
based on /usr/src/sys/dev/aac/aac.c. aac_host_command() forms
the body of a kthread that gets started upon adapter
initialization. Note the line with "!!!":

aac_host_command()
{
   while(true)
   {
      tsleep();

      for (;;)
      {
         // check for enqueued FIBs
         aac_dequeue_fib(AAC_HOST_NORM_CMD_QUEUE);

         if (found one)
         {
            // process it
         }
         else
         {
            break; // go to sleep again
         }
      }
   }
}

aac_dequeue_fib()
{
   if (ci != pi)  // consumer/producer indices
   {
      // there are some FIBs in the queue
      // !!! at the same time, the FIB is zero-padded !!!
   }
   else return(ENOENT);
}
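
To make the suspected failure mode concrete, here is a minimal,
self-contained C sketch of a producer/consumer ring in which the
indices claim that an entry is present while the entry itself is all
zeroes. This is not the aac driver's code: struct demo_fib,
struct demo_ring, RING_ENTRIES and the demo_* functions are all
hypothetical stand-ins, and returning EIO for a blank entry is just one
possible way to flag the condition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_ENTRIES 8u            /* hypothetical queue depth */

/* Hypothetical stand-in for a FIB; NOT the driver's real layout. */
struct demo_fib {
    uint32_t size;                 /* 0 in a blank entry */
    uint32_t command;              /* 0 in a blank entry */
    uint8_t  data[24];
};

struct demo_ring {
    uint32_t pi;                   /* producer index ("controller" side) */
    uint32_t ci;                   /* consumer index ("host" side) */
    struct demo_fib slot[RING_ENTRIES];
};

/* Return 1 if every byte of the entry is zero, 0 otherwise. */
static int demo_fib_is_blank(const struct demo_fib *f)
{
    const unsigned char *p = (const unsigned char *)f;
    size_t i;

    for (i = 0; i < sizeof(*f); i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Producer side, for demonstration: returns 0, or ENOSPC if full. */
static int demo_enqueue(struct demo_ring *q, const struct demo_fib *in)
{
    uint32_t next;

    assert(q != NULL && in != NULL);
    next = (q->pi + 1) % RING_ENTRIES;
    if (next == q->ci)
        return ENOSPC;
    q->slot[q->pi] = *in;
    q->pi = next;
    return 0;
}

/*
 * Consumer side: returns 0 on success, ENOENT if ci == pi (empty),
 * or EIO if the indices say an entry exists but it is all zeroes.
 * ci advances in both non-empty cases, so a blank entry is consumed
 * once instead of being picked up again on the next wakeup.
 */
static int demo_dequeue(struct demo_ring *q, struct demo_fib *out)
{
    assert(q != NULL && out != NULL);
    if (q->ci == q->pi)
        return ENOENT;
    *out = q->slot[q->ci];
    q->ci = (q->ci + 1) % RING_ENTRIES;
    return demo_fib_is_blank(out) ? EIO : 0;
}
```

In this toy model, bumping pi without filling the slot reproduces the
"index moved, entry zero-padded" situation: demo_dequeue() then reports
EIO once and ENOENT afterwards. Whether the real driver could detect
and skip such an entry in a similar way is an open question.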

Another symptom is that, upon array degradation, the controller
seems to reset the RAID-private SCSI bus (I hope that's what
the **Monitor** message says).

The trouble is that both the aac_host_command() wakeup with the
zero-padded FIB and the monitor messages appear in asynchronous
context (in a separate kthread or in an interrupt), and I'm not
skilled enough to say which previous action of the driver is
the immediate cause.

More on the behavior of the disk LEDs:
These LEDs on my server case are controlled by the SCA/SAF-TE
chip (GEM318).
- When the array is degraded but operating normally, the dead
disk's LED is dark and the live disk's LED flashes green,
indicating normal storage transfers.
- When a degraded array is rebuilding, the two disk LEDs dance
in shades of green to orange (both the green and red
pads flashing).
- When the whole controller or the RAID-private SCSI channel
is being reset, both LEDs shine a steady red.
- When the machine fails at boot with a rebuilding array,
the LEDs often turn red for a few seconds (reset?) and then
one of them remains red and the other one starts dancing
green/orange... and the reset may come back a few times
before the machine locks up entirely or BSD manages
to do an auto-reboot. Or the LEDs just stay red and the
machine hangs.
- When the machine boots and runs fine (i.e., with a UP kernel
under normal conditions), the disk LEDs never go red, except
for a cold reset of the whole PC. When the array is rebuilding,
the LEDs keep dancing merrily between green and orange
throughout the boot process.

I guess this would indicate that it's not just the BSD driver
getting messed up - the controller probably also gets
seriously confused. Is that a chicken-vs.-egg style puzzle?


As a side note: it seems interesting to me that, regardless
of whether debugging and SMP are on or off in any particular
combination, the kernel always rushes through to
"Waiting 15 seconds for the SCSI devices to settle"
and _immediately_ reports the RAID containers.
Only then does it wait those fifteen seconds before proceeding
to detect the regular SCSI devices.


Attached are my kernel config file and a listing of
`pciconf -lv`.

I can't think of anything else to tell you at the moment.
Ask me if you need further help - perhaps I can modify the
debugging flags and try again, add some more instrumentation
hooks here and there to focus on particular points in the
code etc.

Any ideas are welcome.
Sorry for taking up your time with such a long-winded
explanation.
Thanks for the great job that you're doing.


Frank Rysanek
-------------- next part --------------
chip0 at pci0:0:0:	class=0x060000 card=0x00000000 chip=0x00141166 rev=0x31 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CNB20-HE Host Bridge'
    class    = bridge
    subclass = HOST-PCI
chip1 at pci0:0:1:	class=0x060000 card=0x00000000 chip=0x00141166 rev=0x00 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CNB20-HE Host Bridge'
    class    = bridge
    subclass = HOST-PCI
chip2 at pci0:0:2:	class=0x060000 card=0x00000000 chip=0x00151166 rev=0x00 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CMIC-GC Hostbridge and MCH'
    class    = bridge
    subclass = HOST-PCI
none0 at pci0:2:0:	class=0x030000 card=0x80041002 chip=0x47521002 rev=0x27 hdr=0x00
    vendor   = 'ATI Technologies'
    device   = 'Rage XL PCI'
    class    = display
    subclass = VGA
isab0 at pci0:15:0:	class=0x060100 card=0x02011166 chip=0x02011166 rev=0x93 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CSB5 PCI to ISA Bridge'
    class    = bridge
    subclass = PCI-ISA
atapci0 at pci0:15:1:	class=0x01018a card=0x02121166 chip=0x02121166 rev=0x93 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CSB5 PCI EIDE Controller'
    class    = mass storage
    subclass = ATA
ohci0 at pci0:15:2:	class=0x0c0310 card=0x02201166 chip=0x02201166 rev=0x05 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'OSB4 OpenHCI Compliant USB Controller'
    class    = serial bus
    subclass = USB
chip3 at pci0:15:3:	class=0x060000 card=0x02301166 chip=0x02251166 rev=0x00 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CSB5 PCI Bridge'
    class    = bridge
    subclass = HOST-PCI
chip4 at pci0:17:0:	class=0x060000 card=0x00000000 chip=0x01011166 rev=0x03 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CIOB-X2'
    class    = bridge
    subclass = HOST-PCI
chip5 at pci0:17:2:	class=0x060000 card=0x00000000 chip=0x01011166 rev=0x03 hdr=0x00
    vendor   = 'Reliance Computer Corp./ServerWorks'
    device   = 'CIOB-X2'
    class    = bridge
    subclass = HOST-PCI
aac0 at pci3:4:0:	class=0x010400 card=0x02869005 chip=0x02859005 rev=0x01 hdr=0x00
    vendor   = 'Adaptec'
    device   = 'AAC-RAID RAID Controller'
    class    = mass storage
    subclass = RAID
bge0 at pci4:2:0:	class=0x020000 card=0x000814e4 chip=0x164514e4 rev=0x15 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5701 NetXtreme Gigabit Ethernet'
    class    = network
    subclass = ethernet
bge1 at pci4:3:0:	class=0x020000 card=0x000814e4 chip=0x164514e4 rev=0x15 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5701 NetXtreme Gigabit Ethernet'
    class    = network
    subclass = ethernet
ahd0 at pci4:4:0:	class=0x010000 card=0x005e9005 chip=0x801d9005 rev=0x10 hdr=0x00
    vendor   = 'Adaptec'
    class    = mass storage
    subclass = SCSI
ahd1 at pci4:4:1:	class=0x010000 card=0x005e9005 chip=0x801d9005 rev=0x10 hdr=0x00
    vendor   = 'Adaptec'
    class    = mass storage
    subclass = SCSI
-------------- next part --------------
#
# GENERIC -- Generic kernel configuration file for FreeBSD/i386
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ./LINT configuration file. If you are
# in doubt as to the purpose or necessity of a line, check first in LINT.
#
# $FreeBSD: src/sys/i386/conf/GENERIC,v 1.246.2.51.2.2 2003/03/25 23:35:15 jhb Exp $

machine		i386
#cpu		I386_CPU
#cpu		I486_CPU
#cpu		I586_CPU
cpu		I686_CPU
ident		GENERIC
maxusers	0

#makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols

options 	MATH_EMULATE		#Support for x87 emulation
options 	INET			#InterNETworking
#options 	INET6			#IPv6 communications protocols
options 	FFS			#Berkeley Fast Filesystem
options 	FFS_ROOT		#FFS usable as root device [keep this!]
options 	SOFTUPDATES		#Enable FFS soft updates support
options 	UFS_DIRHASH		#Improve performance on big directories
options 	MFS			#Memory Filesystem
options 	MD_ROOT			#MD is a potential root device
options 	NFS			#Network Filesystem
options 	NFS_ROOT		#NFS usable as root device, NFS required
options 	MSDOSFS			#MSDOS Filesystem
options 	CD9660			#ISO 9660 Filesystem
options 	CD9660_ROOT		#CD-ROM usable as root, CD9660 required
options 	PROCFS			#Process filesystem
options 	COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
options 	SCSI_DELAY=15000	#Delay (in ms) before probing SCSI
options 	UCONSOLE		#Allow users to grab the console
options 	USERCONFIG		#boot -c editor
options 	VISUAL_USERCONFIG	#visual boot -c editor
options 	KTRACE			#ktrace(1) support
options 	SYSVSHM			#SYSV-style shared memory
options 	SYSVMSG			#SYSV-style message queues
options 	SYSVSEM			#SYSV-style semaphores
options 	P1003_1B		#Posix P1003_1B real-time extensions
options 	_KPOSIX_PRIORITY_SCHEDULING
options 	ICMP_BANDLIM		#Rate limit bad replies
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~128k to driver.
options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug 
					# output.  Adds ~215k to driver.

# To make an SMP kernel, the next two are needed
options 	SMP			# Symmetric MultiProcessor Kernel
options 	APIC_IO			# Symmetric (APIC) I/O

# To support HyperThreading, HTT is needed in addition to SMP and APIC_IO
options 	HTT			# HyperThreading Technology

device		isa
#device		eisa
device		pci

# Floppy drives
device		fdc0	at isa? port IO_FD1 irq 6 drq 2
device		fd0	at fdc0 drive 0
device		fd1	at fdc0 drive 1
#
# If you have a Toshiba Libretto with its Y-E Data PCMCIA floppy,
# don't use the above line for fdc0 but the following one:
#device		fdc0

# ATA and ATAPI devices
device		ata0	at isa? port IO_WD1 irq 14
device		ata1	at isa? port IO_WD2 irq 15
device		ata
device		atadisk			# ATA disk drives
device		atapicd			# ATAPI CDROM drives
device		atapifd			# ATAPI floppy drives
device		atapist			# ATAPI tape drives
options 	ATA_STATIC_ID		#Static device numbering

# SCSI Controllers
#device		ahb		# EISA AHA1742 family
#device		ahc		# AHA2940 and onboard AIC7xxx devices
device		ahd		# AHA39320/29320 and onboard AIC79xx devices
#device		amd		# AMD 53C974 (Tekram DC-390(T))
#device		isp		# Qlogic family
#device		mpt		# LSI-Logic MPT/Fusion
#device		ncr		# NCR/Symbios Logic
#device		sym		# NCR/Symbios Logic (newer chipsets)
#options 	SYM_SETUP_LP_PROBE_MAP=0x40
				# Allow ncr to attach legacy NCR devices when 
				# both sym and ncr are configured

#device		adv0	at isa?
#device		adw
#device		bt0	at isa?
#device		aha0	at isa?
#device		aic0	at isa?

#device		ncv		# NCR 53C500
#device		nsp		# Workbit Ninja SCSI-3
#device		stg		# TMC 18C30/18C50

# SCSI peripherals
device		scbus		# SCSI bus (required)
device		da		# Direct Access (disks)
device		sa		# Sequential Access (tape etc)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)

# RAID controllers interfaced to the SCSI subsystem
#device		asr		# DPT SmartRAID V, VI and Adaptec SCSI RAID
#device		dpt		# DPT Smartcache - See LINT for options!
#device		iir		# Intel Integrated RAID
#device		mly		# Mylex AcceleRAID/eXtremeRAID
#device		ciss		# Compaq SmartRAID 5* series

# RAID controllers
device		aac		# Adaptec FSA RAID, Dell PERC2/PERC3
#options 	AAC_DEBUG=4
#device		aacp		# SCSI passthrough for aac (requires CAM)
#device		ida		# Compaq Smart RAID
#device		amr		# AMI MegaRAID
#device		mlx		# Mylex DAC960 family
#device		twe		# 3ware Escalade

#options 	CAMDEBUG
#options 	CAM_DEBUG_BUS=-1
#options 	CAM_DEBUG_TARGET=-1
#options 	CAM_DEBUG_LUN=-1
#options 	CAM_DEBUG_FLAGS="CAM_DEBUG_INFO|CAM_DEBUG_TRACE|CAM_DEBUG_SUBTRACE|CAM_DEBUG_CDB|CAM_DEBUG_XPT|CAM_DEBUG_PERIPH"

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc0	at isa? port IO_KBD
device		atkbd0	at atkbdc? irq 1 flags 0x1
device		psm0	at atkbdc? irq 12

device		vga0	at isa?

# splash screen/screen saver
pseudo-device	splash

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? flags 0x100

# Enable this and PCVT_FREEBSD for pcvt vt220 compatible console driver
#device		vt0	at isa?
#options 	XSERVER			# support for X server on a vt console
#options 	FAT_CURSOR		# start with block cursor
# If you have a ThinkPAD, uncomment this along with the rest of the PCVT lines
#options 	PCVT_SCANSET=2		# IBM keyboards are non-std

device		agp		# support several AGP chipsets

# Floating point support - do not disable.
device		npx0	at nexus? port IO_NPX irq 13

# Power management support (see LINT for more options)
device		apm0	at nexus? disable flags 0x20 # Advanced Power Management

# PCCARD (PCMCIA) support
#device		card
#device		pcic0	at isa? irq 0 port 0x3e0 iomem 0xd0000
#device		pcic1	at isa? irq 0 port 0x3e2 iomem 0xd4000 disable

# Serial (COM) ports
device		sio0	at isa? port IO_COM1 flags 0x30 irq 4
device		sio1	at isa? port IO_COM2 irq 3
device		sio2	at isa? disable port IO_COM3 irq 5
device		sio3	at isa? disable port IO_COM4 irq 9

options CONSPEED=115200

# Parallel port
device		ppc0	at isa? irq 7
device		ppbus		# Parallel port bus (required)
device		lpt		# Printer
#device		plip		# TCP/IP over parallel
#device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da


# PCI Ethernet NICs.
#device		de		# DEC/Intel DC21x4x (``Tulip'')
#device		em		# Intel PRO/1000 adapter Gigabit Ethernet Card (``Wiseman'')
#device		txp		# 3Com 3cR990 (``Typhoon'')
#device		vx		# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus		# MII bus support
#device		dc		# DEC/Intel 21143 and various workalikes
#device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
#device		pcn		# AMD Am79C97x PCI 10/100 NICs
#device		rl		# RealTek 8129/8139
#device		sf		# Adaptec AIC-6915 (``Starfire'')
#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
#device		ste		# Sundance ST201 (D-Link DFE-550TX)
#device		tl		# Texas Instruments ThunderLAN
#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
#device		vr		# VIA Rhine, Rhine II
#device		wb		# Winbond W89C840F
#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')
device		bge		# Broadcom BCM570x (``Tigon III'')

# ISA Ethernet NICs.
# 'device ed' requires 'device miibus'
#device		ed0	at isa? disable port 0x280 irq 10 iomem 0xd8000
#device		ex
#device		ep
#device		fe0	at isa? disable port 0x300
# Xircom Ethernet
#device		xe
# PRISM I IEEE 802.11b wireless NIC.
#device		awi
# WaveLAN/IEEE 802.11 wireless NICs. Note: the WaveLAN/IEEE really
# exists only as a PCMCIA device, so there is no ISA attachment needed
# and resources will always be dynamically assigned by the pccard code.
#device		wi
# Aironet 4500/4800 802.11 wireless NICs. Note: the declaration below will
# work for PCMCIA and PCI cards, as well as ISA cards set to ISA PnP
# mode (the factory default). If you set the switches on your ISA
# card for a manually chosen I/O address and IRQ, you must specify
# those parameters here.
#device		an
# The probe order of these is presently determined by i386/isa/isa_compat.c.
#device		ie0	at isa? disable port 0x300 irq 10 iomem 0xd0000
#device		le0	at isa? disable port 0x300 irq 5 iomem 0xd0000
#device		lnc0	at isa? disable port 0x280 irq 10 drq 0
#device		cs0	at isa? disable port 0x300
#device		sn0	at isa? disable port 0x300 irq 10

# Pseudo devices - the number indicates how many units to allocate.
pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
pseudo-device	sl	1	# Kernel SLIP
pseudo-device	ppp	1	# Kernel PPP
pseudo-device	tun		# Packet tunnel.
pseudo-device	pty		# Pseudo-ttys (telnet etc)
pseudo-device	md		# Memory "disks"
pseudo-device	gif		# IPv6 and IPv4 tunneling
pseudo-device	faith	1	# IPv6-to-IPv4 relaying (translation)

# The `bpf' pseudo-device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
pseudo-device	bpf		#Berkeley packet filter

# USB support
device		uhci		# UHCI PCI->USB interface
device		ohci		# OHCI PCI->USB interface
device		usb		# USB Bus (required)
device		ugen		# Generic
device		uhid		# "Human Interface Devices"
device		ukbd		# Keyboard
device		ulpt		# Printer
device		umass		# Disks/Mass storage - Requires scbus and da
device		ums		# Mouse
#device		uscanner	# Scanners
#device		urio		# Diamond Rio MP3 Player
# USB Ethernet, requires mii
#device		aue		# ADMtek USB ethernet
#device		cue		# CATC USB ethernet
#device		kue		# Kawasaki LSI USB ethernet

