Controller is no longer running

Scott Lambert lambert at lambertfam.org
Tue Sep 21 19:49:31 UTC 2010


I've had this problem occur about five times in the last year since
we've been on 8.x.

It happened with 7.x also, but it wasn't as critical a machine back
then, so I didn't care as much and hoped 8 would make it all better.
The machine wasn't loaded as heavily, and it probably happened three
times in two years.

The failures may be two hours, two days, or five months apart.  I
haven't been able to figure out a set of conditions which apply
every time it happens.  It does tend to happen while the backups
(amanda dump or tar) are running.  I think that just provides the
critical disk I/O load level to make the problem more likely.

I swear I took a picture of the error messages on the console the
previous time it happened, but I can't find it now.

This morning I had remote hands power cycle it while I was en route
to the office.  The message on-screen was, or was very close to, "The
controller is no longer running".  From the previous occurrence, I
remember messages about commands to the RAID controller timing out
after something like 15 seconds.

The firmware on the controller is from 2006 and is the latest I
found to be available.

Is this a known problem with the Adaptec 2120S type RAID cards?  Or
do I just have bad hardware?  

The array is always intact after a power cycle, but fsck has to
fix many things.  The machine is now a cyrus-imapd mail server.

FreeBSD 8.1-STABLE #0: Thu Aug 19 19:41:51 CDT 2010
    root at cyrus.example.com:/usr/obj/usr/src/sys/GENERIC i386

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.02-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf48  Family = f  Model = 4  Stepping = 8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x649d<SSE3,DTES64,MON,DS_CPL,EST,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant

real memory  = 2147483648 (2048 MB)
Physical memory chunk(s):
0x0000000000001000 - 0x000000000009dfff, 643072 bytes (157 pages)
0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages)
0x0000000001026000 - 0x000000007db8afff, 2092322816 bytes (510821 pages)
avail memory = 2090995712 (1994 MB)

aac0: <Adaptec SCSI RAID 2120S> mem 0xf8000000-0xfbffffff irq 50 at device 9.0 on pci3
aac0: Reserved 0x4000000 bytes for rid 0x10 type 3 at 0xf8000000
aac0: Enable Raw I/O
aac0: New comm. interface enabled
ioapic2: routing intpin 2 (PCI IRQ 50) to lapic 0 vector 51
aac0: [MPSAFE]
aac0: [ITHREAD]
aac0: i960 80303 100MHz, 64MB memory (48MB cache, 16MB execution), optional battery present
aac0: Kernel 4.2-0, Build 8205, S/N 503926
aac0: Supported Options=31d7e<CLUSTERS,WCACHE,DATA64,HOSTTIME,RAID50,WINDOW4GB,SOFTERR,SGMAP64,ALARM,NONDASD,ADPTINFO,NEWCOMM>
aac0: Adaptec 2120S, aac driver 2.1.9-1
aacp0: <SCSI Passthrough Bus> on aac0

aacd0: <RAID 5> on aac0
aacd0: 279962MB (573362176 sectors)
GEOM: new disk aacd0

(probe0:aacp0:0:0:0): Data overrun
(probe0:aacp0:0:0:0): Retrying command
(probe0:aacp0:0:0:0): Data overrun
(probe0:aacp0:0:0:0): Retrying command
(probe0:aacp0:0:0:0): Data overrun
(probe0:aacp0:0:0:0): Retrying command
(probe0:aacp0:0:0:0): Data overrun
(probe0:aacp0:0:0:0): Retrying command
(probe0:aacp0:0:0:0): Data overrun
(probe0:aacp0:0:0:0): Error 5, Retries exhausted
(probe0:aacp0:0:2:0): Data overrun
(probe0:aacp0:0:2:0): Retrying command
(probe0:aacp0:0:2:0): Data overrun
(probe0:aacp0:0:2:0): Retrying command
(probe0:aacp0:0:2:0): Data overrun
(probe0:aacp0:0:2:0): Retrying command
(probe0:aacp0:0:2:0): Data overrun
(probe0:aacp0:0:2:0): Retrying command
(probe0:aacp0:0:2:0): Data overrun
(probe0:aacp0:0:2:0): Error 5, Retries exhausted
(probe0:aacp0:0:3:0): Data overrun
(probe0:aacp0:0:3:0): Retrying command
(probe0:aacp0:0:3:0): Data overrun
(probe0:aacp0:0:3:0): Retrying command
(probe0:aacp0:0:3:0): Data overrun
(probe0:aacp0:0:3:0): Retrying command
(probe0:aacp0:0:3:0): Data overrun
(probe0:aacp0:0:3:0): Retrying command
(probe0:aacp0:0:3:0): Data overrun
(probe0:aacp0:0:3:0): Error 5, Retries exhausted
(probe0:aacp0:0:4:0): Data overrun
(probe0:aacp0:0:4:0): Retrying command
(probe0:aacp0:0:4:0): Data overrun
(probe0:aacp0:0:4:0): Retrying command
(probe0:aacp0:0:4:0): Data overrun
(probe0:aacp0:0:4:0): Retrying command
(probe0:aacp0:0:4:0): Data overrun
(probe0:aacp0:0:4:0): Retrying command
(probe0:aacp0:0:4:0): Data overrun
(probe0:aacp0:0:4:0): Error 5, Retries exhausted
(probe0:aacp0:0:6:0): Data overrun
(probe0:aacp0:0:6:0): Retrying command
(probe0:aacp0:0:6:0): Data overrun
(probe0:aacp0:0:6:0): Retrying command
(probe0:aacp0:0:6:0): Data overrun
(probe0:aacp0:0:6:0): Retrying command
(probe0:aacp0:0:6:0): Data overrun
(probe0:aacp0:0:6:0): Retrying command
(probe0:aacp0:0:6:0): Data overrun
(probe0:aacp0:0:6:0): Error 5, Retries exhausted
pass0 at aacp0 bus 0 scbus0 target 0 lun 0
pass0: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 scbus0 target 2 lun 0
pass1: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass1: 3.300MB/s transfers
pass2 at aacp0 bus 0 scbus0 target 3 lun 0
pass2: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass2: 3.300MB/s transfers
pass3 at aacp0 bus 0 scbus0 target 4 lun 0
pass3: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass3: 3.300MB/s transfers
pass4 at aacp0 bus 0 scbus0 target 6 lun 0
pass4: <ESG-SHV SCA HSBP M29 1.10> Fixed Uninstalled SCSI-2 device 
pass4: 3.300MB/s transfers
ses0 at aacp0 bus 0 scbus0 target 6 lun 0
ses0: <ESG-SHV SCA HSBP M29 1.10> Fixed Uninstalled SCSI-2 device 
ses0: 3.300MB/s transfers
ses0: SAF-TE Compliant Device
pass0 at aacp0 bus 0 scbus0 target 0 lun 0
pass0: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 scbus0 target 2 lun 0
pass1: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass1: 3.300MB/s transfers
pass2 at aacp0 bus 0 scbus0 target 3 lun 0
pass2: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass2: 3.300MB/s transfers
pass3 at aacp0 bus 0 scbus0 target 4 lun 0
pass3: <SEAGATE ST3146707LC 0003> Fixed Uninstalled SCSI-3 device 
pass3: 3.300MB/s transfers
Trying to mount root from ufs:/dev/aacd0s1a
WARNING: / was not properly dismounted
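
To compare the probe noise above from one boot to the next, a minimal
Python sketch along these lines could tally the messages per SCSI
target.  The function name and the regular expression are assumptions
of mine, based only on the "(periph:sim:bus:target:lun): message" line
format visible in this dmesg output; it is not a tool shipped with
FreeBSD or the aac driver.

```python
import re
from collections import Counter

# Assumed line shape, taken from the dmesg output above, e.g.
#   (probe0:aacp0:0:2:0): Data overrun
# Fields: peripheral, SIM, bus, target, LUN, then the message text.
PROBE_LINE = re.compile(r"^\((\w+):(\w+):(\d+):(\d+):(\d+)\): (.+?)\s*$")


def summarize_probe_errors(dmesg_text):
    """Return {target: Counter({message: count})} for matching lines.

    Lines that do not look like "(periph:sim:bus:target:lun): msg"
    are ignored, so ordinary boot messages pass through harmlessly.
    """
    summary = {}
    for line in dmesg_text.splitlines():
        match = PROBE_LINE.match(line.strip())
        if match is None:
            continue
        target = int(match.group(4))
        message = match.group(6)
        summary.setdefault(target, Counter())[message] += 1
    return summary
```

Fed the output above, this should report targets 0, 2, 3, 4 and 6,
each with five "Data overrun" lines, four "Retrying command" lines
and one "Error 5, Retries exhausted" line.  A change in those counts
between boots might be worth noting alongside the hang reports.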


-- 
Scott Lambert                    KC5MLE                       Unix SysAdmin
lambert at lambertfam.org



More information about the freebsd-scsi mailing list