Adaptec 2120S Errors

Don Lewis truckman at FreeBSD.org
Mon Jul 14 19:09:57 PDT 2003


My first suspicion would be power supply problems.  The DC power might
be sagging when all the drives seek at the same time.


Under extreme load one of my Seagate SCSI drives will defer some
commands "forever", until CAM times out the command and resets the drive
and/or the SCSI bus.  I think this happens when the drive is saturated
with commands it can satisfy from its cache (or one area of the disk)
and it doesn't have time to go off and do the disk I/O for the other
command.  Checking with Seagate for a firmware upgrade or turning down
the number of tags were the two suggested workarounds.  I haven't
subjected the drive to this load (200+ tps) recently, so I haven't
pursued a fix.  Dunno if you can turn down the number of tagged commands
per drive if they're all hiding behind raid5.
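Don's explanation is a starvation argument. A drive that reorders its tagged queue in favor of cheap cache hits can leave one expensive command waiting until CAM gives up on it. The Python sketch below is a deliberately extreme toy model, not Seagate's actual firmware policy. It assumes the host keeps the drive's tag queue full, that only the first command needs a long seek, and that the drive always picks a queued cache hit first. The function name and parameters are hypothetical.

```python
from collections import deque
from typing import Optional


def commands_before_far_io(tag_depth: int, max_completions: int = 1000) -> Optional[int]:
    """Hypothetical toy model of tagged-queue starvation (not real firmware logic).

    The host keeps the drive's tag queue full. The first command it issues needs a
    long seek ("far"); every later command is a cheap cache hit ("near"). The drive
    always services a queued "near" command before the "far" one.

    Returns how many commands completed before the "far" command was serviced, or
    None if it was still waiting after max_completions completions (on a real
    system, CAM would eventually time it out and reset the drive or bus).
    """
    queue: deque = deque()
    issued = 0
    for completed in range(max_completions):
        # The host refills the tag queue up to the allowed depth.
        while len(queue) < tag_depth:
            queue.append("far" if issued == 0 else "near")
            issued += 1
        if "near" in queue:
            queue.remove("near")  # the drive prefers the cheap cache hit
        else:
            queue.popleft()       # only the "far" command is left
            return completed
    return None


for depth in (1, 2, 32):
    print(f"tag depth {depth:2d}: {commands_before_far_io(depth)}")
```

With a tag depth of one the drive has nothing to reorder, so the far command is serviced at once; at any greater depth the model never services it. Real workloads are less absolute, but the sketch shows why turning down the number of tags gives the drive less room to keep deferring a command.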


On 14 Jul, Ray Taft wrote:
> I have been beating my head on this for a few days, any help would be
> appreciated.
> 
> HARDWARE:
> 
> Dual Xeon 2.4GHz - Hyperthreading = Off
> 3GB RAM - ECC Registered
> 3 X 15,000 RPM SEAGATE HARD DRIVES - Half Height
> Adaptec 2120S Controller - Flashed to latest version
> RAID 5 Configured - The system has completed the verify / build array
> process (100%) weeks ago.
> 
> All of the hardware is brand new (less than a month old), and we have sprung
> for more expensive shielded SCSI cables.
> 
> Here is the boot output from the controller initialization. Please note: this
> 2120S does NOT have the optional battery (which the system reports as present)
> and has a 64MB cache, whereas the system reports only 48MB.
> 
> Jun 25 01:44:27 lease042 /kernel: aac0: <Adaptec SCSI RAID 2120S> mem 0xf8000000-0xfbffffff irq 16 at device 1.0 on pci3
> Jun 25 01:44:27 lease042 /kernel: aac0: i960RX 100MHz, 48MB cache memory, optional battery present
> Jun 25 01:44:27 lease042 /kernel: aac0: Kernel 4.0-0, Build 6003, S/N b7d58a
> Jun 25 01:44:27 lease042 /kernel: aac0: Supported Options=1f7e<CLUSTERS,WCACHE,DATA64,HOSTTIME,RAID50,WINDOW4GB,SOFTERR,NORECOND,SGMAP64,ALARM,NONDASD>
> 
> System setup: FreeBSD 4.8-STABLE, kernel compiled for SMP.
> 
> PERFORMANCE:
> 
> This system is consistently under relatively heavy IO and network
> utilization.
> 
> This is typical network IO for this server:
> 
> netstat 1
>             input        (Total)           output
>    packets  errs      bytes    packets  errs      bytes colls
>       4506     0     350136       7466     0    9496788     0
>       4859     0     332024       7933     0   10829943     0
>       4654     0     347940       7724     0    9614383     0
>       4638     0     321165       7701     0   10412423     0
>       4791     0     369314       7940     0    9823169     0
> 
> Typical disk IO for this server:
> 
> iostat 1
>       tty           aacd0             acd0              fd0             cpu
>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>    0   22  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   6  0 13  4 77
>    0   38 46.42 107  4.85   0.00   0  0.00   0.00   0  0.00  13  0 25  5 57
>    0   38 51.18  86  4.31   0.00   0  0.00   0.00   0  0.00   9  0 19  6 66
>    0   38 57.02  81  4.52   0.00   0  0.00   0.00   0  0.00   5  0 15  7 74
>    0   38 47.86 115  5.37   0.00   0  0.00   0.00   0  0.00  10  0 22  7 61
>    0   38 48.80  88  4.20   0.00   0  0.00   0.00   0  0.00   7  0 19  1 73
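As a sanity check on the iostat figures, the MB/s column should be roughly KB/t times tps divided by 1024. The Python sketch below recomputes it for the five aacd0 rows that show activity (the first row is all zeros and is skipped) and averages the transfer rate. The variable names are illustrative; the small mismatches, at most about 0.01 MB/s, most likely come from rounding of the printed columns and the sampling interval.

```python
# (KB/t, tps, MB/s) for aacd0, copied from the iostat output above.
samples = [
    (46.42, 107, 4.85),
    (51.18, 86, 4.31),
    (57.02, 81, 4.52),
    (47.86, 115, 5.37),
    (48.80, 88, 4.20),
]

for kb_per_xfer, tps, reported in samples:
    # MB/s should be about KB/t * tps / 1024; gaps come from column rounding.
    computed = kb_per_xfer * tps / 1024
    print(f"{tps:4d} tps  computed {computed:.2f} MB/s  reported {reported:.2f} MB/s")

mean_tps = sum(tps for _, tps, _ in samples) / len(samples)
print(f"mean {mean_tps:.1f} tps")
```

The array averages about 95 tps, roughly half of the 200+ tps load Don mentions. These are transfers to the aacd0 container, though; the per-drive command count behind a RAID 5 array can differ, particularly for writes.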
> 
> ATTEMPTED REMEDIES:
> 
> We have swapped out the cable twice, using different brands with different
> shielding. No effect. We do not use a converter to change pins: straight U320
> to U320 rated hardware and cables.
> 
> Rebuilt the 4.8 kernel from scratch - twice. No effect.
> 
> Moved SCSI card to #1 PCI-X slot on motherboard.
> 
> Researched the 2120S's history with FreeBSD as well as with U320.
> 
> Rebuilt and reverified the RAID 5 array. Per a past support post, we tried
> turning write cache on and off. For the errors below, write cache is turned
> ON.
> 
> Waited to install the OS until the controller's build / verify process for the
> RAID 5 array had completed.
> 
> ERRORS:
> 
> We are seeing the following messages in /var/log and in dmesg on a daily
> basis. I understood this hardware had some buggy embedded code, but that it was
> supposed to have been fixed by the flash (six months ago). We also saw posts
> about this card's inability to operate under 4.7-RELEASE and 4.8-RELEASE, which
> is why we put the box on 4.8-STABLE.
> 
> Here are the errors:
> 
> This continues for the length of the log file. It is consistent regardless
> of the cable. HEAVY REPETITION.
> 
> Jul 12 14:00:53 lease042 /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 43352, size: 8192
> Jul 12 14:00:55 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 12 14:00:56 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 12 14:00:57 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> Jul 12 17:53:07 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 12 17:53:10 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 12 17:53:12 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 12 17:53:12 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 12 17:53:13 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> Jul 12 18:07:07 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 12 18:07:10 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 12 18:07:11 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 12 18:07:11 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 12 18:07:12 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> Jul 12 19:08:33 lease042 /kernel: pid 37161 (httpd), uid 65534: exited on signal 11
> Jul 12 19:56:46 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 12 19:56:49 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 12 19:56:52 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 12 19:56:52 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 12 19:56:53 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> Jul 12 22:56:00 lease042 /kernel: pid 41100 (httpd), uid 65534: exited on signal 10
> Jul 12 22:56:18 lease042 /kernel: pid 36891 (httpd), uid 65534: exited on signal 10
> Jul 13 02:00:12 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 13 02:00:16 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 13 02:00:17 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 13 02:00:17 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 13 02:00:18 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> Jul 13 02:11:30 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 13 02:11:33 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 13 02:11:35 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 13 02:11:36 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 13 02:11:37 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> Jul 13 03:33:04 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 13 03:33:07 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 13 03:33:11 lease042 /kernel: aac0: **Monitor** Drive 0:0:0 online on container 0:
> Jul 13 03:33:11 lease042 /kernel: aac0: **Monitor** Drive 0:2:0 online on container 0:
> Jul 13 03:33:12 lease042 /kernel: aac0: **Monitor** Drive 0:3:0 online on container 0:
> 
> 
> Jul 13 12:54:17 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
> Jul 13 12:54:17 lease042 /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
> Jul 13 12:54:17 lease042 /kernel: aac0: COMMAND 0xcb2aff8c TIMEOUT AFTER 44 SECONDS
> Jul 13 12:54:17 lease042 /kernel: aac0: COMMAND 0xcb2b0504 TIMEOUT AFTER 44 SECONDS
> Jul 13 12:54:17 lease042 /kernel: aac0: COMMAND 0xcb2af6cc TIMEOUT AFTER 44 SECONDS
> Jul 13 12:54:18 lease042 /kernel: aac0: COMMAND 0xcb2af694 TIMEOUT AFTER 44 SECONDS
> Jul 13 12:54:18 lease042 /kernel: aac0: COMMAND 0xcb2b029c TIMEOUT AFTER 44 SECONDS
> Jul 13 12:54:18 lease042 /kernel: aac0: COMMAND 0xcb2b006c TIMEOUT AFTER 44 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2b02d4 TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2b0884 TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2b0b94 TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2aff54 TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2af694 TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2af18c TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2b05ac TIMEOUT AFTER 31 SECONDS
> Jul 13 21:34:37 lease042 /kernel: aac0: COMMAND 0xcb2af50c TIMEOUT AFTER 31 SECONDS
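With a log this repetitive, a quick tally shows which target triggers each reset and how far apart the resets are. The Python sketch below is illustrative only: it uses the seven Abort Time-out lines from the quoted excerpt, rejoined onto single lines, and assumes the year is 2003 because syslog timestamps omit it. The regular expression and variable names are hypothetical. Every event in the excerpt names ID(0:03:0), and the resets are irregularly spaced, from 11 minutes to more than nine hours apart.

```python
import re
from collections import Counter
from datetime import datetime

# The Abort Time-out messages from the quoted log, one per line.
LOG = """\
Jul 12 17:53:07 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
Jul 12 18:07:07 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
Jul 12 19:56:46 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
Jul 13 02:00:12 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
Jul 13 02:11:30 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
Jul 13 03:33:04 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
Jul 13 12:54:17 lease042 /kernel: aac0: **Monitor** ID(0:03:0) Abort Time-out. Resetting bus.
"""

# Captures the syslog timestamp and the channel:target:lun that timed out.
ABORT_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*ID\((\d+:\d+:\d+)\) Abort Time-out")

per_target = Counter()
stamps = []
for line in LOG.splitlines():
    m = ABORT_RE.search(line)
    if m:
        per_target[m.group(2)] += 1
        # syslog omits the year; assume 2003, the date of the post.
        stamps.append(datetime.strptime("2003 " + m.group(1), "%Y %b %d %H:%M:%S"))

print(per_target)
gaps = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
print("minutes between resets:", [round(g) for g in gaps])
```

Run over the full log file (after undoing any mail wrapping), the same tally would show whether any target other than 0:3:0 ever times out, which helps separate a single misbehaving drive from a bus-wide cause such as the power sag Don suspects.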



More information about the freebsd-scsi mailing list