changed cable, server still hangs after ~24hrs ...

Marc G. Fournier scrappy at hub.org
Sun Apr 27 06:05:46 PDT 2003


'K, after the last hang, I got the techs to replace the SCSI cable in the
box, which made no difference ...

I've removed the KVA_PAGES args from the kernel, so that there is nothing
'weird' configured into it, and now aaccli for the 5400 works (I haven't
been able to get my hands on one for the 2120s yet). I'm not sure what
sort of info I should be looking at/for (or even what is particularly safe
to run) ... but does any of the output below provide *anything*?

Note that this enclosure is one of the Intel SR2200(s), and I'm still getting
the occasional 'Time-out', which to me indicates a problem, but according
to the controller:

AAC0> disk show smart
Executing: disk show smart

        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
C:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:02:0     Y            6             Y           N             0
0:03:0     Y            6             Y           N             0
0:04:0     Y            6             Y           N             0
0:05:0     Y            6             Y           N             0

I would have expected Error Count to have increased by at least 1 if there
was a problem at the hardware level ... no?
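
One way to check whether Error Count ever moves is to save the
'disk show smart' output after each hang and compare the snapshots. The
following is only a rough sketch: the function names are made up for
illustration, the column layout is assumed to match the table above, and
whether the controller bumps Error Count on a bus time-out at all is
exactly the open question, so an unchanged count proves nothing by itself.

```python
import re

# A data row starts with a C:ID:L address such as 0:05:0.
ADDR_RE = re.compile(r"\d+:\d+:\d+")

def parse_smart_table(text):
    """Return {C:ID:L: error_count} from 'disk show smart' style output.

    Assumes six whitespace-separated columns per data row, with the
    error count last, as in the table pasted in this message.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and ADDR_RE.fullmatch(fields[0]):
            counts[fields[0]] = int(fields[5])
    return counts

def changed_counts(before, after):
    """Return {C:ID:L: (old, new)} for devices whose count differs."""
    return {
        dev: (before.get(dev), new)
        for dev, new in after.items()
        if before.get(dev) != new
    }

SNAPSHOT = """\
        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
C:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:02:0     Y            6             Y           N             0
0:03:0     Y            6             Y           N             0
0:04:0     Y            6             Y           N             0
0:05:0     Y            6             Y           N             0
"""

if __name__ == "__main__":
    counts = parse_smart_table(SNAPSHOT)
    for dev, errors in sorted(counts.items()):
        print(dev, errors)
    # Comparing a snapshot with itself reports no changes.
    print(changed_counts(counts, counts))
```

Rows without the full six columns (such as the SAF-TE entries at 0:06:x
further down) are skipped rather than guessed at.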

The system itself is a Dual-PIII, 4G of RAM ... Intel MOBO & Chassis, so
the only SCSI cable I'm dealing with is from the MOBO to the backplane
itself ...

The hangs are similar to the original ones, where I'd get TIMEOUT
scrolling up the screen, but since Scott's last "fix" for the 2G
allocation issue, I no longer get the actual error messages ...

On each hang, I've asked the techs to do a 'ctl-alt-esc', but, again, like
before, this doesn't work :(

Help?  Anything else I can get the techs to try to eliminate 'hardware' as
the cause? :(

neptune# grep aac /var/log/messages
Apr 27 07:42:02 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
Apr 27 07:42:05 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
Apr 27 09:29:19 neptune /kernel: aac0: <Adaptec SCSI RAID 2120S> mem 0xf8000000-0xfbffffff irq 2 at device 9.0 on pci1
Apr 27 09:29:19 neptune /kernel: aac0: i960RX 100MHz, 48MB cache memory, optional battery present
Apr 27 09:29:19 neptune /kernel: aac0: Kernel 4.0-0, Build 5770, S/N 232fb7
Apr 27 09:29:19 neptune /kernel: aac0: Supported Options=1f7e<CLUSTERS,WCACHE,DATA64,HOSTTIME,RAID50,WINDOW4GB,SOFTERR,NORECOND,SGMAP64,ALARM,NONDASD>
Apr 27 09:29:20 neptune /kernel: aacd0: <RAID 5> on aac0
Apr 27 09:29:20 neptune /kernel: aacd0: 174993MB (358387200 sectors)
Apr 27 09:29:20 neptune /kernel: Mounting root from ufs:/dev/aacd0s1a
neptune# zgrep aac /var/log/messages.0.gz
neptune# zgrep aac /var/log/messages.1.gz
neptune# zgrep aac /var/log/messages.2.gz
Apr 24 14:56:45 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
Apr 24 14:56:48 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
neptune# zgrep aac /var/log/messages.3.gz
Apr 23 02:20:20 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116328, size: 4096
Apr 23 02:20:29 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 104256, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 111896, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116304, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 112576, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116952, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 113144, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 87424, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116312, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 117016, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116408, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 43984, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116296, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 111224, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 112440, size: 8192
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 104840, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 111856, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 15208, size: 4096
Apr 23 02:20:31 neptune /kernel: aac0: **Monitor** ID(0:01:0) Abort Time-out. Resetting bus.
Apr 23 02:20:31 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
Apr 23 10:39:30 neptune /kernel: aac0: <Adaptec SCSI RAID 2120S> mem 0xf8000000-0xfbffffff irq 2 at device 9.0 on pci1
Apr 23 10:39:30 neptune /kernel: aac0: i960RX 100MHz, 48MB cache memory, optional battery present
Apr 23 10:39:30 neptune /kernel: aac0: Kernel 4.0-0, Build 5770, S/N 232fb7
Apr 23 10:39:30 neptune /kernel: aac0: Supported Options=1f7e<CLUSTERS,WCACHE,DATA64,HOSTTIME,RAID50,WINDOW4GB,SOFTERR,NORECOND,SGMAP64,ALARM,NONDASD>
Apr 23 10:39:30 neptune /kernel: aacd0: <RAID 5> on aac0
Apr 23 10:39:30 neptune /kernel: aacd0: 174993MB (358387200 sectors)
Apr 23 10:39:30 neptune /kernel: Mounting root from ufs:/dev/aacd0s1a
Apr 23 23:32:39 neptune /kernel: aac0: **Monitor** ID(0:01:0) Abort Time-out. Resetting bus.
Apr 23 23:32:42 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
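
To see at a glance which IDs the time-outs land on, the Monitor lines
above can be tallied per device. This is a minimal sketch, assuming the
syslog line format is exactly as pasted here; the function name and the
sample list are illustrative only, and the sample copies just the
relevant lines from the greps above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# Apr 27 07:42:02 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+/kernel:\s+"
    r"(?P<ctrl>aac\d+): \*\*Monitor\*\* ID\((?P<dev>\d+:\d+:\d+)\) Abort Time-out"
)

def tally_timeouts(lines):
    """Group Abort Time-out timestamps by (controller, C:ID:L)."""
    hits = defaultdict(list)
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            hits[(m.group("ctrl"), m.group("dev"))].append(m.group("stamp"))
    return dict(hits)

SAMPLE = """\
Apr 27 07:42:02 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
Apr 27 07:42:05 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
Apr 24 14:56:45 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
Apr 23 02:20:31 neptune /kernel: aac0: **Monitor** ID(0:01:0) Abort Time-out. Resetting bus.
Apr 23 23:32:39 neptune /kernel: aac0: **Monitor** ID(0:01:0) Abort Time-out. Resetting bus.
""".splitlines()

if __name__ == "__main__":
    for (ctrl, dev), stamps in sorted(tally_timeouts(SAMPLE).items()):
        print(f"{ctrl} {dev}: {len(stamps)} time-out(s) at {', '.join(stamps)}")
```

On the lines shown in this message the time-outs fall on two IDs only,
0:01:0 and 0:05:0, two each. With so few events that may or may not mean
anything.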


AAC0> controller details
Executing: controller details
Controller Information
----------------------
         Remote Computer: S
             Device Name: S
         Controller Type: No Info
             Access Mode: READ-WRITE
Controller Serial Number: Last Six Digits = 232FB7
         Number of Buses: 1
         Devices per Bus: 15
          Controller CPU: i960 R series
    Controller CPU Speed: 100 Mhz
       Controller Memory: 64 Mbytes
           Battery State: Not Present

Component Revisions
-------------------
                CLI: 1.0-0 (Build #5263)
                API: 1.0-0 (Build #5263)
    Miniport Driver: 4.0-0 (Build #5770)
Controller Software: 4.0-0 (Build #5770)
    Controller BIOS: 4.0-0 (Build #5770)
Controller Firmware: (Build #5770)
Controller Hardware: 2.64

Scsi   Partition     Container  MultiLevel
C:ID:L Offset:Size   Num Type   Num Type   R/W
------ ------------- --- ------ --- ------ ---
0:00:0 64.0KB:34.1GB  0  RAID-5  0  None   RW
0:01:0 64.0KB:34.1GB  0  RAID-5  0  None   RW
0:02:0 64.0KB:34.1GB  0  RAID-5  0  None   RW
0:03:0 64.0KB:34.1GB  0  RAID-5  0  None   RW
0:04:0 64.0KB:34.1GB  0  RAID-5  0  None   RW
0:05:0 64.0KB:34.1GB  0  RAID-5  0  None   RW

        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
C:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:02:0     Y            6             Y           N             0
0:03:0     Y            6             Y           N             0
0:04:0     Y            6             Y           N             0
0:05:0     Y            6             Y           N             0
0:06:0     N
0:06:1     N
0:06:2     N
0:06:3     N
0:06:4     N
0:06:5     N
0:06:6     N
0:06:7     N

C:ID:L  Device Type     Blocks    Bytes/Block Usage            Shared Rate
------  --------------  --------- ----------- ---------------- ------ ----
0:00:0   Disk            71687372  512         Initialized      NO     320
0:01:0   Disk            71687372  512         Initialized      NO     320
0:02:0   Disk            71687372  512         Initialized      NO     320
0:03:0   Disk            71687372  512         Initialized      NO     320
0:04:0   Disk            71687372  512         Initialized      NO     320
0:05:0   Disk            71687372  512         Initialized      NO     320


Num          Total  Oth Stripe          Scsi   Partition
Label Type   Size   Ctr Size   Usage   C:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
 0    RAID-5  170GB       64KB Open    0:00:0 64.0KB:34.1GB
 /dev/aacd0           FreeBSD          0:01:0 64.0KB:34.1GB
                                       0:02:0 64.0KB:34.1GB
                                       0:03:0 64.0KB:34.1GB
                                       0:04:0 64.0KB:34.1GB
                                       0:05:0 64.0KB:34.1GB
Enclosure
ID (C:ID:L) Fan Power Slot Sensor Door Speaker  Standard Diagnostic
----------- --- ----- ---- ------ ---- -------- -------- ----------
 0  0:06:0   0    2    7     1     0     No     SAF-TE   PASSED
 1  0:06:1   0    0    0     0     0     No     SAF-TE   FAILED
 2  0:06:2   0    0    0     0     0     No     SAF-TE   FAILED
 3  0:06:3   0    0    0     0     0     No     SAF-TE   FAILED
 4  0:06:4   0    0    0     0     0     No     SAF-TE   FAILED
 5  0:06:5   0    0    0     0     0     No     SAF-TE   FAILED
 6  0:06:6   0    0    0     0     0     No     SAF-TE   FAILED
 7  0:06:7   0    0    0     0     0     No     SAF-TE   FAILED



AAC0> enclosure show temperature
Executing: enclosure show temperature

Enclosure
ID (C:ID:L) Sensor Temperature Threshold Status
----------- ------ ----------- --------- --------
 0  0:06:0   0       87 F         120    NORMAL


Is there any other information that I can pull?

