Questions about camcontrol, hot-swapping, ciss and Compaq SmartArray
josh at endries.org
Mon Mar 10 23:59:35 UTC 2008
Today I noticed that one of the disks in a RAID 5 array I have seems to be dead or dying:
loki.domain.int ciss0: *** Fatal drive error, SCSI port 1 ID 0
loki.domain.int (da1:ciss0:0:1:0): WRITE(10). CDB: 2a 0 c ae 3f d0 0 0 20 0
loki.domain.int (da1:ciss0:0:1:0): CAM Status: SCSI Status Error
loki.domain.int (da1:ciss0:0:1:0): SCSI Status: Check Condition
loki.domain.int (da1:ciss0:0:1:0): MEDIUM ERROR asc:11,0
loki.domain.int (da1:ciss0:0:1:0): Unrecovered read error
loki.domain.int (da1:ciss0:0:1:0): Retrying Command (per Sense Data)
I see messages for port 0 only, but with the ID varying from 0 to 3, and I'm not
sure what that means (a partition?). After a while the error messages "went away",
though the disks were, and still are, in use. I found cciss_vol_status online, but
it says the volume is OK (not degraded), which doesn't really make sense to me:
# cciss_vol_status /dev/ciss0
/dev/ciss0: (Smart Array 642) RAID 0 Volume 0(?) status: OK.
/dev/ciss0: (Smart Array 642) RAID 5 Volume 1(?) status: OK.
Is there a way I can tell which port/disk is bad from these messages?
Assuming I can determine which disk it is, do I need to do anything in the OS
before/after I swap out a drive? I've seen people talk about rescanning and
running other camcontrol commands before...
Any other tips?
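In case it helps anyone narrow this down, here is a minimal Python sketch that tallies the ciss "Fatal drive error" lines by controller, port and ID. It assumes the log lines look exactly like the one quoted above; the function name tally_fatal_drive_errors is hypothetical, and the counts only reflect what the driver reported, not which physical bay actually holds the failing disk.

```python
import re
from collections import Counter

# Pattern for the ciss driver line exactly as quoted in this message.
# Assumption: other driver versions may word the message differently.
FATAL_RE = re.compile(
    r"(ciss\d+): \*\*\* Fatal drive error, SCSI port (\d+) ID (\d+)"
)


def tally_fatal_drive_errors(lines):
    """Count 'Fatal drive error' reports per (controller, port, id)."""
    counts = Counter()
    for line in lines:
        match = FATAL_RE.search(line)
        if match:
            key = (match.group(1), int(match.group(2)), int(match.group(3)))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this message.
    sample = [
        "loki.domain.int ciss0: *** Fatal drive error, SCSI port 1 ID 0",
        "loki.domain.int (da1:ciss0:0:1:0): MEDIUM ERROR asc:11,0",
    ]
    for (ctrl, port, drive_id), n in tally_fatal_drive_errors(sample).most_common():
        print(f"{ctrl} port {port} ID {drive_id}: {n} error(s)")
```

The same function accepts any iterable of lines, so an open log file (for example /var/log/messages, if that is where these messages end up on your system) can be passed in directly.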