Probable drive failure not recognized by ZFS on mps(4)
Leon Meßner
l.messner at physik.tu-berlin.de
Sat May 5 23:50:58 UTC 2012
Hi,
I'm running 9-STABLE from two weeks ago and have a problem: ZFS does not
recognize a failing SATA disk behind an LSI SAS2x36 expander. The gnop(8)
device in the zpool status output is there for testing purposes; ZFS does
fail those as expected. What could I do to check whether the SCSI sense code
actually makes sense for this drive?
Thanks,
Leon
uname:
FreeBSD fred.physik-pool.tu-berlin.de 9.0-STABLE FreeBSD 9.0-STABLE #0: Wed Apr 18 20:05:08 CEST 2012
master at fred.physik-pool.tu-berlin.de:/usr/obj/usr/src/sys/GENERIC amd64
/var/log/messages (a lot of this and similar):
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab a3 1 0 length 512 SMID 809 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab a4 1 0 length 512 SMID 633 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e af 31 1 0 length 512 SMID 253 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 a6 0 0 1 0
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): CAM status: SCSI Status Error
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): SCSI status: Check Condition
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
May 6 01:32:53 fred kernel: (da17:mps0:0:26:0): Info: 0x579c2a6
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(6). CDB: 8 e ab ee 1 0 length 512 SMID 344 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 3a 38 3c 10 0 0 10 0 length 8192 SMID 304 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 3a 38 3a 10 0 0 10 0 length 8192 SMID 712 terminated ioc 804b scsi 0 state 0 xfer 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): READ(10). CDB: 28 0 5 79 c2 56 0 0 46 0
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): CAM status: SCSI Status Error
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): SCSI status: Check Condition
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
May 6 01:32:58 fred kernel: (da17:mps0:0:26:0): Info: 0x579c298
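One way to cross-check the sense data in the log above is to decode the failing READ CDBs by hand and see whether the LBA in the Info field falls inside the range the command asked for. The following is a minimal Python sketch, not part of the original report. The function names are made up for illustration, and it only handles the READ(6) and READ(10) layouts that appear in the log.

```python
def decode_read_cdb(cdb_hex):
    """Return (lba, block_count) for a READ(6) or READ(10) CDB.

    cdb_hex is the space-separated hex byte string as printed by the
    kernel, e.g. "28 0 5 79 c2 a6 0 0 1 0".
    """
    b = [int(x, 16) for x in cdb_hex.split()]
    if b[0] == 0x08:
        # READ(6): 21-bit LBA in bytes 1-3, one length byte (0 means 256)
        lba = ((b[1] & 0x1F) << 16) | (b[2] << 8) | b[3]
        count = b[4] or 256
    elif b[0] == 0x28:
        # READ(10): 32-bit LBA in bytes 2-5, 16-bit length in bytes 7-8
        lba = int.from_bytes(bytes(b[2:6]), "big")
        count = int.from_bytes(bytes(b[7:9]), "big")
    else:
        raise ValueError("unsupported opcode 0x%02x" % b[0])
    return lba, count


def info_in_range(cdb_hex, info_lba):
    """True if the LBA from the sense Info field lies inside the request."""
    lba, count = decode_read_cdb(cdb_hex)
    return lba <= info_lba < lba + count


if __name__ == "__main__":
    # The two READ(10) commands that returned MEDIUM ERROR in the log
    for cdb, info in [("28 0 5 79 c2 a6 0 0 1 0", 0x579C2A6),
                      ("28 0 5 79 c2 56 0 0 46 0", 0x579C298)]:
        lba, count = decode_read_cdb(cdb)
        print(hex(lba), count, info_in_range(cdb, info))
```

For both failed READ(10) commands the reported Info LBA (0x579c2a6 and 0x579c298) lies inside the decoded request range, which is consistent with the drive reporting a genuinely unreadable sector rather than nonsensical sense data.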
smartctl -a /dev/da17 (excerpt):
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   163   163   051    Pre-fail  Always       -       929442
  3 Spin_Up_Time            0x0027   238   238   021    Pre-fail  Always       -       1083
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   174   174   140    Pre-fail  Always       -       207
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4077
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   118   104   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       207
197 Current_Pending_Sector  0x0032   184   183   000    Old_age   Always       -       1342
198 Offline_Uncorrectable   0x0030   186   183   000    Old_age   Offline      -       1168
199 UDMA_CRC_Error_Count    0x0032   200   199   000    Old_age   Always       -       9
200 Multi_Zone_Error_Rate   0x0008   001   001   000    Old_age   Offline      -       397969
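To pull the sector-health counters out of an attribute table like the one above without reading it by eye, something along these lines could be used. This is a rough Python sketch, not from the original post. It assumes the ten-column layout shown in the excerpt, and the choice of attribute IDs to watch (5, 196, 197, 198) is a common convention, not anything smartctl or ZFS defines. The function name is hypothetical.

```python
def nonzero_sector_counters(smart_text, watched=(5, 196, 197, 198)):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0.

    smart_text is the attribute table as printed by smartctl; lines that
    do not start with a numeric ID (headers, blank lines) are skipped.
    """
    hits = {}
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            hits[name] = int(raw)
    return hits


if __name__ == "__main__":
    sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 174 174 140 Pre-fail Always - 207
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 184 183 000 Old_age Always - 1342
198 Offline_Uncorrectable 0x0030 186 183 000 Old_age Offline - 1168
"""
    for name, raw in nonzero_sector_counters(sample).items():
        print(name, raw)
```

In the excerpt, all four watched counters are nonzero (207 reallocated sectors, 1342 pending, 1168 offline uncorrectable), even though none of the normalized values has crossed its threshold and WHEN_FAILED is empty.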
zpool status:
# zpool status
  pool: POOL
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat May 5 23:55:44 2012
        606G scanned out of 3.22T at 104M/s, 7h23m to go
        2.26G resilvered, 18.38% done
config:
NAME                      STATE     READ WRITE CKSUM
POOL                      DEGRADED     0     0     0
  raidz2-0                ONLINE       0     0     0
    gpt/port0-2035c2485   ONLINE       0     0     0
    gpt/port2-0565e5416   ONLINE       0     0     0
    gpt/port4-200162460   ONLINE       0     0     0
    gpt/port6-2556b79f8   ONLINE       0     0     0
    gpt/port8-2aac22cb4   ONLINE       0     0     0
    gpt/port10-2aac226d2  ONLINE       0     0     0
    gpt/port12-0ad6e26d8  ONLINE       0     0     0
    gpt/port14-2b0024fed  ONLINE       0     0    10  (resilvering)
    gpt/port16-2afc39a37  ONLINE       0     0     0
    gpt/port18-2556b7770  ONLINE       0     0     0
  raidz2-1                DEGRADED     0     0     0
    gpt/port1-2acfb0988   ONLINE       0     0     0
    gpt/port3-202b5e684   ONLINE       0     0     0
    gpt/port5-2025090a1   ONLINE       0     0     0
    gpt/port7-2557e4c7a   ONLINE       0     0     0
    gpt/port9-2adcaf4a5   ONLINE       0     0     0
    gpt/port11-2acfb6ab3  ONLINE       0     0     0
    gpt/port13-2afc67e75  ONLINE       0     0     0
    gpt/port15-25aaca07f  ONLINE       0     0     0
    gpt/port17-2ad60c96d  ONLINE       0     0    40  (resilvering)
    replacing-9           OFFLINE      0     0     0
      2488369476163776260 OFFLINE      0     0     0  was /dev/da19p1
      da19p1.nop          ONLINE       0     0     0  (resilvering)
errors: No known data errors
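To see at a glance which vdevs ZFS has charged with errors in output like the above, the config table can be filtered for nonzero READ/WRITE/CKSUM counters. The snippet below is only an illustrative Python sketch with a hypothetical function name. It assumes plain integer counters (zpool can abbreviate large counts, which this does not handle) and a fixed set of state keywords.

```python
# State keywords this sketch recognizes in the STATE column (an assumption,
# chosen to cover the states seen in typical zpool status output)
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def devices_with_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with any nonzero counter."""
    found = {}
    for line in status_text.splitlines():
        f = line.split()
        # A config row looks like: NAME STATE READ WRITE CKSUM [note...]
        if len(f) >= 5 and f[1] in STATES and all(x.isdigit() for x in f[2:5]):
            read, write, cksum = (int(x) for x in f[2:5])
            if read or write or cksum:
                found[f[0]] = (read, write, cksum)
    return found


if __name__ == "__main__":
    sample = """\
NAME STATE READ WRITE CKSUM
POOL DEGRADED 0 0 0
gpt/port14-2b0024fed ONLINE 0 0 10 (resilvering)
gpt/port16-2afc39a37 ONLINE 0 0 0
gpt/port17-2ad60c96d ONLINE 0 0 40 (resilvering)
"""
    for dev, counts in devices_with_errors(sample).items():
        print(dev, counts)
```

Applied to the status above, this picks out gpt/port14-2b0024fed and gpt/port17-2ad60c96d, the only two devices with nonzero counters, both checksum errors, with no read errors recorded against any device.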