Re: Detecting failing drives - ZFS carries on regardless

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Tue, 18 Feb 2025 13:23:17 UTC

As an example, the following errors are NOT ENOUGH for ZFS to fail a
drive, because FreeBSD doesn't take the device offline. Would you want a
drive like this in a production environment?

Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 38 00 00 08 00 00
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c13f88
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 40 00 00 08 00 00
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c146c0
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 48 00 00 08 00 00
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c14ec8
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 50 00 00 08 00 00
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c15700
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 30 00 00 08 00 00
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x33236f6
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a9b6
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a94a
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a800
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a800
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a824
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Error 5, Retries exhausted
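
Since nothing offlines the drive on messages like these, one option is to
tally them per device from the log yourself. What follows is only a rough
sketch, not anything ZFS or CAM does: the function name count_sense_errors
and the "RETRIES EXHAUSTED" label are made up for illustration, and the
patterns assume each message sits on a single line as syslog writes it
(not hard-wrapped as it can be in an email). It recognises only the two
message shapes that appear in the excerpt above.

```python
import re
from collections import Counter

# CAM sense line, e.g.
# "(da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (...)"
SENSE_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\): SCSI sense: (?P<key>[A-Z ]+?) asc:"
)

# Final give-up line, e.g. "(da0:mps0:0:14:0): Error 5, Retries exhausted"
EXHAUSTED_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\): Error \d+, Retries exhausted"
)


def count_sense_errors(lines):
    """Count sense keys and exhausted retries per device.

    Returns a Counter keyed by (device, label), where label is the sense
    key text from the log or the hypothetical tag "RETRIES EXHAUSTED".
    """
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("key"))] += 1
            continue
        match = EXHAUSTED_RE.search(line)
        if match:
            counts[(match.group("dev"), "RETRIES EXHAUSTED")] += 1
    return counts
```

Feeding it the lines of the system log (for example, an open file handle
on /var/log/messages, if that is where kernel messages go on your system)
gives a per-drive tally that a cron job could compare against whatever
threshold you consider acceptable. What counts as "too many" recovered
errors is a judgement call the sketch deliberately leaves out.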