Re: Detecting failing drives - ZFS carries on regardless
Date: Tue, 18 Feb 2025 13:23:17 UTC
As an example, the following is NOT ENOUGH for ZFS to fail a drive, because FreeBSD doesn't offline it. Would you want a drive like this in a production environment?

Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 38 00 00 08 00 00
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:20:48 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c13f88
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 40 00 00 08 00 00
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:20:54 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c146c0
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 48 00 00 08 00 00
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:21:00 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c14ec8
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 02 c1 50 00 00 08 00 00
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (Write error - recovered with auto reallocation)
Feb 17 16:21:07 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x2c15700
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 30 00 00 08 00 00
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x33236f6
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:10 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a9b6
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:13 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a94a
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:14 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a800
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:15 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a800
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:16 zfs2 kernel: (da0:mps0:0:14:0): Retrying command (per sense data)
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): WRITE(10). CDB: 2a 00 03 32 a8 00 00 08 00 00
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): CAM status: SCSI Status Error
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): SCSI status: Check Condition
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): SCSI sense: HARDWARE FAILURE asc:3,0 (Peripheral device write fault)
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Info: 0x332a824
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Field Replaceable Unit: 8
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Command Specific Info: 0x10100100
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Actual Retry Count: 23
Feb 17 16:22:17 zfs2 kernel: (da0:mps0:0:14:0): Error 5, Retries exhausted
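
Since neither ZFS nor FreeBSD acts on messages like the ones above, one stopgap is to watch the kernel log yourself. The following is only a rough sketch of that idea, not anything ZFS or FreeBSD provides: it counts "SCSI sense" and "Retries exhausted" lines per device and lists devices that cross a threshold. The function names (summarize, suspects) and the default threshold of 3 are my own assumptions, and the regular expressions are written only against the message format shown in this post.

```python
import re
from collections import Counter, defaultdict

# Matches CAM sense lines in the format shown in the log above, e.g.
#   ... kernel: (da0:mps0:0:14:0): SCSI sense: RECOVERED ERROR asc:c,1 (...)
SENSE_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\): SCSI sense: "
    r"(?P<key>[A-Z ]+?) asc:(?P<asc>[0-9a-f]+,[0-9a-f]+)"
)

# Matches the final give-up line, e.g.
#   ... kernel: (da0:mps0:0:14:0): Error 5, Retries exhausted
EXHAUSTED_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\): Error \d+, Retries exhausted"
)


def summarize(lines):
    """Count sense keys and exhausted retries per device name."""
    counts = defaultdict(Counter)
    for line in lines:
        m = SENSE_RE.search(line)
        if m:
            counts[m["dev"]][m["key"]] += 1
            continue
        m = EXHAUSTED_RE.search(line)
        if m:
            counts[m["dev"]]["RETRIES EXHAUSTED"] += 1
    return counts


def suspects(counts, threshold=3):
    """Devices whose total event count reaches the (arbitrary) threshold."""
    return sorted(
        dev for dev, c in counts.items() if sum(c.values()) >= threshold
    )
```

You could feed it the lines of /var/log/messages (or wherever your syslog sends kernel messages) from cron and mail yourself the result of suspects(). It only reports; what to do with a flagged drive, for example "zpool offline" followed by a replacement, is still a decision for the admin.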