Re: [List] disk problems with Dell PowerEdge r210 / SEAGATE ST3300656SS HS09

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Mon, 27 Oct 2025 15:15:57 UTC

On 27/10/2025 11:58, Matthias Apitz wrote:
> Hello,
> Since 2017 I have owned the above server, which my company wanted to
> decommission. I have used it since then as my bakery for FreeBSD CURRENT
> and ports.
>
> The server has two SCSI hard drives: da0 is UFS for /root, /usr etc. and
> da1 is ZFS used for poudriere:
>
> Oct 27 04:55:34 jet kernel: da1: <SEAGATE ST3300656SS HS09> Fixed Direct Access SPC-3 SCSI device
> Oct 27 04:55:34 jet kernel: da1: Serial Number 3QP1NF96
> Oct 27 04:55:34 jet kernel: da1: 300.000MB/s transfers
> Oct 27 04:55:34 jet kernel: da1: Command Queueing enabled
> Oct 27 04:55:34 jet kernel: da1: 286102MB (585937500 512 byte sectors)
>
> For some time this disk has been giving faults like the messages below,
> and only a power-off reset helps. Here are the last two faults, on
> October 25 and 27.
>
> What tests could I run, or how could I map away the bad disk blocks so
> they will not be touched again?
>
> Thanks
>
> 	matthias
>
> /var/log/messages
>
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 1a 99 75 39 00 00 02 00
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): CAM status: SCSI Status Error
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI status: Check Condition
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,cd (Vendor Specific ASCQ)
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Info: 0x22c0f7
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Field Replaceable Unit: 204
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Retrying command (per sense data)
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 1a 99 75 39 00 00 02 00
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): CAM status: SCSI Status Error
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI status: Check Condition
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI sense: NOT READY asc:4,1 (Logical unit is in process of becoming ready)
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Polling device for readiness
> Oct 25 16:55:43 jet kernel:     (da1:mps0:0:4:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 105 Command timeout on target 4(0x0009) 5000 set, 5.4367562 elapsed
> Oct 25 16:55:43 jet kernel: mps0: Sending abort to target 4 for SMID 105
>
>
> Oct 27 02:32:19 jet kernel:     (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 fe 7b 00 00 02 00 length 1024 SMID 1471 Command timeout on target 4(0x0009) 60000 set, 60.68865685 elapsed
> Oct 27 02:32:19 jet kernel: mps0: Sending abort to target 4 for SMID 1471
> Oct 27 02:32:19 jet kernel:     (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 fe 7b 00 00 02 00 length 1024 SMID 1471 Aborting command 0xfffffe00c347b8a8
> Oct 27 02:32:20 jet kernel:     (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 ff c5 00 00 02 00 length 1024 SMID 1313 Command timeout on target 4(0x0009) 60000 set, 60.32376095 elapsed
> Oct 27 02:32:20 jet kernel:     (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e b8 20 29 00 01 00 00 length 131072 SMID 1552 Command timeout on target 4(0x0009) 60000 set, 60.118961952 elapsed
> Oct 27 02:32:20 jet kernel:     (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e b8 1f 29 00 01 00 00 length 131072 SMID 650 Command timeout on target 4(0x0009) 60000 set, 60.119515098 elapsed
> Oct 27 02:32:20 jet kernel:     (da1:mps0:0:4:0): WRITE(10). CDB: 2a 00 19 dd 6c 84 00 00 07 00 length 3584 SMID 299 Command timeout on target 4(0x0009) 60000 set, 60.75147185 elapsed
> Oct 27 02:32:20 jet kernel:     (da1:mps0:0:4:0): WRITE(10). CDB: 2a 00 19 dd 6c 7e 00 00 01 00 length 512 SMID 118 Command timeout on target 4(0x0009) 60000 set, 60.75441295 elapsed
>
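As a side note, the block addresses involved can be read directly from the
CDBs in the log. A minimal sketch in Python, assuming the usual 10-byte
READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5,
big-endian transfer length in bytes 7-8) and the 512-byte sectors reported
for da1. The name decode_rw10 is only illustrative, not part of any tool:

```python
SECTOR_SIZE = 512          # from "585937500 512 byte sectors" in the log
DISK_SECTORS = 585937500   # same line of the log


def decode_rw10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as a hex string.

    Returns (operation, starting LBA, number of blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    op = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8
    return op, lba, blocks


# CDBs copied from the quoted /var/log/messages excerpt.
for cdb in ("28 00 1a 99 75 39 00 00 02 00",
            "28 00 0e 87 fe 7b 00 00 02 00",
            "2a 00 19 dd 6c 84 00 00 07 00"):
    op, lba, blocks = decode_rw10(cdb)
    print(op, "LBA", lba, "blocks", blocks,
          "bytes", blocks * SECTOR_SIZE,
          "in range" if lba + blocks <= DISK_SECTORS else "OUT OF RANGE")
```

The byte counts this gives agree with the "length" fields the mps driver
prints for the same commands (1024 and 3584), which is a quick sanity check
that the CDBs are being read correctly.
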
I still run R200s in production - very reliable! The early 210 had heat 
problems.

SCSI drives (and SATA drives) do their own bad block mapping, so it is no
longer necessary (or possible) to keep a bad block map on the system. It's
time to get a new drive. Drives do a very good job of hiding failures from
the OS, so if you are seeing errors you are looking at the tip of the
iceberg. A SCSI drive will detect a bad block, recover the data and
move it to a new area of the disk. Recovery can take a long time, and
commands that suddenly take that long are a big clue that the drive is
failing. However, the drive still returns "OK" if the timeout is long
enough and it can get the data off.
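
To see how the commands compare with their timeouts, the mps timeout lines
can be summarised. A rough sketch in Python, assuming the message format
seen in the quoted log; the regular expression and the name parse_timeouts
are my own and not part of any FreeBSD tool. The fractional part of the
"elapsed" figure varies in length in the log, so only the whole seconds
are used:

```python
import re

# Matches lines such as:
#   (da1:mps0:0:4:0): READ(10). CDB: ... SMID 1471 Command timeout on
#   target 4(0x0009) 60000 set, 60.68865685 elapsed
TIMEOUT_RE = re.compile(
    r"\(da\d+:[^)]*\): (?P<cmd>.+?)\. CDB: .*? SMID (?P<smid>\d+) "
    r"Command timeout on target \d+\(0x[0-9a-f]+\) "
    r"(?P<limit_ms>\d+) set, (?P<elapsed_s>\d+)\.\d+ elapsed"
)


def parse_timeouts(lines):
    """Return (command, SMID, timeout in ms, whole seconds elapsed) tuples."""
    found = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            found.append((m["cmd"], int(m["smid"]),
                          int(m["limit_ms"]), int(m["elapsed_s"])))
    return found


# Two lines copied from the quoted log, plus one that should be ignored.
sample = [
    "Oct 25 16:55:43 jet kernel:     (da1:mps0:0:4:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 105 Command timeout on target 4(0x0009) 5000 set, 5.4367562 elapsed",
    "Oct 27 02:32:19 jet kernel:     (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 fe 7b 00 00 02 00 length 1024 SMID 1471 Command timeout on target 4(0x0009) 60000 set, 60.68865685 elapsed",
    "Oct 27 02:32:19 jet kernel: mps0: Sending abort to target 4 for SMID 1471",
]

for cmd, smid, limit_ms, elapsed_s in parse_timeouts(sample):
    print(f"{cmd}: SMID {smid}, limit {limit_ms / 1000:.0f}s, elapsed {elapsed_s}s")
```

In the quoted log every such command ran to the full limit (5 seconds for
TEST UNIT READY, 60 seconds for the reads and writes), so the drive was not
answering at all rather than merely answering slowly.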

Regards, Frank.