Re: [List] disk problems with Dell PowerEdge r210 / SEAGATE ST3300656SS HS09
- In reply to: Matthias Apitz : "disk problems with Dell PowerEdge r210 / SEAGATE ST3300656SS HS09"
Date: Mon, 27 Oct 2025 15:15:57 UTC
On 27/10/2025 11:58, Matthias Apitz wrote:
> Hello,
> Since 2017 I own the above server which my company wanted to
> decommission. I have used it since then as my bakery for FreeBSD CURRENT and
> ports.
>
> The server has two SCSI harddrives, da0 is UFS for /root, /usr etc. and
> da1 is ZFS used for poudriere:
>
> Oct 27 04:55:34 jet kernel: da1: <SEAGATE ST3300656SS HS09> Fixed Direct Access SPC-3 SCSI device
> Oct 27 04:55:34 jet kernel: da1: Serial Number 3QP1NF96
> Oct 27 04:55:34 jet kernel: da1: 300.000MB/s transfers
> Oct 27 04:55:34 jet kernel: da1: Command Queueing enabled
> Oct 27 04:55:34 jet kernel: da1: 286102MB (585937500 512 byte sectors)
>
> For some time this disk has given faults like the messages below and only a
> power-off reset helps. Here are the last two faults on October 25 and 27.
>
> What could I do as tests or map away disk blocks so they will not be
> touched again?
>
> Thanks
>
> matthias
>
> /var/log/messages
>
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 1a 99 75 39 00 00 02 00
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): CAM status: SCSI Status Error
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI status: Check Condition
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,cd (Vendor Specific ASCQ)
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Info: 0x22c0f7
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Field Replaceable Unit: 204
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Retrying command (per sense data)
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 1a 99 75 39 00 00 02 00
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): CAM status: SCSI Status Error
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI status: Check Condition
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): SCSI sense: NOT READY asc:4,1 (Logical unit is in process of becoming ready)
> Oct 25 16:55:37 jet kernel: (da1:mps0:0:4:0): Polling device for readiness
> Oct 25 16:55:43 jet kernel: (da1:mps0:0:4:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 105 Command timeout on target 4(0x0009) 5000 set, 5.4367562 elapsed
> Oct 25 16:55:43 jet kernel: mps0: Sending abort to target 4 for SMID 105
>
>
> Oct 27 02:32:19 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 fe 7b 00 00 02 00 length 1024 SMID 1471 Command timeout on target 4(0x0009) 60000 set, 60.68865685 elapsed
> Oct 27 02:32:19 jet kernel: mps0: Sending abort to target 4 for SMID 1471
> Oct 27 02:32:19 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 fe 7b 00 00 02 00 length 1024 SMID 1471 Aborting command 0xfffffe00c347b8a8
> Oct 27 02:32:20 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e 87 ff c5 00 00 02 00 length 1024 SMID 1313 Command timeout on target 4(0x0009) 60000 set, 60.32376095 elapsed
> Oct 27 02:32:20 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e b8 20 29 00 01 00 00 length 131072 SMID 1552 Command timeout on target 4(0x0009) 60000 set, 60.118961952 elapsed
> Oct 27 02:32:20 jet kernel: (da1:mps0:0:4:0): READ(10). CDB: 28 00 0e b8 1f 29 00 01 00 00 length 131072 SMID 650 Command timeout on target 4(0x0009) 60000 set, 60.119515098 elapsed
> Oct 27 02:32:20 jet kernel: (da1:mps0:0:4:0): WRITE(10). CDB: 2a 00 19 dd 6c 84 00 00 07 00 length 3584 SMID 299 Command timeout on target 4(0x0009) 60000 set, 60.75147185 elapsed
> Oct 27 02:32:20 jet kernel: (da1:mps0:0:4:0): WRITE(10). CDB: 2a 00 19 dd 6c 7e 00 00 01 00 length 512 SMID 118 Command timeout on target 4(0x0009) 60000 set, 60.75441295 elapsed
>

I still run R200s in production - very reliable! The early 210 had heat
problems.

SCSI drives (and SATA) do their own bad block mapping, so it is no longer
necessary (or possible) to have a bad block map on the system. Drives do a
very good job of hiding failures from the OS, so if you are seeing errors
you are looking at the "tip of an iceberg". It's time to get a new drive.

A SCSI drive will detect a bad block, recover the data and move it to a new
area of the disk. This can take a long time, and a long delay is a big clue
that the drive is failing. However, the drive still returns "OK" if the
timeout is long enough and it can get the data off.

Regards, Frank.
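
For anyone who wants to see which blocks the timed-out commands in the quoted log were addressing, here is a minimal sketch that decodes the CDB bytes of the READ(10) and WRITE(10) lines. It assumes the standard 10-byte layout (opcode, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8) and the 512-byte sector size reported for da1. The function name `decode_rw10` is made up for this example and is not part of any FreeBSD tool.

```python
def decode_rw10(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    Hypothetical helper for illustration only. sector_size defaults to the
    512-byte sectors that the kernel reported for da1.
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if len(cdb) != 10 or cdb[0] not in opcodes:
        raise ValueError("not a 10-byte READ(10)/WRITE(10) CDB")

    lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # number of blocks transferred
    return {
        "op": opcodes[cdb[0]],
        "lba": lba,
        "blocks": blocks,
        "length_bytes": blocks * sector_size,
        "byte_offset": lba * sector_size,     # offset from the start of the disk
    }


if __name__ == "__main__":
    # CDBs copied from the log lines quoted above.
    for cdb in (
        "28 00 1a 99 75 39 00 00 02 00",
        "28 00 0e b8 20 29 00 01 00 00",
        "2a 00 19 dd 6c 84 00 00 07 00",
    ):
        print(decode_rw10(cdb))
```

As a sanity check, the decoded byte lengths should match the "length" fields the mps driver printed (1024, 131072 and 3584 for the three CDBs above). The decoded LBAs only show where the stalled commands were aimed; they do not by themselves prove that those particular blocks are bad.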