[Bug 277992] mpr and possible trim issues
Date: Wed, 27 Mar 2024 16:00:06 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

    Bug ID: 277992
   Summary: mpr and possible trim issues
   Product: Base System
   Version: 14.0-STABLE
  Hardware: amd64
        OS: Any
    Status: New
  Severity: Affects Some People
  Priority: ---
 Component: kern
  Assignee: bugs@FreeBSD.org
  Reporter: mike@sentex.net

The thread https://lists.freebsd.org/archives/freebsd-hardware/2024-March/000094.html has most of the details.

In summary, a set of WD Blue SA510 SSDs with the latest firmware as of Mar 2024 will eventually start throwing errors and detach from the controller when I copy and then destroy a zfs dataset with several million files. It sort of feels like a TRIM issue, but I am not sure. Running the disks off the onboard SATA controller does not recreate the issue.

If I start with a low level trim (trim -f /dev/daX), create a raidz1 zfs pool from four 1 TB WD disks, import a dataset of about 280GB (compressed) that has many files (20+ million), do a zfs send original pool | zfs recv copy-of-pool, then zfs destroy copy-of-pool and repeat about 4 or 5 times, the drives in the pool will start throwing errors. If I do a hard trim of the disks, I can start from scratch and again get 4 or 5 cycles before the errors. Hence, it sort of feels like a broken trim issue? I tried with autotrim on and off, and with a manual zfs trim <pool> between zfs send | zfs recv tests, to no avail.

When the disks are on the mpr controller I will get errors such as

(da6:mpr0:0:16:0): READ(10). CDB: 28 00 6d e0 ae 28 00 00 08 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 0c cb 3f 00 00 00 e8 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): READ(10). CDB: 28 00 6d e0 ad 28 00 01 00 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): READ(10). CDB: 28 00 6d e0 ac 28 00 00 f8 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 40 07 df 88 00 01 00 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 3f 48 72 08 00 01 00 00
(da6:mpr0:0:16:0): CAM status: SCSI Status Error
(da6:mpr0:0:16:0): SCSI status: Check Condition
(da6:mpr0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da6:mpr0:0:16:0): Retrying command (per sense data)
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2036 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 637 loginfo 31110f00
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 41 98 42 00 00 01 00 00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 1242 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 979 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 1243 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2091 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 1612 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2093 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 152 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2132 loginfo 31110f00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 43 17 dc 88 00 01 00 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 41 98 43 00 00 00 50 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 0c d4 f6 80 00 00 68 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 0c d4 f5 80 00 01 00 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): READ(10). CDB: 28 00 05 dc 12 28 00 00 f8 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): READ(10). CDB: 28 00 05 dc 0f b0 00 00 88 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 02 96 7e 80 00 00 10 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): READ(10). CDB: 28 00 6f 5b 8d 68 00 01 00 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 41 98 42 00 00 01 00 00
(da5:mpr0:0:15:0): CAM status: SCSI Status Error
(da5:mpr0:0:15:0): SCSI status: Check Condition
(da5:mpr0:0:15:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da5:mpr0:0:15:0): Retrying command (per sense data)

The same tests with Samsung disks work without issue, or at least I was not able to recreate the error.

# mprutil show adapter
mpr0 Adapter:
       Board Name: INSPUR 3008IT
   Board Assembly: INSPUR
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.12.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (8.0 GB/sec)
        IOC Speed: Full
      Temperature: 56 C

I originally ran into this problem with the same series of LSI adapter, but it was not in IT mode and instead was using the mrsas driver.

When on the ATA controller the disks are DSM_TRIM. When on MPR, they are ATA_TRIM.
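For anyone reading the error log above, the READ(10)/WRITE(10) CDBs can be decoded to see which LBAs and transfer sizes the failing commands touched. The following is only a rough sketch in Python, not part of the original report: the helper name decode_cdb10 is made up for illustration, and the 512-byte logical block size is an assumption about these disks. It relies on the standard 10-byte CDB layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte big-endian transfer length, control).

```python
import struct

# Opcodes seen in the log above; other 10-byte commands are not handled here.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb_hex, block_size=512):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes.

    block_size is an assumed logical sector size (512 bytes), used only to
    convert the block counts into byte offsets and lengths.
    """
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB, got %d bytes" % len(raw))
    # Layout: opcode, flags, LBA (32-bit BE), group, length (16-bit BE), control
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", raw)
    if opcode not in OPCODES:
        raise ValueError("unsupported opcode 0x%02x" % opcode)
    return {
        "op": OPCODES[opcode],
        "lba": lba,
        "blocks": blocks,
        "offset_bytes": lba * block_size,
        "length_bytes": blocks * block_size,
    }


if __name__ == "__main__":
    # Two CDBs copied from the da6 errors in the log above.
    for cdb in ("28 00 6d e0 ae 28 00 00 08 00",
                "2a 00 0c cb 3f 00 00 00 e8 00"):
        print(cdb, "->", decode_cdb10(cdb))
```

Decoding all of the failing commands this way might help show whether the errors cluster in particular LBA ranges or are spread across the whole disk.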