Re: git: c01af41c3c8f - main - ata_da: add quirk to disable NCQ TRIM for Samsung 860/870 SSDs

From: Andriy Gapon <avg_at_freebsd.org>
Date: Mon, 19 Feb 2024 16:40:02 UTC
On 19/02/2024 18:06, cglogic wrote:
> Hello Warner,
> 
> Thank you for detailed explanation. Looks like it has no sense for me to worry 
> about disabling NCQ TRIM on that old PC.
> 
> However it would be great to have possibility to apply such hardware bug 
> workarounds for affected controller/drive pairs only.
> Or, at least, have possibility to disable such quirks on per drive basis via 
> sysctl. I didn't find such a possibility.

Yeah, we do not have a way to control ata_da (and scsi_da)  quirks via 
configuration.  It would be nice to have it, indeed.

Also, perhaps what ADA_Q_NCQ_TRIM_BROKEN could do is prefer a different delete 
method by default.  That way, NCQ_DSM_TRIM option would not disappear completely 
and it would be possible to set with kern.cam.ada.N.delete_method sysctl.

> On Monday, February 19th, 2024 at 5:18 PM, Warner Losh <imp@bsdimp.com> wrote:
>>
>>
>> On Mon, Feb 19, 2024, 4:29 AM cglogic <cglogic@protonmail.com 
>> <mailto:cglogic@protonmail.com>> wrote:
>>
>>     On Monday, February 19th, 2024 at 1:03 PM, Andriy Gapon <avg@FreeBSD.org>
>>     wrote:
>>
>>     > On 19/02/2024 12:47, cglogic wrote:
>>     >
>>     > > Hello Andriy,
>>     > >
>>     > > I use ZFS with autotrim enabled on Samsung 860 PRO connected to Intel
>>     AHCI SATA controller for 9 years without any issue.
>>     >
>>     >
>>     > I think that it was released in 2018?
>>
>>     You are right, it's 9 years old computer, SSD was added later.
>>
>>     >
>>     > > Can I disable this quirk locally or have I revert the patch to
>>     continue use NCQ TRIM with this SSD?
>>     >
>>     >
>>     > I don't think that there is a way to disable the quirk, so you'll have
>>     to revert.
>>     > Note that ATA TRIM is still enabled, it's only NCQ TRIM that got disabled.
>>
>>     From my understanding ATA TRIM is non-queued TRIM as opposed to NCQ TRIM.
>>     With only ATA TRIM enabled, drive will have increased latency when large
>>     amount of files deleted.
>>
>>     However I'm not sure how ZFS handles TRIMs, maybe it groups several TRIMs
>>     and then sends when drive is idle.
>>     Another question is how this affects TRIM enabled swap partition.
>>
>>     In general, if I'm correct here, disabling NCQ TRIM should not be an issue
>>     for desktop usage, except for large git repository updates, etc.
>>
>>
>> In general, it won't matter a ton. Unless you are seeing stalls because of 
>> read/write I/O waiting behind trims that have to go to the drive one at a 
>> time, the effects are in the noise. If you are seeing that, though, the 
>> effects can be quite helpful in terms of latency... assuming the drive is 
>> cooperative. The benefits of it vary a lot from drive to drive as well. Some 
>> early drives still serialize the trims, so the latency benefit is not so 
>> great. Some enterprise drives do trims instantly and keep some internal state 
>> that is helped a lot by them. But if NCQ trims are unstable, we likely should 
>> err on the side of disabling.
>>
>> The errors that Andriy was seeing typically are cable or controller issues. In 
>> his case, it was a controller issue (apparently, and surprisingly, paired with 
>> a specific type of drive). Linux has a specific workaround for the pair, which 
>> is unusual. But there's two issues at play.
>>
>> The first issue is that these drives should have NCQ Trim turned off, 
>> according to Linux's commit log. This may be firmware revision based or may 
>> just be some people are luckier with newer firmware (it's hard to say from the 
>> mixed reports, but they read more like luck than version for the sampling I 
>> did). So Andriy's change is good. The second issue, and one I was confused on 
>> earlier, is that Linux goes a step further and disables NCQ entirely for these 
>> old controllers for 860 and 870 drives. This isn't too surprising. We have a 
>> workaround for PMP for these controllers already.... One can disable NCQ 
>> entirely by setting the tags to 0... Given the specificity of the problem, and 
>> the age of the controllers, I'm inclined to not implement the Linux workaround 
>> for this specific pair.


-- 
Andriy Gapon