Re: how to tell if TRIM is working

From: Warner Losh <imp_at_bsdimp.com>
Date: Thu, 02 May 2024 14:34:22 UTC
On Thu, May 2, 2024 at 8:19 AM mike tancsa <mike@sentex.net> wrote:

> On 5/2/2024 10:16 AM, Warner Losh wrote:
>
>
> When trims are fast, you want to send them to the drive as soon as you
> know the blocks are freed. UFS always does this (if trim is enabled at
> all).
> ZFS has a lot of knobs to control when / how / if this is done.
>
> vfs.zfs.vdev.trim_min_active: 1
> vfs.zfs.vdev.trim_max_active: 2
> vfs.zfs.trim.queue_limit: 10
> vfs.zfs.trim.txg_batch: 32
> vfs.zfs.trim.metaslab_skip: 0
> vfs.zfs.trim.extent_bytes_min: 32768
> vfs.zfs.trim.extent_bytes_max: 134217728
> vfs.zfs.l2arc.trim_ahead: 0
>
>
> I've not tried to tune these in the past, but you can see how they affect things.
>
>
> Thanks Warner, I will try and play around with these values to see if they
> impact things.  BTW, do you know what / why things would be "skipped"
> during trim events?
>
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_failed: 0
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_failed: 0
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_skipped: 5968330752
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_skipped: 503986
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_written: 181593186304
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_written: 303115
>

A quick look at the code suggests that an extent is skipped when it is
smaller than the extent_bytes_min parameter.
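
As a rough sanity check against the kstats you posted: 5968330752 skipped
bytes over 503986 skipped extents works out to an average skipped extent of
about 11.8 kB, comfortably below the default extent_bytes_min of 32768,
which fits that reading.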

The minimum seems to be a trade-off between sending too many trims to the
drive and making sure that the trims you do send are maximally effective.
By specifying a smaller size, you'll be freeing up more holes in the
underlying NAND blocks. In some drives, this triggers more data copying
(and more write amp), so you want to set it a bit higher for those. In
other drives, it improves the efficiency of the GC algorithm, allowing
each groomed block to recover more space for future writes. In the past,
I've found that ZFS's defaults are decent for 2018-ish SATA SSDs, but a
bit too trim-averse for newer NVMe drives, even the cheap consumer ones.
Though that's just a coarse generalization from my buildworld workload.
Other workloads will have other data patterns, ymmv, so you need to
measure it.
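
If you want to experiment, the knob can be poked at runtime (assuming your
ZFS version exposes it read-write); the value below is just an
illustration, not a recommendation:

    # halve the default minimum trimmable extent (illustrative value only)
    sysctl vfs.zfs.trim.extent_bytes_min=16384
    # put the same line in /etc/sysctl.conf to make it persist

Then watch the trim_bytes_skipped / trim_extents_skipped kstats to see
whether the skip ratio moves for your workload.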

Another way to get visibility, one that I've not been able to measure a
slowdown from, is to enable CAM_IOSCHED_DYNAMIC. That gives you a lot more
statistics about the I/Os in the system, including latency measurements. In
theory, that also lets you traffic-shape the trims to the drive, but I've
had only limited success with that and haven't had the time to make it a
lot better.
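
For reference, that's a kernel config option, so it needs a custom kernel:

    options         CAM_IOSCHED_DYNAMIC

After a rebuild and reboot, the extra per-device statistics show up in the
CAM sysctl tree; something like sysctl -a | grep iosched will find the
exact node names on your system.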

Warner