TRIM, iSCSI and %busy waves

Warner Losh imp at
Fri Apr 6 15:01:54 UTC 2018

On Fri, Apr 6, 2018 at 1:58 AM, Borja Marcos <borjam at> wrote:

> > On 5 Apr 2018, at 17:00, Warner Losh <imp at> wrote:
> >
> > I'm working on trim shaping in -current right now. It's focused on NVMe,
> > but since I'm doing the bulk of it in cam_iosched.c, it will eventually be
> > available for ada and da. The notion is to measure how long the TRIMs take,
> > and only send them at 80% of that rate when there's other traffic in the
> > queue (so if trims are taking 100ms, send them no faster than 8/s). While
> > this will allow for better read/write traffic, it does slow the TRIMs down
> > which slows down whatever they may be blocking in the upper layers. Can't
> > speak to ZFS much, but for UFS that's freeing of blocks so things like new
> > block allocation may be delayed if we're almost out of disk (which we have
> > no signal for, so there's no way for the lower layers to prioritize trims
> > or not).
>
> Have you considered "hard" shaping including discarding TRIMs when needed?
> Remember that a TRIM is not a write, which is subject to a contract with
> the application, but a better-if-you-do-it operation.

Well, yes and no. TRIM is there to improve the long-term performance of
the drives, because they'd otherwise get too fragmented and/or have an
unacceptably high write amplification. It's more than just a hint, but
maybe, in some cases, less than a write. "Better if you do it" does give
some leeway; how much depends on the application. If we were to implement a
hard limit on the latency of TRIMs, it would have to be user configurable.
There's also the strategy of returning some TRIMs right away, while letting
only a percentage through to the device.
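
To make the "only a percentage through" idea concrete, here is a rough
sketch in Python. It is not code from cam_iosched.c; the class name TrimGate
and the pass_percent knob are made up for illustration. It uses a simple
credit counter so that exactly the configured share of TRIMs is forwarded,
and the rest would be completed immediately without touching the device.

```python
class TrimGate:
    """Hypothetical gate: forward pass_percent of TRIMs, drop the rest.

    Illustration only; names and policy are assumptions, not FreeBSD code.
    """

    def __init__(self, pass_percent):
        if not 0 <= pass_percent <= 100:
            raise ValueError("pass_percent must be between 0 and 100")
        self.pass_percent = pass_percent  # user-configurable knob
        self.credit = 0                   # accumulated share, in percent

    def admit(self):
        """Return True to send this TRIM to the device, False to
        complete it right away without sending it."""
        self.credit += self.pass_percent
        if self.credit >= 100:
            self.credit -= 100
            return True
        return False


gate = TrimGate(pass_percent=25)
forwarded = sum(gate.admit() for _ in range(100))
print(forwarded, "of 100 TRIMs forwarded")
```

A counter rather than a random draw keeps the forwarded share exact and
evenly spread, which seemed easier to reason about for a sketch.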

If I go through with what you're calling hard shaping, I'd also look for
ways to allow the upper layers to tell me to hurry up. We have it in the
buffer daemon between all the users of bufs when there's a buf shortage,
but no similar signal from UFS down to the device to tell it that the
results are needed NOW vs needed eventually. And the urgency of the need
varies somewhat over time. You could easily send down a boatload of TRIMs
with no urgent need for blocks, time passes, and then you have an urgent
need for blocks. So you can't just add a flag to the bio going down saying
whether it's urgent, since by the time the need arises you might not have
another TRIM to send down to carry it. A new BIO
type and/or a tweak to BIO_FLUSH might suffice and be well defined for
drivers that don't do weird things. The notion of the upper layers being
able to cancel a TRIM that's been queued up was also floated since TRIM +
WRITE in quick succession often gives no different performance than just
the bare WRITE. And I have no clue what ZFS does wrt TRIMs.
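
As an illustration of the cancel idea floated above, here is a minimal
Python sketch. Everything in it is hypothetical (the PendingTrims class, the
(start_lba, block_count) representation, and the policy): it drops a queued
TRIM only when a later WRITE covers its whole range, and leaves partially
overlapping TRIMs alone. It says nothing about how UFS, ZFS, or CAM would
actually do this.

```python
from collections import deque


class PendingTrims:
    """Hypothetical queue of TRIMs not yet sent to the device."""

    def __init__(self):
        self.queue = deque()  # entries are (start_lba, block_count)

    def enqueue(self, start, count):
        self.queue.append((start, count))

    def cancel_covered_by_write(self, w_start, w_count):
        """Drop queued TRIMs whose range lies entirely inside the
        write's range; return how many were cancelled."""
        w_end = w_start + w_count
        kept = deque(
            (start, count)
            for start, count in self.queue
            if not (w_start <= start and start + count <= w_end)
        )
        cancelled = len(self.queue) - len(kept)
        self.queue = kept
        return cancelled


pending = PendingTrims()
pending.enqueue(0, 8)
pending.enqueue(100, 8)
pending.enqueue(200, 16)
print(pending.cancel_covered_by_write(96, 16), "TRIM(s) cancelled")
```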

So I've considered it, yes. But there are more tricky corners to consider
if it were to be implemented, due to (a) the diversity of quality in the
marketplace and (b) the diversity of workloads FreeBSD is used for.

> Otherwise, as you say, you might be blocking other operations in the upper
> layers.
> I am assuming here that with many devices doing TRIMs is better than not
> doing them.
> And in case of queue congestion doing *some* TRIMs should be better than
> doing no TRIMs at all.
> Yep, not the first time I propose something of the sort, but my queue of
> suggestions to eventually discard TRIMs doesn't implement the same method ;)

I'm looking at all options, to be honest. I'm not sure what will work the
best in the long term.

I've observed that, at least with UFS, it's quite easy to survive for hours
without finishing the trim with millions of TRIMs in the queue. All it
affects are monitoring programs that freak out when you have this many
items in the queue for so long (thinking something must be wrong). Of
course, we control the monitoring programs, so that's easy to fix. (I
discovered this when I was doing 1 IOP for TRIMs when running early, buggy
versions of this, btw). The backup causes UFS to be waiting on the blocks,
if there is a block shortage. Since most of the time there's not, this
didn't cause problems when I tweaked a parameter and drained the TRIMs 8
hours after tons of files were deleted....
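
For a feel of the numbers, here is a back-of-the-envelope sketch in Python
of the 80% rule quoted at the top and of how long a backlog takes to drain
at a given rate. The function names are invented for illustration, and it
assumes one TRIM completes at a time, so the measured rate is simply
1/latency; the real code in cam_iosched.c may measure and pace differently.

```python
def shaped_trim_rate(trim_latency_s, fraction=0.8):
    """TRIMs per second to issue: `fraction` of the measured
    completion rate, assumed here to be 1 / latency."""
    if trim_latency_s <= 0:
        raise ValueError("latency must be positive")
    return fraction / trim_latency_s


def drain_hours(queued_trims, trims_per_second):
    """Hours needed to work through a backlog at a steady rate."""
    if trims_per_second <= 0:
        raise ValueError("rate must be positive")
    return queued_trims / trims_per_second / 3600.0


rate = shaped_trim_rate(0.100)  # TRIMs taking 100ms each
print("rate: %.1f TRIMs/s" % rate)
print("1M queued at that rate: %.1f hours" % drain_hours(1000000, rate))
print("1M queued at 1 TRIM/s: %.1f hours" % drain_hours(1000000, 1))
```

With 100ms TRIMs this gives the 8/s from the example above, and a backlog
of a million TRIMs at that rate takes on the order of 35 hours to drain.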


More information about the freebsd-stable mailing list