svn commit: r355837 - head/sys/cam

Warner Losh imp at bsdimp.com
Tue Dec 17 03:06:55 UTC 2019


On Mon, Dec 16, 2019 at 5:43 PM Steven Hartland <
steven.hartland at multiplay.co.uk> wrote:

> Sticky keyboard there Warner?
>

Yea. I thought I'd fixed all those messages.


> On a more serious note the fact that the controllers lie about the
> underlying
> location of data, the impact of skipping the TRIM requests can have a
> much more
> serious impact than one might think depending on the drive, so this type of
> optimisation can significantly harm performance instead of increasing it.
>

The underlying location of an LBA is never fixed. There's always a
intermediate layer that does the mapping so you can get good performance
writing to the drive in chunks smaller than the erase block. A trim
followed quickly by a write is nearly the same as just doing the write. In
the cases where we have a severe shortage, we'll skip the trim to a block
that's likely to be written to shortly anyway. It's only done when there's
a shortage, and it only affects the trims that we've not yet scheduled.


> This was the main reasons we sponsored the initial ZFS TRIM
> implementation; as
> drive performance go so bad with no TRIM that SSD's performed worse than
> HDD's.
>

Yes. You can't omit trims generally, but when the choice is to freeze all
the other I/O in the system and do every single trim, or skip a few trims
to allow the I/O to flow more quickly, this is an obvious tradeoff. The
reason you want to generally do trim is to keep the write amp on the drive
down. Skipping it occasionally, especially for blocks that will soon be
written, is fine. It won't materially affect the write amp. Turning it off
completely will affect the write amp terribly, on some drive, as you
discovered as the underlying nand pool of blocks get fragmented.

Now obviously this was some time ago, but I wouldn't be surprised if
> there's bad
> hardware / firmware like this still being produced.
>

I've never encountered firmware so bad that this would cause it any
trouble, despite encountering a lot of bad firmware out there... DSM TRIMs
are optional, and nothing requires that they be done. If you do the
overwhelming majority but fail to do so for a few trims, the drive will
still behave decently even with really bad firmware and garbage collection
routines.


> Given that might be a good idea to make this optional, possibly even opt
> in not opt
> out?
>

It's 100% off if you don't use the I/O scheduler with option IOSCHED. Since
the cam io scheduler can delay I/O, this helps when there's a severe
resource shortage in the system (which happens somewhat infrequently). As
such, I viewed it as part of IOSCHED and not something needing a knob to
turn off separately. I had a hard time triggering the shortages that would
exercise the speedup path, and the code review moved the speedup code to an
even less frequent location.

Warner


>      Regards
>      Steve
>
> On 17/12/2019 00:13, Warner Losh wrote:
> > Author: imp
> > Date: Tue Dec 17 00:13:45 2019
> > New Revision: 355837
> > URL: https://svnweb.freebsd.org/changeset/base/355837
> >
> > Log:
> >    Implement bio_speedup
> >
> >    React to the BIO_SPEED command in the cam io scheduler by completing
> >    as successful BIO_DELETE commands that are pending, up to the length
> >    passed down in the BIO_SPEEDUP cmomand. The length passed down is a
> >    hint for how much space on the drive needs to be recovered. By
> >    completing the BIO_DELETE comomands, this allows the upper layers to
> >    allocate and write to the blocks that were about to be trimmed. Since
> >    FreeBSD implements TRIMSs as advisory, we can eliminliminate them and
> >    go directly to writing.
> >
> >    The biggest benefit from TRIMS coomes ffrom the drive being able t
> >    ooptimize its free block pool inthe log run. There's little nto no
> >    bene3efit in the shoort term. , sepeciall whn the trim is followed by
> >    a write. Speedup lets  us make this tradeoff.
> >
> >    Reviewed by: kirk, kib
> >    Sponsored by: Netflix
> >    Differential Revision: https://reviews.freebsd.org/D18351
> >
> > Modified:
> >    head/sys/cam/cam_iosched.c
> >
> > Modified: head/sys/cam/cam_iosched.c
> >
> ==============================================================================
> > --- head/sys/cam/cam_iosched.c        Tue Dec 17 00:13:40 2019
> (r355836)
> > +++ head/sys/cam/cam_iosched.c        Tue Dec 17 00:13:45 2019
> (r355837)
> > @@ -1534,6 +1534,41 @@ cam_iosched_queue_work(struct cam_iosched_softc
> *isc,
> >   {
> >
> >       /*
> > +      * A BIO_SPEEDUP from the uppper layers means that they have a
> block
> > +      * shortage. At the present, this is only sent when we're trying to
> > +      * allocate blocks, but have a shortage before giving up.
> bio_length is
> > +      * the size of their shortage. We will complete just enough
> BIO_DELETEs
> > +      * in the queue to satisfy the need. If bio_length is 0, we'll
> complete
> > +      * them all. This allows the scheduler to delay BIO_DELETEs to
> improve
> > +      * read/write performance without worrying about the upper layers.
> When
> > +      * it's possibly a problem, we respond by pretending the
> BIO_DELETEs
> > +      * just worked. We can't do anything about the BIO_DELETEs in the
> > +      * hardware, though. We have to wait for them to complete.
> > +      */
> > +     if (bp->bio_cmd == BIO_SPEEDUP) {
> > +             off_t len;
> > +             struct bio *nbp;
> > +
> > +             len = 0;
> > +             while (bioq_first(&isc->trim_queue) &&
> > +                 (bp->bio_length == 0 || len < bp->bio_length)) {
> > +                     nbp = bioq_takefirst(&isc->trim_queue);
> > +                     len += nbp->bio_length;
> > +                     nbp->bio_error = 0;
> > +                     biodone(nbp);
> > +             }
> > +             if (bp->bio_length > 0) {
> > +                     if (bp->bio_length > len)
> > +                             bp->bio_resid = bp->bio_length - len;
> > +                     else
> > +                             bp->bio_resid = 0;
> > +             }
> > +             bp->bio_error = 0;
> > +             biodone(bp);
> > +             return;
> > +     }
> > +
> > +     /*
> >        * If we get a BIO_FLUSH, and we're doing delayed BIO_DELETEs then
> we
> >        * set the last tick time to one less than the current ticks minus
> the
> >        * delay to force the BIO_DELETEs to be presented to the client
> driver.
> > @@ -1919,8 +1954,8 @@ DB_SHOW_COMMAND(iosched, cam_iosched_db_show)
> >       db_printf("Trim Q len         %d\n", biolen(&isc->trim_queue));
> >       db_printf("read_bias:         %d\n", isc->read_bias);
> >       db_printf("current_read_bias: %d\n", isc->current_read_bias);
> > -     db_printf("Trims active       %d\n", isc->pend_trim);
> > -     db_printf("Max trims active   %d\n", isc->max_trim);
> > +     db_printf("Trims active       %d\n", isc->pend_trims);
> > +     db_printf("Max trims active   %d\n", isc->max_trims);
> >   }
> >   #endif
> >   #endif
>
>


More information about the svn-src-all mailing list