svn commit: r322863 - head/sys/cam

Fri Aug 25 18:18:36 UTC 2017

On Fri, Aug 25, 2017 at 11:59 AM, Rodney W. Grimes <
freebsd at pdx.rh.cn85.dnsmgr.net> wrote:

> > On Fri, Aug 25, 2017 at 7:35 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> >
> > > On Thu, Aug 24, 2017 at 10:11:10PM +0000, Warner Losh wrote:
> > >
> > > > Author: imp
> > > > Date: Thu Aug 24 22:11:10 2017
> > > > New Revision: 322863
> > > > URL: https://svnweb.freebsd.org/changeset/base/322863
> > > >
> > > > Log:
> > > >   Expand the latency tracking array from 1.024s to 8.192s to help track
> > > >   extreme outliers from dodgy drives. Adjust comments to reflect this,
> > > >   and make sure that the number of latency buckets match in the two
> > > >   places where it matters.
> > >
> > > Maybe up to 1 min?
> > >
> >
> > I'm not sure what additional data you'll get between "totally sucks, at
> > least 8s latency" and "totally sucks, at least 32s." or "totally sucks, at
> > least 64s" though the latter isn't possible with the default timeouts...
> >
> > I'm planning on adding a 'max' latency that's self-resetting instead of
> > expanding the bucket upwards. I'm also contemplating expanding it down to
> > 100us or even 10us since nda serves nvme drives which easily can be sub
> > 100us.
> >
> > Warner
>
> What about using a log2/log10 engineering style binning of
> 1, 2, 4, 8 us
> 10, 20, 40, 80 us
> 100, 200, 400, 800 us
> ...
> 10000000, 20000000, 40000000, 80000000 us
>
> This would give you a fairly fine grain in the high speed
> area and coarse grain in the not very likely areas, and
> it all fits in a nice 32 ints.
>

I don't like that at all. It's the worst of both worlds.

1/2/5 makes more sense because the difference between 8 and 10 is tiny, and
there's an extra bin per 1000 with your proposal.
However, powers of two are completely sufficient to get the data out of the
system and are optimal for fitting the fewest bins.
Doing either your suggestion or 1/2/5 makes the bins non-uniform, which
makes the P99 estimates I'm making from these numbers less accurate
mathematically.
More bins make this more expensive, since this is a linear search that we
do on each I/O.
Power of two matches dtrace (though ms instead of us, which I may change).
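
To make the trade-off concrete, here is a minimal sketch in C of
power-of-two latency bucketing with a linear search on each I/O, plus a
coarse percentile estimate read back from the histogram. It is not the
code in head/sys/cam: the names (NBUCKETS, BASE_US, bucket_index,
record_latency, percentile_bound_us) and the 1 ms base are assumptions
chosen only so that the top bucket lines up with 8.192s.

```c
#include <stdint.h>

#define NBUCKETS 14      /* assumption: 1 ms << 13 = 8192 ms = 8.192 s */
#define BASE_US  1000u   /* assumption: first bucket holds latencies < 1 ms */

static uint64_t buckets[NBUCKETS];

/*
 * Linear search for the first bucket whose upper bound exceeds lat_us.
 * The cost grows with the number of buckets, and it runs once per I/O.
 */
static int
bucket_index(uint64_t lat_us)
{
	uint64_t bound = BASE_US;
	int i;

	for (i = 0; i < NBUCKETS - 1; i++) {
		if (lat_us < bound)
			return (i);
		bound <<= 1;	/* next power-of-two upper bound */
	}
	return (NBUCKETS - 1);	/* last bucket catches all outliers */
}

static void
record_latency(uint64_t lat_us)
{
	buckets[bucket_index(lat_us)]++;
}

/*
 * Return the upper bound, in microseconds, of the bucket in which the
 * requested percentile falls. This is only as precise as the bucket
 * width; it returns 0 when no samples have been recorded.
 */
static uint64_t
percentile_bound_us(unsigned pct)
{
	uint64_t total = 0, seen = 0, target;
	int i;

	for (i = 0; i < NBUCKETS; i++)
		total += buckets[i];
	if (total == 0)
		return (0);
	target = (total * pct + 99) / 100;	/* ceil(total * pct / 100) */
	for (i = 0; i < NBUCKETS; i++) {
		seen += buckets[i];
		if (seen >= target)
			return ((uint64_t)BASE_US << i);
	}
	return ((uint64_t)BASE_US << (NBUCKETS - 1));
}
```

With 99 samples at 500us and one at 3000us, percentile_bound_us(99)
reports the 1 ms bucket and percentile_bound_us(100) reports the 4 ms
bucket. Each bound is exactly twice the previous one, so the relative
resolution is the same in every bucket.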

So I'm not inclined to make arbitrary changes here based on aesthetics. I
don't see a good reason to do so, and I see only extra costs (including
retooling code that I've written to consume this) for changing.

Warner