svn commit: r298002 - in head/sys: cam cam/ata cam/scsi conf dev/ahci

Warner Losh imp at bsdimp.com
Thu Apr 14 22:24:46 UTC 2016


On Thu, Apr 14, 2016 at 4:15 PM, Shawn Webb <shawn.webb at hardenedbsd.org>
wrote:

> On Thu, Apr 14, 2016 at 04:04:27PM -0600, Warner Losh wrote:
> > On Thu, Apr 14, 2016 at 3:54 PM, Dmitry Morozovsky <marck at rinet.ru> wrote:
> >
> > > Warner,
> > >
> > > On Thu, 14 Apr 2016, Warner Losh wrote:
> > >
> > > > Author: imp
> > > > Date: Thu Apr 14 21:47:58 2016
> > > > New Revision: 298002
> > > > URL: https://svnweb.freebsd.org/changeset/base/298002
> > > >
> > > > Log:
> > > >   New CAM I/O scheduler for FreeBSD. The default I/O scheduler is the
> > > same
> > >
> > > [snip]
> > >
> > > First, thanks so much for this quite non-trivial piece of work!
> > > What are the ways to enable this instead of the default, and what are
> > > the benefits and drawbacks?
> >
> >
> > You add CAM_NETFLIX_IOSCHED to your kernel config to enable it. Hmmm,
> > looking at the diff, perhaps I should add that to LINT.
> >
> > In production, we use it for three things. First, our scheduler keeps a
> > lot more statistics than the default one. These statistics are useful
> > for us in knowing when a system is saturated and needs to shed load.
> > Second, we favor reads over writes because our workload, as you might
> > imagine, is a read-mostly workload. Finally, in some systems, we
> > throttle the write throughput to the SSDs. The SSDs we buy can do
> > 300MB/s of writes while serving 400MB/s of reads, but only for short
> > periods of time (long enough to do 10-20GB of traffic). After that,
> > write performance drops, and read performance goes out the window.
> > Experiments have shown that if we limit the write speed to no more than
> > 30MB/s or so, then the garbage collection the drive is doing won't
> > adversely affect the read latency / performance.
>
> Going on a tangent here, but related:
>
> As someone who is just barely stepping into the world of benchmarks and
> performance metrics, can you shed some light on how you arrived at those
> numbers? I'd be extremely interested to learn.
>
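
(On enabling it: per the note above, the kernel config change is just that
one option. A minimal sketch, assuming a custom config file that includes
GENERIC -- the ident name is arbitrary:

    include         GENERIC
    ident           IOSCHED-KERNEL
    options         CAM_NETFLIX_IOSCHED

then build and install the kernel as usual.)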

These numbers were derived through an iterative process. All our systems
report a large number of statistics while they are running. The disk
performance numbers come from gstat(8), which ultimately derives them from
devstat(9). When we enabled serving customer traffic while refreshing
content, we noticed a large number of reports from our playback clients
indicating problems with the server during this time period. I looked at
the graphs to see what was going on. Once I found the problem, I could see
that as the write load varied, the latency numbers for the reads would
vary substantially as well.

I added code to the I/O scheduler so I could rate limit the write speed to
the SSDs. After running through a number of different machines over a
number of nights of filling and serving, I was able to find the right
number. If I set it to 30MB/s, the 20 machines I tested didn't have any
reports above the background level of problems. When I set it to 35MB/s, a
couple of those machines had problems. When I set it to 40MB/s, there were
a couple more. When I set it to 80MB/s, almost all of them had problems.
Being conservative, I set it to the highest number that showed no ill
effect on the clients, though I could see large jumps in read latency at
write rates as low as 25MB/s.

Sadly, this was done with Netflix-internal tools, but one could do the
same research with gstat and some scripting. One could also use dtrace to
study the latency patterns at a much finer degree of fidelity than gstat
offers.
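
For example, a minimal dtrace sketch along those lines (just an
illustration using the generic io provider, not our tooling) could
quantize per-I/O completion latency like this:

    #!/usr/sbin/dtrace -s
    /* Record when each I/O is handed to the device. */
    io:::start
    {
            ts[arg0] = timestamp;
    }

    /* On completion, aggregate the latency in microseconds. */
    io:::done
    /ts[arg0] != 0/
    {
            @lat["usec"] = quantize((timestamp - ts[arg0]) / 1000);
            ts[arg0] = 0;
    }

Keying the aggregation by device, or splitting reads from writes, would
give a closer look at how a given write load affects read latency.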

Warner

