CAM disk I/O starvation

Tue Apr 17 19:16:09 UTC 2012

On Mon, 16 Apr 2012 14:39:12 -0700
Adrian Chadd <adrian at freebsd.org> wrote:

> On 11 April 2012 10:21, Gary Jennejohn <gljennjohn at googlemail.com> wrote:
> 
> > Just for the archive, my bad disk performance seems to have been fixed in
> > HEAD by svn commit r234074.  Seems that all interrupts were being handled
> > by a single CPU/core (I have 6), which resulted in abysmal interrupt
> > handling when multiple disks were busy.
> >
> > Since this commit my disk performance is back to normal and long lags
> > are a thing of the past.
> 
> Hi,
> 
> This is kind of worrying. You only have a few disks, a single core
> SHOULD be able to handle all the interrupts for those disks whilst
> leaving plenty of cycles to spare to drive the rest of your system.
> And you have 5 other cores.
> 
> Would you be willing to help diagnose exactly why that particular
> behaviour is causing you so much trouble? It almost sounds like
> something in the IO path is blocking for far too long, not allowing
> the rest of the system to move forward. That's very worrying for an
> interrupt handler. :)
> 

Yes, I agree completely.  My first thought was that disk I/O
scheduling had somehow been pessimized.  But then I thought -
wait a minute, I have disk caches enabled and command queuing is
enabled for all of them, so that shouldn't really have any
noticeable impact.  So I was at a loss to explain why disk performance
had suddenly gotten so bad.

I'd be willing to spend some time on diagnosing it, but I have to come
up with a scenario which would reliably reproduce the problem.  AFAICR
it generally happened when I was running csup/svn because my CVS
repository is on one disk and /usr/{ports,src} are on a different one.

I still have the old problem kernel around, but it's probably not
instrumented for any meaningful diagnosis.

-- 
Gary Jennejohn