cvs commit: src/sys/kern kern_intr.c src/sys/sys interrupt.h
    John Baldwin 
    jhb at freebsd.org
       
    Wed May  2 17:28:41 UTC 2007
    
    
  
On Wednesday 02 May 2007 12:36:57 pm Nate Lawson wrote:
> Nate Lawson wrote:
> > John Baldwin wrote:
> >> On Wednesday 02 May 2007 03:07:07 am Darren Reed wrote:
> >>> On Wed, May 02, 2007 at 06:15:13AM +0000, Nate Lawson wrote:
> >>>> njl         2007-05-02 06:15:13 UTC
> >>>>
> >>>>   FreeBSD src repository
> >>>>
> >>>>   Modified files:        (Branch: RELENG_6)
> >>>>     sys/kern             kern_intr.c 
> >>>>     sys/sys              interrupt.h 
> >>>>   Log:
> >>>>   MFC: rate-check the interrupt storm message and bump the counter 500 -> 1000
> >>> Is this number ("500" or "1000") somehow "magical" for modern hardware?
> >>>
> >>> If I had 500MHz, 1GHz, 1.5GHz, 2GHz and 2.5GHz machines, each with the
> >>> appropriate architecture, what would the correct value be for each?
> >>> Is it always 1000, or should it be calculated?
> >> It's a SWAG, and it's tunable for machines where it doesn't work.  In
> >> practice the old setting seemed a bit too trigger-happy; my printer, for
> >> example, always triggered it.
> >>
> > 
> > There's more to it than just your GHz number.  It's a counter of the
> > number of times an interrupt has triggered while the previous one was
> > being serviced.  The faster your kernel, the lower the number could be.
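A minimal C sketch of the check described above, for illustration only: it is not the kern_intr.c code, and the names (`struct ithread_state`, `storm_check()`, `storm_reset()`) are hypothetical. It assumes that a threshold of 0 disables detection and that "rate-check the interrupt storm message" means printing at most one warning per second.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-source state; not the FreeBSD ithread structure. */
struct ithread_state {
    unsigned int pending_count; /* re-triggers seen since the thread was last idle */
    time_t last_warning;        /* start at (time_t)-1 so the first storm always warns */
    unsigned int warnings;      /* warnings actually printed (for illustration) */
};

/*
 * Call when the interrupt was raised again before the previous one
 * finished being serviced.  Returns true when the caller should throttle
 * (for example, sleep briefly) so other work, such as ata(4), can run.
 * Assumption: a threshold of 0 disables storm detection.
 */
static bool storm_check(struct ithread_state *st, unsigned int threshold,
                        time_t now)
{
    if (threshold == 0)
        return false;
    if (++st->pending_count < threshold)
        return false;

    /* Rate-check the message: print at most one warning per second. */
    if (now != st->last_warning) {
        st->last_warning = now;
        st->warnings++;
        fprintf(stderr, "interrupt storm detected; throttling source\n");
    }
    st->pending_count = 0;  /* start counting the next burst from zero */
    return true;
}

/* Call when the interrupt thread finds no more work and goes idle. */
static void storm_reset(struct ithread_state *st)
{
    st->pending_count = 0;
}
```

In this model the interrupt thread calls storm_check() each time it finds the source re-raised while it was still busy, and storm_reset() when it goes idle. Raising the threshold from 500 to 1000 simply means a device has to misbehave for twice as long in one busy stretch before it gets throttled.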
> > 
> > I have a slow early SMP Celeron system with a dc(4) adapter with 4 ports
> > sharing an IRQ with my ata.  At 3 am, the nightly script kicks off
> > enough I/O to trigger a bug in my dc(4) card that causes it to mask
> > the interrupt too long.  Then the IRQ storm suppression logic kicks
> > in, causing ata to time out the request.  The drive is on a mirror, so I'd
> > lose half the mirror and have to rebuild it in the morning.  With this value
> > bumped, I don't have that problem any more, but the real issue is why
> > dc(4) is so quirky under heavy shared IRQ load.
> > 
> 
> This is on 6.x, btw.  Is there any reason our retry count is so low?
> 
> sys/dev/ata/ata-disk.c:    request->retries = 2;
At work we raise the timeout from 5 to 30 seconds, but we leave retries at 2.
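A back-of-the-envelope C sketch of what these two settings mean together. It assumes, as a simplification, that each retry re-issues the request and waits the full per-attempt timeout; `struct disk_request`, `issue_with_retries()` and `worst_case_secs()` are hypothetical names, not ata(4) code. Under that assumption, retries = 2 allows up to three attempts, so the worst-case wait before an error is reported grows from 3 * 5 = 15 seconds to 3 * 30 = 90 seconds when the timeout is raised.

```c
#include <stdbool.h>

/* Hypothetical model of a disk request; not the ata(4) structure. */
struct disk_request {
    int retries;        /* retries remaining after the first attempt */
    int timeout_secs;   /* per-attempt timeout in seconds */
};

/* Simulated device: returns true if the attempt completed in time. */
typedef bool (*attempt_fn)(int attempt, int timeout_secs, void *ctx);

/*
 * Issue a request, retrying on timeout while retries remain.
 * Returns the number of attempts made, or -1 if every attempt timed out.
 * *waited is set to the seconds spent waiting on attempts that timed out.
 */
static int issue_with_retries(struct disk_request *req, attempt_fn attempt,
                              void *ctx, int *waited)
{
    int n = 0;

    *waited = 0;
    for (;;) {
        n++;
        if (attempt(n, req->timeout_secs, ctx))
            return n;
        *waited += req->timeout_secs;   /* this attempt timed out */
        if (req->retries-- <= 0)
            return -1;                  /* out of retries: report an error */
    }
}

/* Worst case before an error is reported: every attempt times out. */
static int worst_case_secs(int retries, int timeout_secs)
{
    return (retries + 1) * timeout_secs;
}
```

A longer timeout gives slow hardware more room on each attempt without changing how many times the request is re-issued, which is the trade-off between the two knobs.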
> Note that I still got a timeout, but the request succeeded without error.  I
> think this is a combination of dc(4) and HighPoint HPT366 driver
> interaction.  dc(4) is probably holding Giant (or something) too long, and
> ata is too sensitive to the slow hardware.
Neither dc(4) nor ata(4) holds Giant, FWIW.
-- 
John Baldwin
More information about the cvs-src mailing list