Interrupt routine usage not shown by top in 8.0

Barney Cordoba barney_cordoba at yahoo.com
Sun Mar 22 15:06:54 PDT 2009

--- On Wed, 3/18/09, Scott Long <scottl at samsco.org> wrote:

> From: Scott Long <scottl at samsco.org>
> Subject: Re: Interrupt routine usage not shown by top in 8.0
> To: "Barney Cordoba" <barney_cordoba at yahoo.com>
> Cc: "Sam Leffler" <sam at freebsd.org>, current at freebsd.org
> Date: Wednesday, March 18, 2009, 5:25 PM
> On Wed, 18 Mar 2009, Barney Cordoba wrote:
> > --- On Wed, 3/18/09, Scott Long <scottl at samsco.org> wrote:
> >>
> >> Filters were introduced into the em driver to get around a problem
> >> in certain Intel chipsets that caused aliased interrupts.  That's a
> >> different topic of discussion that you are welcome to search the
> >> mail archives on.  The filter also solves performance and latency
> >> problems that are inherent to the ithread model when interrupts are
> >> shared between multiple devices.  This is especially bad when a
> >> high speed device like em shares an interrupt with a low speed
> >> device like usb.  In the course of testing and validating the
> >> filter work, I found that filters caused no degradation in
> >> performance or excess context switches, while cleanly solving the
> >> above two problems that were common on workstation and server class
> >> machines of only a few years ago.
> >>
> >> However, both of these problems stemmed from using legacy PCI
> >> interrupts.  At the time, MSI was still very new and very
> >> unreliable.  As the state of the art progressed and MSI became more
> >> reliable, its use has become more common and is the default in
> >> several drivers.  The igb and ixgbe drivers and hardware both
> >> prefer MSI over legacy interrupts, while the em driver and hardware
> >> still have a lot of legacy hardware to deal with.  So when MSI is
> >> the common/expected/default case, there is less of a need for the
> >> filter/taskqueue method.
> >>
> >> Filters rely on the driver being able to reliably control the
> >> interrupt enable state of the hardware.  This is possible with em
> >> hardware, but not as reliable with bge hardware, so the stock
> >> driver code does not have it implemented.  I am running a
> >> filter-enabled bge driver in large-scale production, but I also
> >> have precise control over the hardware being used.  I also have
> >> filter patches for the bce driver, but bce also tends to prefer
> >> MSI, so there isn't a compelling reason to continue to develop the
> >> patches.
> >>
> >> Scott
> >
> > Assuming the same technique is used within an ithread as with a fast
> > interrupt, that is:
> >
> > filtered_foo(){
> >   taskqueue_enqueue();
> >   return FILTER_HANDLED;
> > }
> 
> This will give you two context switches, one for the actual
> interrupt, and 
> one for the taskqueue.  It'll also encounter a spinlock
> in the taskqueue 
> code, and a spinlock or two in the scheduler.
> 
> >
> > ithread_foo(){
> >   taskqueue_enqueue();
> >   return;
> > }
> >
> > Is there any additional overhead/locking in the ithread method? I'm
> > looking to get better control over cpu distribution.
> >
> 
> This will give you 3 context switches.  First one will be
> for the actual 
> interrupts.  Second one will be for the ithread (recall
> that ithreads are 
> full process contexts and are scheduled as such).  Third
> one will be for 
> the taskqueue.  Along with the spinlocks for the scheduler
> and taskqueue 
> code mentioned above, there will also be spinlocks to
> protect the APIC 
> registers, as well as extra bus cycles to service the APIC.
> 
> So, that's 2 trips through the scheduler, plus the
> associated spinlocks, 
> plus the overhead of going through the APIC code, whereas
> the first method
> only goes through the scheduler once.  Both will have a
> context switch to
> service the low-level interrupt.  The second method will
> definitely have 
> more context switches, and will almost certainly have
> higher overall 
> service latency and CPU usage.
> 
> Scott
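
For concreteness, the filter/taskqueue pattern under discussion, fleshed
out a bit; the foo_* names, the softc fields and the mask/unmask helpers
are placeholders rather than code from any real driver:

static int
foo_intr_filter(void *arg)
{
        struct foo_softc *sc = arg;

        /* Mask further interrupts at the hardware and defer the real
         * work to the taskqueue; no ithread is involved. */
        foo_disable_intr(sc);
        taskqueue_enqueue(sc->sc_tq, &sc->sc_rx_task);
        return (FILTER_HANDLED);
}

static void
foo_rx_task(void *arg, int pending)
{
        struct foo_softc *sc = arg;

        foo_doreceive(sc);
        /* Re-enable interrupts once the queue has been drained. */
        foo_enable_intr(sc);
}

with sc->sc_rx_task initialized via TASK_INIT(&sc->sc_rx_task, 0,
foo_rx_task, sc), and the filter registered with something like

        bus_setup_intr(dev, sc->sc_irq, INTR_TYPE_NET | INTR_MPSAFE,
            foo_intr_filter, NULL, sc, &sc->sc_ih);

so that only the filter runs at interrupt time and the heavy lifting
happens in the taskqueue thread.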

Scott, I'm sure you're going to yell at me, but here I go anyway.

I set up a little task that basically does:

foo_task()
{
        while (1) {
                foo_doreceive();
                pause("foo", 1);
        }
}

which wakes hz times per second in 7 and hz/2 times per second in 8.
The same accounting issue exists for this case: with it bridging 400K
pps, top shows 0% usage for the task most of the time. I've added some
firewall rules, which should substantially increase the load, but still
no usage. If I really hammer it, at around 600K pps, it starts
registering 30% usage, with no ramp-up in between. I suppose it could
just be falling out of the cache or something, but that doesn't seem
realistic.
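
Whether the loop above runs from a dedicated kthread or a taskqueue
thread shouldn't change the accounting; as a dedicated thread it would
be created with something like this (kproc_create() on 8,
kthread_create() on 7; the foo_* names and softc are placeholders):

        error = kproc_create(foo_task, sc, &sc->sc_proc, 0, 0, "footask");
        if (error)
                device_printf(sc->sc_dev, "could not create foo task\n");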

Is there some hack I can implement to make sure a task is 
accounted for, or some other way to monitor its usage?
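
The crude fallback I can think of is to time the work myself and export
the total via a sysctl, along these lines (names are placeholders), but
I'd rather see it show up in the normal accounting:

static struct timeval foo_worktime;    /* accumulated time in foo_doreceive() */

static void
foo_doreceive_timed(struct foo_softc *sc)
{
        struct timeval t0, t1;

        microuptime(&t0);
        foo_doreceive(sc);
        microuptime(&t1);
        timevalsub(&t1, &t0);            /* t1 = elapsed time */
        timevaladd(&foo_worktime, &t1);  /* add to the running total */
}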

Barney
