cvs commit: src/sys/amd64/amd64 intr_machdep.c io_apic.c local_apic.c mp_machdep.c src/sys/amd64/include apicvar.h intr_machdep.h src/sys/amd64/isa atpic.c src/sys/i386/i386 intr_machdep.c io_apic.c local_apic.c mp_machdep.c ...

Tue Feb 28 14:57:34 PST 2006

On Tuesday 28 February 2006 17:49, Scott Long wrote:
> John Baldwin wrote:
> > On Tuesday 28 February 2006 17:24, John Baldwin wrote:
> > 
> >>jhb         2006-02-28 22:24:55 UTC
> >>
> >>  FreeBSD src repository
> >>
> >>  Modified files:
> >>    sys/amd64/amd64      intr_machdep.c io_apic.c local_apic.c 
> >>                         mp_machdep.c 
> >>    sys/amd64/include    apicvar.h intr_machdep.h 
> >>    sys/amd64/isa        atpic.c 
> >>    sys/i386/i386        intr_machdep.c io_apic.c local_apic.c 
> >>                         mp_machdep.c 
> >>    sys/i386/include     apicvar.h intr_machdep.h 
> >>    sys/i386/isa         atpic.c 
> >>  Log:
> >>  Rework how we wire up interrupt sources to CPUs:
> >>  - Throw out all of the logical APIC ID stuff.  The Intel docs are somewhat
> >>    ambiguous, but it seems that the "flat" cluster model we are currently
> >>    using is only supported on Pentium and P6 family CPUs.  The other
> >>    "hierarchy" cluster model that is supported on all Intel CPUs with
> >>    local APICs is severely underdocumented.  For example, it's not clear
> >>    if the OS needs to glean the topology of the APIC hierarchy from
> >>    somewhere (neither ACPI nor MP Table include it) and set up the logical
> >>    clusters based on the physical hierarchy or not.  Not only that, but on
> >>    certain Intel chipsets, even though there were 4 CPUs in a logical
> >>    cluster, all the interrupts were only sent to one CPU anyway.
> >>  - We now bind interrupts to individual CPUs using physical addressing via
> >>    the local APIC IDs.  This code has also moved out of the ioapic PIC
> >>    driver and into the common interrupt source code so that it can be
> >>    shared with MSI interrupt sources since MSI is addressed to APICs the
> >>    same way that I/O APIC pins are.
> > 
> >     - Use fixed delivery mode rather than low priority, as apparently low
> >       priority mode only works with logical APIC IDs (though this is not
> >       clearly documented that I've seen).  Also, I've observed behavior where
> >       low priority mode will deliver interrupts to a different CPU than the
> >       one you've specifically routed the IRQ to using physical addressing
> >       on certain Intel chipsets.  FYI, we use low priority delivery method
> >       with a wildcard physical address on all released versions of FreeBSD,
> >       back to revision 1.1 of sys/i386/i386/mpapic.c.
> > 
> >>  - Interrupt source classes grow a new method pic_assign_cpu() to bind an
> >>    interrupt source to a specific local APIC ID.
> >>  - The SMP code now tells the interrupt code which CPUs are available to
> >>    handle interrupts in a simpler and more intuitive manner.  For one thing,
> >>    it means we could now choose to not route interrupts to HT cores if we
> >>    wanted to (this code is currently in place in fact, but under an #if 0
> >>    for now).
> >>  - For now we simply do static round-robin of IRQs to CPUs when the first
> >>    interrupt handler is added, just as before, with the change that IRQs
> >>    are now bound to individual CPUs rather than groups of up to 4 CPUs.
> >>  - Because the IRQ to CPU mapping has now been moved up a layer, it would
> >>    be easier to manage this mapping from higher levels.  For example, we
> >>    could allow drivers to specify a CPU affinity map for their interrupts,
> >>    or we could allow a userland tool to bind IRQs to specific CPUs.
> > 
> > 
> > FYI, I think I would prefer the latter, as a sys admin knows that he's
> > routing packets between two different network interfaces and thus wants
> > those devices on separate CPUs (for example), whereas the driver writer
> > isn't really in a position to know that sort of thing.
> > 
> 
> Where this is really useful is for people developing FreeBSD-based
> appliances that have very specific and fixed needs.  Also, it's not so
> much important which CPU gets the interrupt as it is which CPU runs the
> ithread for that interrupt.  I guess that you can get a little better
> latency by preempting directly from the low-level interrupt handler into
> the ithread, but I don't know if that is noticeable noise above the cost
> of the context switch and inevitable lock operations and contention
> involved.
> 
> Anyways, thanks for this work, it looks very promising.

I figured that if we do provide a userland tool, the interface it used
would bind both the IRQ (to bind the low-level handler) and the ithread
(using sched_bind()) to the given CPU.

-- 
John Baldwin <jhb at FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org