HEADSUP: New i386 interrupt and SMP code..

John Baldwin jhb at FreeBSD.org
Thu Oct 30 14:34:53 PST 2003

Coming very soon to a CVS tree near you are some very large changes to
the i386 interrupt and SMP code.  New features include:

- Runtime selection of using the I/O APICs or the AT PICs to route
  interrupts.
- I/O APICs can be used in a UP kernel or on a UP system that
  supplies either an MP Table or ACPI APIC Table.
- An SMP kernel can run on a UP machine.  This means that SMP
  can now be enabled in GENERIC and the SMP kernel config can die.
- The ACPI MADT table can be used to enumerate CPUs instead of
  the MP Table if ACPI is enabled.  This will add true HT support
  in that we will finally support the BIOS setting for HT.
- I/O APIC interrupts are no longer forced into 8 IRQs.  Thus,
  when using APICs, each PCI interrupt really gets its own IRQ
  and isn't shared with anyone else.
- Multiple fast interrupt handlers can be attached to a given
  interrupt source provided that all of the handlers are fast.
  (Note: at this point, fast is a poor name; INTR_DIRECT might
  be a better one.)
- Logical APIC IDs are used to route APIC interrupts from the
  I/O APICs to CPUs.  In theory the APIC interrupt code can
  now support 60 CPUs.  The hardware is still limited to 16
- We now correctly route PCI interrupts when using APICs
  using the PCI interrupt routing infrastructure instead of
  a gross hack in pci_cfgregread().  This means that we can
  route interrupts across bridges, support mp tables that
  only list interrupts for chassis devices, etc.  We also
  correctly route PCI interrupts when using APICs and ACPI.
- The new interrupt source abstraction should make it substantially
  easier to add support for MSI interrupts.
- We properly support mixed mode by EOI'ing the AT PIC and
  not EOI'ing the local APIC for mixed mode interrupts (just
  irq 0: clk right now).
- This code can largely be pulled over to amd64 to support
  APICs and SMP on that arch.
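
The rule above about sharing fast handlers can be pictured with a
small userland sketch.  This is only a model under my own assumptions:
the names model_source, model_handler and model_add_handler are
hypothetical and are not the kernel's structures or API, and the sketch
assumes that fast and non-fast handlers simply cannot be mixed on one
source.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HANDLERS 8	/* arbitrary limit for this sketch */

/* Hypothetical stand-ins; these are not the kernel's real structures. */
struct model_handler {
	void	(*func)(void *);
	void	*arg;
	bool	fast;		/* registered as a fast handler? */
};

struct model_source {
	struct model_handler handlers[MODEL_MAX_HANDLERS];
	size_t	count;
};

/*
 * Attach a handler to a source.  Returns 0 on success and -1 if the
 * source is full or if the new handler would mix fast and non-fast
 * handlers on the same source (an assumption of this sketch).
 */
int
model_add_handler(struct model_source *src, void (*func)(void *),
    void *arg, bool fast)
{
	if (src->count == MODEL_MAX_HANDLERS)
		return (-1);
	/* Every attached handler has the same type, so check the first. */
	if (src->count > 0 && src->handlers[0].fast != fast)
		return (-1);
	src->handlers[src->count].func = func;
	src->handlers[src->count].arg = arg;
	src->handlers[src->count].fast = fast;
	src->count++;
	return (0);
}
```

In this model a second fast handler attaches fine to a source that
already has one, while a non-fast handler is refused on that source.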

Some implementation details include:

- The APIC interrupt code uses only one entry point per 32
  vectors and uses the APIC ISR registers to determine which
  interrupt triggered in that range.  This means that the APIC
  code only has to provide 5 entry points instead of 159.
- Because we now support up to 159 different IRQs, the critical
  section optimization code no longer scales well, especially
  since the new APIC code does not use a separate entry point
  for each IRQ.  Thus, for the time being at least, critical
  sections have gone back to simply disabling interrupts.
  I do have a WIP for optimizing critical sections using
  a more scalable algorithm should the need arise.
- Each IRQ is actually a cookie tied to an interrupt source.
  Each interrupt source is tied to a PIC driver.  The PIC driver
  supports several operations on each interrupt source including
  disabling the source, enabling it for the first time, etc.
  Each PIC driver is free to store private per-source data with
  each source and private per-pic data with each PIC. 
- APICs (both I/O and local (CPUs)) are enumerated by APIC
  enumerator drivers of which 2 are provided: one to use
  the ACPI MADT table and one to use the MP Table.
- The SMP code no longer knows anything specific to the MP
  table.  Instead, the APIC enumerators inform the SMP code
  of CPUs via a simple cpu_add() interface and the SMP code
  takes it from there.  The SMP code is now much easier to
  read.  Also, all of the APIC code has been split out into
  separate IO and local APIC files aiding in the cleanup.
- Almost all of the interrupt dispatch code now happens in C
  rather than assembly.  Notably, fast interrupt handlers no
  longer have a separate entry point.
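
To illustrate the "one entry point per 32 vectors" idea from the first
item above, here is a rough C sketch of how a shared entry point could
work out which vector fired.  It assumes the entry point is handed the
32-bit in-service (ISR) word covering its block of vectors and picks
the highest set bit.  The function name and parameters are
hypothetical, and the real entry path may be organized differently.

```c
#include <stdint.h>

#define VECTORS_PER_ENTRY 32

/*
 * Return the vector number for the highest set bit in one 32-vector
 * block, or -1 if no bit is set.  'isr_word' stands in for the value
 * read from the ISR register that covers the block and 'block' is the
 * index of that block (block 0 covers vectors 0-31, and so on).
 */
int
isr_block_to_vector(uint32_t isr_word, unsigned int block)
{
	int bit;

	if (isr_word == 0)
		return (-1);
	/* Scan from the top so the highest in-service vector wins. */
	for (bit = VECTORS_PER_ENTRY - 1; bit >= 0; bit--)
		if (isr_word & (UINT32_C(1) << bit))
			break;
	return ((int)(block * VECTORS_PER_ENTRY) + bit);
}
```

For example, with bit 3 set in the word for block 1 the function
returns vector 35, and from there the C dispatch code can look up the
interrupt source for that vector.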


- ACPI will no longer work as a module for now.  The reason
  for this is that ACPI's APIC enumerator needs to be able
  to hook into a SI_SUB_TUNABLES - 1 SYSINIT() due to existing
  code that wants to know the available CPUs in the system
  very early (specifically, UMA).  However, code in kernel
  modules cannot be executed until SI_SUB_KLD, which is much
  too late.  It may be possible to address this later with
  some creative hacking.
- I haven't ported the changes over to PC98 yet.
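
The ordering problem behind the ACPI module item can be modelled in a
few lines of userland C.  This is only a sketch: the MODEL_SUB_*
values are placeholders whose relative order is the only thing that
matters (they are not the real constants), and MODEL_SUB_EARLY_USER is
a hypothetical stand-in for an early consumer such as UMA.

```c
#include <stdlib.h>

/* Placeholder ordering values; only their relative order matters. */
enum {
	MODEL_SUB_TUNABLES   = 100,
	MODEL_SUB_EARLY_USER = 200,	/* e.g. code that needs the CPU list */
	MODEL_SUB_KLD        = 900	/* earliest point module code can run */
};

struct init_entry {
	int		subsystem;	/* lower values run earlier */
	const char	*name;
};

static int
init_cmp(const void *a, const void *b)
{
	const struct init_entry *x = a;
	const struct init_entry *y = b;

	return ((x->subsystem > y->subsystem) - (x->subsystem < y->subsystem));
}

/* Put the entries in the order a boot-time init loop would run them. */
void
order_inits(struct init_entry *entries, size_t n)
{
	qsort(entries, n, sizeof(entries[0]), init_cmp);
}
```

An enumerator registered at MODEL_SUB_TUNABLES - 1 sorts ahead of the
early consumer, while one that cannot run before MODEL_SUB_KLD sorts
after it, which is the case a module would be stuck with.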


The code lives in p4 under //depot/user/jhb/acpipci/...
Note that several files have moved around so you might want to
check the 'notes' file and 'setup.sh' file.  If you want to
try it out you can check out the tree using p4 and build a
kernel.  Just be sure to:

 1) Run setup.sh first to create needed symlinks for moved
    files.
 2) Use 'device apic' instead of 'options APIC_IO'.

I'm sure there are more details that I've forgotten, but that's
a start at least.


John Baldwin <jhb at FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
