HEADSUP: New i386 interrupt and SMP code..
jhb at FreeBSD.org
Thu Oct 30 14:34:53 PST 2003
Coming very soon to a CVS tree near you are some very large changes to
the i386 interrupt and SMP code. New features include:
- Runtime selection of whether the I/O APICs or the AT PICs are used
to route interrupts.
- I/O APICs can be used in a UP kernel or on a UP system that
supplies either an MP Table or ACPI APIC Table.
- An SMP kernel can run on a UP machine. This means that SMP
can now be enabled in GENERIC and the SMP kernel config can die.
- The ACPI MADT table can be used to enumerate CPUs instead of
the MP Table if ACPI is enabled. This will add true HT support
in that we will finally support the BIOS setting for HT.
- I/O APIC interrupts are no longer forced into 8 IRQs. Thus,
when using APICs, each PCI interrupt really gets its own IRQ
and isn't shared with anyone else.
- Multiple fast interrupt handlers can be attached to a given
interrupt source provided that all of the handlers are fast.
(Note: at this point, fast is a poor name, INTR_DIRECT might
be a better name.)
- Logical APIC IDs are used to route APIC interrupts from the
I/O APICs to CPUs. In theory the APIC interrupt code can
now support 60 CPUs. The hardware is still limited to 16
- We now correctly route PCI interrupts when using APICs
using the PCI interrupt routing infrastructure instead of
a gross hack in pci_cfgregread(). This means that we can
route interrupts across bridges, support mp tables that
only list interrupts for chassis devices, etc. We also
correctly route PCI interrupts when using APICs and ACPI.
- The new interrupt source abstraction should make it substantially
easier to add support for MSI interrupts.
- We properly support mixed mode by EOI'ing the AT PIC and
not EOI'ing the local APIC for mixed mode interrupts (just
irq 0: clk right now).
- This code can largely be pulled over to amd64 to support
APICs and SMP on that arch.
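
To illustrate the fast handler sharing rule from the list above, here is a
minimal C sketch. It is not the kernel code: struct handler, struct source,
source_attach() and MAX_HANDLERS are hypothetical names made up for this
example, and it assumes a source is either all fast or all non-fast.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 8	/* arbitrary limit chosen for this sketch */

/* Hypothetical handler record; not the kernel's real structure. */
struct handler {
	void (*func)(void *);
	void *arg;
	bool fast;
};

/* Hypothetical interrupt source with its attached handlers. */
struct source {
	struct handler handlers[MAX_HANDLERS];
	size_t count;
};

/*
 * Attach a handler to a source.  Returns 0 on success, or -1 if the
 * table is full or the new handler would mix fast and non-fast
 * handlers on the same source.
 */
int
source_attach(struct source *src, void (*func)(void *), void *arg, bool fast)
{
	if (src->count == MAX_HANDLERS)
		return (-1);
	/* All handlers on one source must agree on being fast or not. */
	if (src->count > 0 && src->handlers[0].fast != fast)
		return (-1);
	src->handlers[src->count].func = func;
	src->handlers[src->count].arg = arg;
	src->handlers[src->count].fast = fast;
	src->count++;
	return (0);
}
```

With this sketch, attaching a second fast handler to a source that already
has one succeeds, while attaching a non-fast handler to it is rejected.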
Some implementation details include:
- APIC interrupt entry points only use one entry point per 32
vectors and use the APIC ISR registers to determine which
interrupt triggered in that range. This means that the APIC
code only has to provide 5 entry points instead of 159.
- Because we now support up to 159 different IRQs, the critical
section optimization code no longer scales well, especially
since the new APIC code does not use a separate entry point
for each IRQ. Thus, for the time being at least, critical
sections have been reverted to disabling interrupts.
I do have a WIP for optimizing critical sections using
a more scalable algorithm should the need arise.
- Each IRQ is actually a cookie tied to an interrupt source.
Each interrupt source is tied to a PIC driver. The PIC driver
supports several operations on each interrupt source including
disabling the source, enabling it for the first time, etc.
Each PIC driver is free to store private per-source data with
each source and private per-pic data with each PIC.
- APICs (both I/O and local (CPUs)) are enumerated by APIC
enumerator drivers of which 2 are provided: one to use
the ACPI MADT table and one to use the MP Table.
- The SMP code no longer knows anything specific to the MP
table. Instead, the APIC enumerators inform the SMP code
of CPUs via a simple cpu_add() interface and the SMP code
takes it from there. The SMP code is now much easier to
read. Also, all of the APIC code has been split out into
separate IO and local APIC files aiding in the cleanup.
- Almost all of the interrupt dispatch code now happens in C
rather than assembly. Notably, fast interrupt handlers no
longer have a separate entry point.
- ACPI will no longer work as a module for now. The reason
for this is that ACPI's APIC enumerator needs to be able
to hook into a SI_SUB_TUNABLES - 1 SYSINIT() due to existing
code that wants to know the available CPUs in the system
very early (specifically, UMA). However, code in kernel
modules cannot be executed until SI_SUB_KLD, which is much
too late. It may be possible to address this later with
some creative hacking.
- I haven't ported the changes over to PC98 yet.
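
The shared entry point scheme described above (one entry point per 32
vectors, using the APIC ISR registers to find the interrupt) can be
sketched roughly as follows. This is an illustration only, not the code in
the tree: isr_to_vector() and VECTORS_PER_GROUP are hypothetical names, the
ISR word is passed in as a plain value rather than read from the local
APIC, and picking the highest set bit is an assumption based on the APIC
giving higher vectors higher priority.

```c
#include <stdint.h>

#define VECTORS_PER_GROUP 32	/* one 32-bit ISR word covers 32 vectors */

/*
 * Given the group an entry point serves and the contents of the
 * matching 32-bit ISR word, return the in-service vector number.
 * Returns -1 if no bit is set in that word.
 */
int
isr_to_vector(unsigned int group, uint32_t isr)
{
	int bit;

	if (isr == 0)
		return (-1);
	/* Scan down from bit 31 to find the highest in-service vector. */
	for (bit = 31; (isr & (UINT32_C(1) << bit)) == 0; bit--)
		continue;
	return ((int)(group * VECTORS_PER_GROUP) + bit);
}
```

For example, the entry point for group 1 seeing only bit 0 set would
resolve to vector 32.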
The code lives in p4 under //depot/user/jhb/acpipci/...
Note that several files have moved around so you might want to
check the 'notes' file and 'setup.sh' file. If you want to
try it out you can check out the tree using p4 and build a
kernel. Just be sure to:
1) Run setup.sh first to create needed symlinks for moved files.
2) Use 'device apic' instead of 'options APIC_IO'.
I'm sure there are more details that I've forgotten, but that's
a start at least.
John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/