Starting APs earlier during boot

Fri Mar 18 19:02:45 UTC 2016

On Tuesday, February 16, 2016 12:50:22 PM John Baldwin wrote:
> Currently the kernel bootstraps the non-boot processors fairly early in the
> SI_SUB_CPU SYSINIT.  The APs then spin waiting to be "released".  We currently
> release the APs as one of the last steps at SI_SUB_SMP.  On the one hand this
> removes much of the need for synchronization while SYSINITs are running since
> SYSINITs basically assume they are single-threaded.  However, it also enforces
> some odd quirks.  Several places that deal with per-CPU resources have to
> split initialization up so that the BSP init happens in one SYSINIT and the
> initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.
> 
> Another issue that is becoming more prominent on x86 (and probably will also
> affect other platforms if it isn't already) is that to support working
> interrupts for interrupt config hooks we bind all interrupts to the BSP during
> boot and only distribute them among other CPUs near the end at SI_SUB_SMP. 
> This is especially problematic with drivers for modern hardware allocating
> num(CPUs) interrupts (hoping to use one per CPU).  On x86 we have about 190
> IDT vectors available for device interrupts, so in theory we should be able to
> tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
> interrupts for every CPU and we should still be fine).  However, if you have,
> say, 32 cores in a system, then you can only handle about 5 drivers doing
> this before you run out of vectors on CPU 0.
> 
> Longer term we would also like to eventually have most drivers attach in the 
> same environment during boot as during post-boot.  Right now post-boot is 
> quite different as all CPUs are running, interrupts work, etc.  One of the 
> goals of multipass support for new-bus is to help us get there by probing 
> enough hardware to get timers working and starting the scheduler before 
> probing the rest of the devices.  That goal isn't quite realized yet.
> 
> However, we can run a slightly simpler version of our scheduler before
> timers are working.  In fact, sleep/wakeup work just fine fairly early (we
> allocate the necessary structures at SI_SUB_KMEM which is before the APs
> are even started).  Once idle threads are created and ready we could in
> theory let the APs start up and run other threads.  You just don't have working 
> timeouts.  OTOH, you can sort of simulate timeouts if you modify the scheduler 
> to yield the CPU instead of blocking the thread for a sleep with a timeout.  
> The effect would be for threads that do sleeps with a timeout to fall back to 
> polling before timers are working.  In practice, all of the early kernel 
> threads use sleeps without timeouts when idle so this doesn't really matter.
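
To put rough numbers on the vector exhaustion problem quoted above, here is
a small userland sketch of the arithmetic.  It is not taken from the patch.
The 190 figure is the approximate count from the quoted text, and the function
names are hypothetical, made up for illustration only.

```c
#include <assert.h>

/*
 * Approximate number of IDT vectors per CPU available for device
 * interrupts on x86, as quoted in the message (an estimate, not an
 * exact kernel constant).
 */
#define DEVICE_VECTORS_PER_CPU	190

/*
 * Hypothetical helper: how many drivers fit if each one allocates
 * irqs_per_cpu interrupts for every CPU and all of those interrupts
 * are bound to the BSP, as happens during boot today.
 */
int
drivers_fit_on_bsp(int ncpus, int irqs_per_cpu)
{
	assert(ncpus > 0 && irqs_per_cpu > 0);
	return (DEVICE_VECTORS_PER_CPU / (ncpus * irqs_per_cpu));
}

/*
 * Hypothetical helper: how many such drivers fit once interrupts are
 * spread out so that each CPU only carries its own share.
 */
int
drivers_fit_distributed(int irqs_per_cpu)
{
	assert(irqs_per_cpu > 0);
	return (DEVICE_VECTORS_PER_CPU / irqs_per_cpu);
}
```

With 32 CPUs and one interrupt per CPU per driver, drivers_fit_on_bsp(32, 1)
evaluates to 5, which matches the estimate above.  Once interrupts are
distributed, drivers_fit_distributed(3) evaluates to 63, consistent with the
"60 drivers with 3 interrupts per CPU" example.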

After some more testing, I've simplified the early scheduler a bit.  It no
longer tries to simulate timeouts by just keeping the thread runnable.  Instead,
a sleep with a timeout just panics.  However, it does still permit sleeps
without a timeout.  Some code that uses a timeout really wants a timeout (note
that pause() has a hack to fall back to DELAY() internally if cold is true for
this reason).  So my feeling is that any kthreads that need timeouts to
work need to defer their startup until SI_SUB_KICK_SCHEDULER.
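
As a rough illustration of the policy described above, here is a userland
model of the decision the early scheduler makes for a sleep request.  This is
only a sketch and not the code in the branch: the enum, the function name, and
the timers_working flag are all hypothetical, and it assumes the tsleep()
convention that a timo of 0 means "no timeout".

```c
#include <stdbool.h>

/* Hypothetical outcome of a sleep request; not a kernel type. */
enum early_sleep_action {
	SLEEP_BLOCK,	/* no timeout: block until wakeup() */
	SLEEP_TIMED,	/* timers work: ordinary timed sleep */
	SLEEP_PANIC	/* timed sleep requested before timers work */
};

/*
 * Model of the simplified early-scheduler rule: sleeps without a
 * timeout are always permitted, while a sleep with a timeout is only
 * valid once timers are working.  Assumes timo == 0 means no timeout.
 */
enum early_sleep_action
early_sleep_policy(bool timers_working, int timo)
{
	if (timo == 0)
		return (SLEEP_BLOCK);
	if (timers_working)
		return (SLEEP_TIMED);
	return (SLEEP_PANIC);
}
```

In this model a kthread that only ever sleeps with timo == 0 can start early,
while one that passes a non-zero timo is the kind that should wait until
SI_SUB_KICK_SCHEDULER.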

> However, I'd like feedback on the general idea and if it is acceptable I'd
> like to coordinate testing with other platforms so this can go into the
> tree.

I don't think I've seen any objections?  This does need more testing.  I will
update the patch to add a new EARLY_AP_STARTUP kernel option so this can be
committed (but not yet enabled) allowing for easier testing (and allowing
other platforms to catch up to x86).

> The current changes are in the 'ap_startup' branch at github/bsdjhb/freebsd.
> You can view them here:
> 
> https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup

-- 
John Baldwin