svn commit: r225372 - head/sys/kern

Bruce Evans brde at optusnet.com.au
Mon Oct 3 19:11:23 UTC 2011


On Mon, 26 Sep 2011, Attilio Rao wrote:

> 2011/9/4 Bruce Evans <brde at optusnet.com.au>:
>> On Sun, 4 Sep 2011, Attilio Rao wrote:
>>
>>> Also please notice that intr enable/disable happens in the wrong way
>>> as it is done via the MD (x86 specific likely) interface. This is
>>> wrong for 2 reasons:
>>
>> No, intr_disable() is MI.  It is also used by witness.  disable_intr()
>> is the corresponding x86 interface that you may be thinking of.  The MI
>> interface intr_disable() was introduced to avoid the MD'ness of
>> disable_intr().
>
> I was a bit surprised to verify that you are right but
> spinlock_enter() has the big difference besides disable_intr() of also
> explicitly disabling preemption via critical_enter() which some
> codepath can trigger without even noticing it.
> This means it is safer in the presence of the PREEMPTION option and
> thus should be preferred to the normal intr_disable(), in particular
> for convoluted codepaths.

I think this is another implementation detail which shouldn't be depended
on.  Spinlocks may or may not need either interrupts disabled or a critical
section to work.  Now I'm a little surprised to remember that they use a
critical section.  This is to prevent context switching.  It is useful
behaviour, but not strictly necessary.

Since disabling interrupts also prevents context switching (except by buggy
trap handlers, including NMI), it is safe to use hard interrupt disabling
instead of critical_enter() to prevent context switching.  This is safe
because code that has interrupts disabled cannot wander off into other
code that doesn't understand this and does context switching! (unless it
is broken).  But for preventing context switching, critical_enter() is
better now that it doesn't hard-disable interrupts internally.

>>> 1) There may be some codepaths leading to explicit preemption
>>> 2) It should really use an MI interface
>>>
>>> The right way to do this should be via spinlock_enter().
>>
>> spinlock_enter() is MI, but has wrong semantics.  In my version of i386,
>> spinlocks don't disable any h/w interrupt, as is needed for fast interrupt
>> handlers to actually work.  I believe sparc64 is similar, except its
>> spinlock_enter() disables most h/w interrupts and this includes fast
>> interrupt handlers.  I don't understand sparc64, but it looks like its
>> spinlock_enter() disables all interrupts visible in C code, but not
>> all interrupts:
>
> Can you please explain more about the 'h/w interrupts not disabled' in X86?
> Are you speaking about NMIs? For those the only way to effectively
> mask them would be to reprogram the LAPIC entry, but I don't really
> think we may want that.

This is in my version of x86.  mtx_lock_spin() is entirely in software
(and deconvoluted -- no macros -- for at least the non-LOCK_DEBUG case):

% void
% mtx_lock_spin(struct mtx *mp)
% {
% 	struct thread *td;
% 
% 	td = curthread;
% 	td->td_critnest++;

The previous line is the entire critical_enter() manually inlined
(except the full critical_enter() has lots of instrumentation cruft).
-current has spinlock_enter() here.

-current used to have critical_enter() here (actually in the macro
corresponding to this) instead.  This depended on critical_enter() being
pessimal and always doing a hard interrupt disable.  The hard interrupt
disable is needed as an implementation detail for -current in the !SMP
case, but for most other uses it is not needed.  The pessimization was
reduced in -current by moving this hard interrupt disable from
critical_enter() to the spinlock_enter() cases that need it (currently
all?).  This optimized critical_enter() to just an increment of
td_critnest, exactly the same as in my version except for instrumentation
(mine has lots of messy timing stuff but -current has a single CTR4()).
My version also optimizes away the hard interrupt disable.

The resulting critical_enter() really should be an inline.  I only did
this in the manual inlining above.  This handles most cases of interest.
By un-inlining (un-macroizing) mtx_lock_spin(), but inlining
critical_enter(), I get the same number of function calls but much smaller
code since it is the tiny critical_enter() function and not the big
mtx_lock_spin() one that is inlined.

% 	if (!_obtain_lock(mp, td)) {
% 		if (mp->mtx_lock == (uintptr_t)td)
% 			(mp)->mtx_recurse++;
% 		else
% 			_mtx_lock_spin(mp, 0, NULL, 0);
% 	}

Essentially the same as in -current.

% }

The complications are mainly in critical_exit():
- in my version, when td_critnest is decremented to 0, an MD function is
   called to "unpend" any pending interrupts that have accumulated while
   in the critical region.  Since mtx_lock_spin() doesn't hard-disable
   interrupts, they may occur when a spinlock is held.  Fast interrupts
   proceed.  Others are blocked until critical_exit() unpends them.
   This includes software interrupts.  The only really complicated part
   is letting fast interrupts proceed.  Fast interrupt handlers cannot
   use or be blocked by any normal locking, since they don't respect
   normal spinlocks.  So for example, hardclock() cannot be a fast interrupt
   handler.
- in -current, when td_critnest is decremented to 0, it calls mi_switch()
   to implement delayed preemption if a setting of td_owepreempt has
   accumulated while in the critical region.  It is much simpler because
   it only has this one type of delayed "interrupt" to handle.

>> from cpufunc.h:
>> % static __inline register_t
>> % intr_disable(void)
>> % {
>> %       register_t s;
>> %
>> %       s = rdpr(pstate);
>> %       wrpr(pstate, s & ~PSTATE_IE, 0);
>> %       return (s);
>> % }
>>
>> This seems to mask all interrupts, as required.
>>
>>    (The interface here is slightly broken (non-MI).  It returns register_t.
>>    This assumes that the interrupt state can be represented in a single
>>    register.  The type critical_t exists to avoid the same bug in an
>>    old version of critical_enter().  Now this type is just bogus.
>>    critical_enter() no longer returns it.  Instead, spinlock_enter() uses
>>    a non-reentrant interface which stores what used to be the return value
>>    of critical_enter() in a per-thread MD data structure (md_saved_pil
>>    in the above).  Most or all arches use register_t for this.  This
>>    leaves critical_t as pure garbage -- the only remaining references to
>>    it are for its definition.)
>
> I mostly agree, I think we should have an MD specified type to replace
> register_t for this (it could alias it, if it is suitable, but this
> interface smells a lot like x86-centric).

Really vax-centric.  spl "levels" are from the vax or earlier CPUs.  x86
doesn't really have levels (the AT PIC has masks and precedences; the
precedences correspond to levels but are rarely depended on or programmed
specifically).  alpha and sparc seem to have levels much closer to the
vax's.

With only levels, even an 8-bit interface for the level is enough (255
levels should be enough for anyone).  With masks, even a 64-bit interface
for the mask might not be enough.  When masks were mapped to levels
for FreeBSD on i386, the largish set of possible mask values was mapped
into < 8 standard levels (tty, net, bio, etc).  Related encoding of
MD details as cookies would probably work well enough in general.

Bruce
