svn commit: r225372 - head/sys/kern

Tue Oct 4 11:06:00 UTC 2011

2011/10/3 Bruce Evans <brde at optusnet.com.au>:
> On Mon, 26 Sep 2011, Attilio Rao wrote:
>
>> 2011/9/4 Bruce Evans <brde at optusnet.com.au>:
>>>
>>> On Sun, 4 Sep 2011, Attilio Rao wrote:
>>>
>>>> Also please notice that intr enable/disable happens in the wrong way
>>>> as it is done via the MD (x86 specific likely) interface. This is
>>>> wrong for 2 reasons:
>>>
>>> No, intr_disable() is MI.  It is also used by witness.  disable_intr()
>>> is the corresponding x86 interface that you may be thinking of.  The MI
>>> interface intr_disable() was introduced to avoid the MD'ness of
>>> intr_disable().
>>
>> I was a bit surprised to verify that you are right but
>> spinlock_enter() has the big difference besides disable_intr() of also
>> explicitly disabling preemption via critical_enter() which some
>> codepath can trigger without even noticing it.
>> This means it is more safer in presence of PREEMPTION option on and
>> thus should be preferred to the normal intr_disable(), in particular
>> for convoluted codepaths.
>
> I think this is another implementation detail which shouldn't be depended
> on.  Spinlocks may or may not need either interrupts disabled or a critical
> section to work.  Now I'm a little surprised to remember that they use a
> critical section.  This is to prevent context switching.  It is useful
> behaviour, but not strictly necessary.
>
> Since disabling interrupts also prevents context switching (excep by buggy
> trap handlers including NMI), it is safe to use hard interrupt disabling
> instead of critical_enter() to prevent context switching.  This is safe
> because code that has interrupts disabled cannot wander off into other
> code that doesn't understand this and does context switching! (unless it
> is broken).  But for preventing context switching, critical_enter() is
> better now that it doesn't hard-disable interrupts internally.

This is not entirely correct, infact you may have preemption even with
interrupts disabled by calling code that schedule threads. This is why
spinlock_enter() disables interrupts _and_ preemption altogether.
Look for example at hardclock() and its internal magic (this is what I
meant, earlier, with "non convoluted codepaths").

>> Can you please explain more about the 'h/w interrupts not disabled' in
>> X86?
>> Are you speaking about NMIs? For those the only way to effectively
>> mask them would be to reprogram the LAPIC entry, but I don't really
>> think we may want that.
>
> This is in my version of x86.  mtx_lock_spin() is entirely in software
> (and deconvoluted -- no macros -- for at least the non-LOCK_DEBUG case):
>
> % void
> % mtx_lock_spin(struct mtx *mp)
> % {
> %       struct thread *td;
> % %     td = curthread;
> %       td->td_critnest++;
>
> The previous line is the entire critical_enter() manually inlined
> (except the full critical_enter() has lots of instrumentation cruft).
> -current has spinlock_enter() here.
>
> -current used to have critical_enter() here (actually in the macro
> correspoding to this) instead.  This depended on critical_enter() being
> pessimal and always doing a hard interrupt disable.  The hard interrupt
> disable is needed as an implementation detail for -current in the !SMP
> case, but for most other uses it is not needed.  The pessimization was
> rediced in -current by moving this hard interrupt disable from
> critical_enter() to the spinlock_enter() cases that need it (currently
> all?).  This optimized critical_enter() to just an increment of
> td_critnest, exactly the same as in my version except for instrumentation
> (mine has lots of messy timing stuff but -current has a single CTR4()).
> My version also optimizes away the hard interrupt disable.
>
> The resulting critical_enter() really should be an inline.  I only did
> this in the manual inlining above.  This handles most cases of interest.
> By un-inlining (un-macroizing) mtx_lock_spin(), but inlining
> critical_enter(), I get the same number of function calls but much smaller
> code since it is the tiny critical_enter() function and not the big
> mtx_lock_spin() one that is inlined.

I'm not entirely sure I follow.

In -CURRENT, right now mtx_lock_spin() just yields
_mtx_lock_spin_flags() which is not inlined.
So the improvement here is just to have inlined critical_enter()? How
this can lead to smaller code?

> %       if (!_obtain_lock(mp, td)) {
> %               if (mp->mtx_lock == (uintptr_t)td)
> %                       (mp)->mtx_recurse++;
> %               else
> %                       _mtx_lock_spin(mp, 0, NULL, 0);
> %       }
>
> Essentially the same as in -current.
>
> % }
>
> The complications are mainly in critical_exit():
> - in my version, when td_critnest is decremented to 0, a MD function is
>  called to "unpend" any pending interrupts that have accumulated while
>  in the critical region.  Since mtx_lock_spin() doesn't hard-disable
>  interrupts, they may occur when a spinlock is held.  Fast interrupts
>  proceed.  Others are blocked until critical_exit() unpends them.
>  This includes software interrupts.  The only really complicated part
>  is letting fast interrupts proceed.  Fast interrupt handlers cannot
>  use or be blocked by any normal locking, since they don't respect
>  normal spinlocks.  So for example, hardclock() cannot be a fast interrupt
>  handler.

I don't think this is a good idea.
We hardly rely on interrupts disabling during spinlock helding in
order to get them be used by fast handlers and then avoid deadlocks
with code running in interrupt/kernel context.

>>>    (The interface here is slightly broken (non-MI).  It returns
>>> register_t.
>>>    This assumes that the interrupt state can be represented in a single
>>>    register.  The type critical_t exists to avoid the same bug in an
>>>    old version of critical_enter().  Now this type is just bogus.
>>>    critical_enter() no longer returns it.  Instead, spinlock_enter() uses
>>>    a non-reentrant interface which stores what used to be the return
>>> value
>>>    of critical_enter() in a per-thread MD data structure (md_saved_pil
>>>    in the above).  Most or all arches use register_t for this.  This
>>>    leaves critical_t as pure garbage -- the only remaining references to
>>>    it are for its definition.)
>>
>> I mostly agree, I think we should have an MD specified type to replace
>> register_t for this (it could alias it, if it is suitable, but this
>> interface smells a lot like x86-centric).
>
> Really vax-centric.  spl "levels" are from vax or earlier CPUs.  x86
> doesn't really have levels (the AT PIC has masks and precedences.  The
> precedences correspond to levels are but rarely depended on or programmed
> specifically).  alpha and sparc seem to have levels much closer to
> vax.
>
> With only levels, even an 8-bit interface for the level is enough (255
> levels should be enough for anyone).  With masks, even a 64-bit interface
> for the mask might not be enough.  When masks were mapped to levels
> for FreeBSD on i386, the largish set of possible mask values was mapped
> into < 8 standard levels (tty, net, bio, etc).  Related encoding of
> MD details as cookies would probably work well enough in general.

For cookie you just mean a void * ptr?

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein