svn commit: r213985 - head/sys/sparc64/sparc64

Tue Oct 19 02:52:39 UTC 2010

On Tue, 19 Oct 2010, Alexander Motin wrote:

> Marius Strobl wrote:
>> On Mon, Oct 18, 2010 at 05:05:24PM -0400, John Baldwin wrote:
>>> On Monday, October 18, 2010 4:52:24 pm Marius Strobl wrote:
>>>> AFAICT this is not true; intr_event_handle() in sys/kern/kern_intr.c
>>>> is what enters a critical section and f.e. on amd64 I don't see where
>>>> anywhere in the path from ISR_VEC() to intr_execute_handlers()
>>>> calling intr_event_handle() a critical section would be entered,
>>>> which also means that in intr_execute_handlers() td_intr_nesting_level
>>>> is incremented outside of a critical section.
>>> Not all of the clock interrupts use intr_event_handle().  The local APIC
>>> timer uses its own interrupt entry point on x86 for example and uses an
>>> explicit critical section as a result.  I suspect the sparc64 tick interrupt
>>> is closer to the local APIC timer case and doesn't use intr_event_handle().
>>
>> Correct; but still you can't say that the MD interrupt code enters a
>> critical section in general, neither is incrementing td_intr_nesting_level
>> in intr_execute_handlers() protected by a critical section.
>>
>>> The fact that some clock interrupts do use intr_event_handle() (e.g. the
>>> atrtc driver on x86 now) does indicate that the low-level interrupt code
>>> probably does not belong in the time events code but in the caller.
>>
>> Well, I agree that entering a critical section in the time events
>> code would mean entering a nested critical section unnecessarily in
>> case the clock driver uses a regular "fast" interrupt handler and
>> that should be avoided. Still I don't think the event time front-end
>> actually should need to worry about wrapping the callback in a
>> critical section.

I think it belongs in the caller, and the front end shouldn't do any more
than assert it.

> Interrupt frame, required for hard-/stat-/profclock() operation is
> stored in curthread. So critical section is effectively mandatory there
> now. Correct td_intr_nesting_level value is also important for proper
> interrupt threads scheduling - one more reason to have critical section
> there. It is indeed strange that td_intr_nesting_level in
> intr_event_handle() is not covered by critical section, but probably it
> should.

td_intr_nesting_level is garbage, as is the style of the use of it in
sched_ule.c (non-boolean used in boolean context).  I introduced it
(as a global named intr_nesting_level) to count interrupt nesting with
spls.  It was an i386 implemention detail for limiting interrupt nesting
to about 4, but a few places abused it to detect things like M_WAITOK
malloc()s in interrupt context.  There was 1 non-abusive MI reference
to it, via the MD CLKF_INTR(framep) macro, and the most of the abuses
wouldn't have been abuses if there had been a MD macro for them.  See
RELENG_4 for all this.  (spls allow interrupt nesting to either the
number of bits in the mask or the number of high-level spls, but the
kernel stack is not large enough for this, so there was an even more
arbitary limit of about 4.)

Now with ithreads for interrupts, interrupts can't nest (an interrupt
can interrupt an ithread, but that is no more nesting than an interrupt
interrupting a user thread).  td_intr_nesting_level is 0 in ithreads,
so it is almost useless for its original abuse for detecting things
like M_WAITOK malloc()s in interrupt context, but it is still abused
for that :-(.  td_intr_nesting level is != 0 mainly in low-level interrupt
handling code and fast interrupt handlers, and neither of these should
be so broken as to call malloc(), so the check in malloc() is useless;
the correct check (for TDP_ITHREAD) is missing.  Similar checks in
uipc_mbuf.c were just removed, and similar but more-bogus checks in
vinum went away with vinum.

This leaves 3 possibly-useful uses of td_intr_nesting_level in -current:

- {amd64,i386}/trap.c: td_intr_nesting_level  is checked to be zero to
   makes certain traps more fatal when it is nonzero.  Other arches
   don't bother with such a test.  amd64 and i386 don't bother with
   testing the more likely error of a trap that should be fatal in an
   ithread.  This can be handled in the same way as in malloc() -- don't
   bother testing for the very unlikely case, especially if you don't
   bother testing for the unlikely case.

- kern_clock.c: this tests for td_intr_nesting_level >= 2.  That is
   almost exactly the test that used to be hidden in the MD CLKF_INTR().
   I used to maintain patches with an assertion that this never happens
   with ithreads.  It never happens unless there is a bug in the low-level
   interrupt handling code or in the management of td_intr_nesting_level.
   On entry to an interrupt, td_intr_nesting level is incremented to
   1, and another thread should be switched to without allowing any
   interrupt that will increment it again; then on switch back, interrupts
   should remain masked so that it is not incremented from 1 to 2 just
   before it is decremented to 0.  I would have removed td_intr_nesting_level
   from my version, except my version actually needs it, for this use
   in hardclock() only :-(.  In my version, the low-level interrupt
   masking is partially in software, so clock interrupts can interrupt
   the low-level interrupt handling.  I just remembered that sparce64
   also has or had some interrupt masking in software; perhaps it needs
   a similar treatment.

- sched_ule.c: releatively recently, this scheduler started using
   td_intr_nesting_level.  More recently td_intr_nesting_level was
   incremented in clock code to keep this scheduler happy, so something
   is clearly needed.  But sched_4bsd.c doesn't need it, and the
   increment being outside of the critical section in
   intr_execute_handlers() shows how uncritical this variable is.
   Perhaps sched_ule.c can use td_critnest instead.  You have to be
   carefully with not calling critical_enter() too much, so that
   the scheduler cannot tell the real level.  Levels near mi_switch()
   have to have levels of precisely 0, 1 or 2, so that the level
   drops to precisely 0 when the switch is complete, and this is related
   to deciding whether to schedule.

and 2 other nonsense abuses:
- dev/sound/isa/sb16.c: abuses td_intr_nesting_level to avoid doing a
   diagnostic interrupt if (td_intr_nesting_level != 0).  I think this
   has no effect, as for malloc() except it is easier to inspect the
   code to see that it cannot be reached in fast interrupt handler
   context, so the check is null.
- vm_page.c: exactly the same as in malloc(), with a check for unlikely
   fast interrupt handler context but none for ordinary ithread context.

Most arch-specific code that manages td_intr_nesting_level does it in a
critical section.  powerpc also uses atomic ops.

I only grepped for td_intr_nesting_level, and would have missed any
different spelling of it in asm code where most of the original i386
code for it was.

Bruce