cvs commit: src/sys/kern kern_mutex.c

Bruce Evans brde at optusnet.com.au
Sun Jun 17 07:40:26 UTC 2007


On Fri, 15 Jun 2007, John Baldwin wrote:

> On Friday 15 June 2007 04:46:30 pm Bruce Evans wrote:
>> On Mon, 11 Jun 2007, John Baldwin wrote:

>>> As to why preemption doesn't work for SMP, a thread only knows to preempt
>>> if it makes a higher priority thread runnable.  This happens in mtx_unlock
>>> when we wakeup a thread waiting on the lock, in wakeup, or when an interrupt
>>> thread is scheduled (the interrupted thread "sees" the ithread being
>>> scheduled).  If another thread on another CPU makes a thread runnable, the
>>> thread on the first CPU has no idea unless the second CPU explicitly sends
>>> a message (i.e. IPI) to the first CPU asking it to yield instead.
>>
>> I believe SCHED_ULE does the IPI.
>
> If you add 'options IPI_PREEMPTION' I think the IPI is enabled in 4BSD.

I added this about 6 months ago, but it didn't help then and still doesn't,
at least for SCHED_4BSD and relative to voluntary yielding in the
PREEMPTION case.

>> Would it be worth checking a preemption flag in mtx_unlock()?  This
>> would bloat the macro a bit.  However, we already have the check and the
>> bloat for spinlocks, at least on i386's, by checking in critical_exit()
>> via spinlock_exit().
>
> All archs check the flag in spinlock_exit().  mtx_unlock() will preempt in
> turnstile_unpend() if we wakeup a higher priority thread, so no need to check
> for a flag.  If we were to get an interrupt while holding the lock we will
> preempt immediately (the patch changes this by deferring that preemption to
> the critical_exit()).
>
>> Using critical sections should have the side effect of getting the flag
>> checked in critical_exit().  This doesn't seem to work (for SCHED_4BSD),
>> so there seems to be a problem setting the flag.
>
> The flag should work fine, but keep in mind the multiple-CPU case I outlined
> above.  That doesn't set the flag unless you have the IPI in place.

>>> Actually, I would keep the sched_runnable(), etc. as #ifndef PREEMPTION,
>>> the critical_exit() already does that check (modulo concerns above).  Also, I
>>
>> Testing shows critical_exit() doesn't seem to be doing the preemption.  On
>> UP, depending on PREEMPTION makes little difference, but on 2-way SMP with
>> no voluntary yielding, pagezero is too active.  The critical sections don't
>> seem to be doing much.
>
> You probably need more details (KTR is good) to see exactly when threads are
> becoming runnable (and on which CPUs) and when the kernel is preempting to
> see what is going on and where the context switches come from.  KTR_SCHED
> + schedgraph.py may prove useful.

Reading the code is easier :-).  I noticed the following problems:
- the flag is neither set nor checked in the !PREEMPTION case, except that
   for SCHED_ULE it is set.  Most settings and checkings of it are under the
   PREEMPTION ifdef.  I think this is wrong -- preemption to ithreads should
   occur even without PREEMPTION (so there would be 3 levels of PREEMPTION
   -- none (as given by !PREEMPTION now), preemption to ithreads only (not
   available now?), and whatever FULL_PREEMPTION gives).  The exception is
   that sched_add() for SCHED_ULE calls sched_preempt() in the non-yielding
   case, and setting the flag in sched_preempt() isn't under the PREEMPTION
   ifdef.  But this is moot since SCHED_ULE requires PREEMPTION.
- the condition for being an idle priority thread is wrong.  It is affected
   by much the same translation problems as pri_level in userland.  An idle
   thread may have a borrowed priority, so it is impossible to classify
   idle threads according to their current priority alone.  This may allow
   setting of the preemption flag to never be done even in the PREEMPTION
   case, as follows:
   - an idle priority thread is running with a borrowed non-idle priority
   - and enters a critical section
   - maybe_preempt() is called but of course doesn't preempt because of the
     critical section
   - and also doesn't set the flag due to the misclassified priority.  This
     apparently happens often for pagezero, due to it holding a mutex most
     of the time that it is running.  (Without your proposed change, it
     isn't in a critical section, but I suspect maybe_preempt() doesn't
     preempt it for similar reasons.)
   - when the thread leaves the critical section, preemption doesn't occur
     because the flag is not set, and preemption isn't reconsidered because
     critical_exit() doesn't do that.
   - when the priority is unborrowed, preemption should be reconsidered.  I
     don't know if it is.
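The misclassification comes down to testing the current (possibly borrowed)
priority instead of the base priority.  A toy model of the distinction
(the field names follow the kernel's naming, but the PRI_MIN_IDLE cutoff
value and the example thread are illustrative, not real kernel code):

```c
#include <assert.h>
#include <stdbool.h>

#define PRI_MIN_IDLE	224	/* illustrative idle-class cutoff */

struct thread {
	int td_priority;	/* current (possibly borrowed) priority */
	int td_base_pri;	/* priority before any propagation */
};

/*
 * Buggy test: uses the current priority, so an idle-class thread that
 * borrowed a non-idle priority through a mutex is misclassified as
 * non-idle, and the preemption flag is never set for it.
 */
static bool
is_idle_thread_buggy(const struct thread *td)
{
	return (td->td_priority >= PRI_MIN_IDLE);
}

/*
 * Classify by the base priority instead, which priority propagation
 * does not touch.
 */
static bool
is_idle_thread(const struct thread *td)
{
	return (td->td_base_pri >= PRI_MIN_IDLE);
}
```

Classifying by td_base_pri is one way out; whether that is what the
scheduler should actually do here is exactly the open question above.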

>>> the reason I wanted to avoid preempting while holding the lock is you can
>>> get this case:
>>>
>>> zeroidle -> some ithread -> some top-half thread in kernel which needs the
>>> vm page queue mtx -> zeroidle (with top-half thread's priority; until
>>> mtx_unlock) -> top-half thread in kernel -> zeroidle
>>>
>>> which can be many context switches.  By not switching while holding the
>>> lock, one can reduce this to:
>>>
>>> zeroidle -> some ithread -> some top-half thread -> zeroidle
>>>
>>> but at the cost of adding latency to "some ithread" and "some top-half
>>> thread"
>>
>> Maybe preemption should be inhibited a bit when any mutex is held.
>
> That would make mutexes spinlocks that block interrupts.  Would sort of defeat
> the point of having mutexes that aren't spinlocks.

I mean that preemption should only be inhibited, not completely blocked.
Something like delaying preemption for a couple of microseconds would be
good, but would be hard to implement since the obvious implementation,
of scheduling an interrupt after a couple of microseconds to cause
reconsideration of the preemption decision, would be too expensive.  For
well-behaved threads like pagezero, we can be sure that a suitable
preemption reconsideration point like critical_exit() is called within
a couple of microseconds.  That is essentially how the voluntary
yielding in pagezero works now.
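That voluntary-yielding pattern can be sketched as a loop that works in
bounded chunks and checks for runnable threads only at chunk boundaries,
so any pending preemption is honoured within a bounded delay.  In this
sketch sched_runnable() is a stand-in that pretends another thread
becomes runnable partway through, and CHUNK and the page counts are
invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define CHUNK	16	/* pages "zeroed" per burst; illustrative */

static int yields;	/* counts voluntary yield points taken */

/*
 * Stand-in for sched_runnable(): pretend another thread becomes
 * runnable once 40 pages have been done.
 */
static bool
sched_runnable(int pages_done)
{
	return (pages_done >= 40);
}

/*
 * Zero n pages, checking for runnable threads at chunk boundaries and
 * yielding there instead of in the middle of an operation.
 */
static int
zero_pages(int n)
{
	int burst, done = 0;

	while (done < n) {
		burst = (n - done < CHUNK) ? n - done : CHUNK;
		done += burst;			/* "zero" a bounded chunk */
		if (done < n && sched_runnable(done))
			yields++;		/* voluntary yield point */
	}
	return (done);
}
```

The key property is the one argued for above: the thread never blocks
preemption indefinitely, because a reconsideration point is reached
after at most one chunk of work.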

Bruce
