cvs commit: src/sys/kern kern_mutex.c

Mon Jun 18 17:31:59 UTC 2007

On Sunday 17 June 2007 03:40:12 am Bruce Evans wrote:
> On Fri, 15 Jun 2007, John Baldwin wrote:
> 
> > On Friday 15 June 2007 04:46:30 pm Bruce Evans wrote:
> >> On Mon, 11 Jun 2007, John Baldwin wrote:
> 
> >>> As to why preemption doesn't work for SMP, a thread only knows to preempt
> >>> if it makes a higher priority thread runnable.  This happens in mtx_unlock
> >>> when we wake up a thread waiting on the lock, in wakeup, or when an
> >>> interrupt thread is scheduled (the interrupted thread "sees" the ithread
> >>> being scheduled).  If another thread on another CPU makes a thread
> >>> runnable, the thread on the first CPU has no idea unless the second CPU
> >>> explicitly sends a message (i.e. IPI) to the first CPU asking it to yield
> >>> instead.
> >>
> >> I believe SCHED_ULE does the IPI.
> >
> > If you add 'options IPI_PREEMPTION' I think the IPI is enabled in 4BSD.
> 
> I added this about 6 months ago, but it didn't help then and still doesn't,
> at least for SCHED_4BSD and relative to voluntarily yielding in the
> PREEMPTION case.

Talk to ups@ about that then, as he did the IPI_PREEMPTION stuff.

> > You probably need more details (KTR is good) to see exactly when threads are
> > becoming runnable (and on which CPUs) and when the kernel is preempting to
> > see what is going on and where the context switches come from.  KTR_SCHED
> > + schedgraph.py may prove useful.
> 
> Reading the code is easier :-).  I noticed the following problems:
> - the flag is neither set nor checked in the !PREEMPTION case, except for
>    SCHED_ULE it is set.  Most settings and checkings of it are under the
>    PREEMPTION ifdef.  I think this is wrong -- preemption to ithreads should
>    occur even without PREEMPTION (so there would be 3 levels of PREEMPTION
>    -- none (as given by !PREEMPTION now), preemption to ithreads only (not
>    available now?), and whatever FULL_PREEMPTION gives).  The exception is
>    that sched_add() for SCHED_ULE calls sched_preempt() in the non-yielding
>    case, and setting the flag in sched_preempt() isn't under the PREEMPTION
>    ifdef.  But this is moot since SCHED_ULE requires PREEMPTION.

The idea of PREEMPTION is that all preemption is inside the scheduler and
there aren't explicit preemptions like in the mutex code or ithread code as
the kernel can make a decision when a thread is scheduled.  Thus, there is
no ithread preemption w/o PREEMPTION.
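
In toy form, the model looks roughly like the sketch below.  This is userland
C, not the kernel code: every toy_* name is made up for illustration, and the
"switch" is just a counter.  It only shows the shape of the idea: the decision
is made when a thread becomes runnable, and if the running thread is in a
critical section the switch is recorded in a flag and taken when the
outermost critical section ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the running thread; not struct thread. */
struct toy_thread {
    int  pri;        /* lower value = higher priority, as in FreeBSD */
    int  critnest;   /* critical section nesting depth */
    bool owepreempt; /* a preemption was deferred and is still owed */
    int  switches;   /* number of simulated context switches */
};

/*
 * Called when a thread of priority new_pri is made runnable.
 * Returns true only if the current thread is switched out right away.
 */
bool toy_maybe_preempt(struct toy_thread *cur, int new_pri)
{
    if (new_pri >= cur->pri)
        return false;           /* not higher priority: nothing to do */
    if (cur->critnest > 0) {
        cur->owepreempt = true; /* cannot switch now, remember it */
        return false;
    }
    cur->switches++;            /* simulated immediate preemption */
    return true;
}

void toy_critical_enter(struct toy_thread *cur)
{
    cur->critnest++;
}

/* Leaving the outermost critical section pays any owed preemption. */
void toy_critical_exit(struct toy_thread *cur)
{
    assert(cur->critnest > 0);
    if (--cur->critnest == 0 && cur->owepreempt) {
        cur->owepreempt = false;
        cur->switches++;        /* simulated deferred preemption */
    }
}
```

Note that if toy_maybe_preempt() declines to set the flag, nothing later in
this sketch reconsiders the decision, which is the failure mode being argued
about here.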

> - the condition for being an idle priority thread is wrong.  It is affected
>    by much the same translation problems as pri_level in userland.  An idle
>    thread may have a borrowed priority, so it is impossible to classify
>    idle threads according to their current priority alone.  This may allow
>    setting of the preemption flag to never be done even in the PREEMPTION
>    case, as follows:
>    - idle priority thread is running with a borrowed non-idle priority
>    - and enters a critical section
>    - maybe_preempt() is called but of course doesn't preempt because of the
>      critical section
>    - and also doesn't set the flag due to the misclassified priority.  This
>      apparently happens often for pagezero, due to it holding a mutex most
>      of the time that it is running.  (Without your proposed change, it
>      isn't in a critical section, but I suspect maybe_preempt() doesn't
>      preempt it for similar reasons.)

Err, it uses the real priority of the thread, so if the page zero thread has
inherited a priority it gets treated as a non-idle thread until it releases
the mutex and resumes its idle thread priority.  When it releases the lock it
will lower its priority and then preempt to the thread waiting for the lock.
The scheduler will then run through any higher priority tasks before it gets
back to the page zero thread.
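
A small userland sketch of that sequence, with loud caveats: the toy_* names
and the TOY_PRI_MIN_IDLE cutoff are assumptions made up for illustration, and
the real work is done by the turnstile code, not by anything shaped like
this.  It shows a thread being classified by the priority it is currently
running at, and dropping back to its own priority on release.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PRI_MIN_IDLE 224   /* assumed idle-class cutoff, illustrative */
#define TOY_PRI_NONE     (-1)  /* no priority is currently lent */

/* Hypothetical thread record; lower value = higher priority. */
struct toy_td {
    int base_pri;  /* the thread's own priority */
    int lent_pri;  /* best priority lent by a waiter, or TOY_PRI_NONE */
};

/* Priority the thread is running at right now. */
int toy_pri(const struct toy_td *td)
{
    if (td->lent_pri != TOY_PRI_NONE && td->lent_pri < td->base_pri)
        return td->lent_pri;
    return td->base_pri;
}

/* Classified by current priority, so a borrower is not idle-class. */
bool toy_is_idle_class(const struct toy_td *td)
{
    return toy_pri(td) >= TOY_PRI_MIN_IDLE;
}

/* A waiter blocks on a lock we own and lends us its priority. */
void toy_lend(struct toy_td *td, int pri)
{
    assert(pri >= 0);
    if (td->lent_pri == TOY_PRI_NONE || pri < td->lent_pri)
        td->lent_pri = pri;
}

/*
 * The lock is released: drop the lent priority and report whether the
 * woken waiter now outranks us, in which case we should switch to it.
 */
bool toy_unlend(struct toy_td *td, int waiter_pri)
{
    td->lent_pri = TOY_PRI_NONE;
    return waiter_pri < toy_pri(td);
}
```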

>    - when the thread leaves the critical section, preemption doesn't occur
>      because the flag is not set, and preemption isn't reconsidered because
>      critical_exit() doesn't do that.
>    - when the priority is unborrowed, preemption should be reconsidered.  I
>      don't know if it is.

See turnstile_unpend() and turnstile_disown().

> >> Maybe preemption should be inhibited a bit when any mutex is held.
> >
> > That would make mutexes spinlocks that block interrupts.  Would sort of defeat
> > the point of having mutexes that aren't spinlocks.
> 
> I mean that preemption should only be inhibited, not completely blocked.
> Something like delaying preemption for a couple of microseconds would be
> good, but would be hard to implement since the obvious implementation,
> of scheduling an interrupt after a couple of microseconds to cause
> reconsideration of the preemption decision, would be too expensive.  For
> well-behaved threads like pagezero, we can be sure that a suitable
> preemption reconsideration point like critical_exit() is called within
> a couple of microseconds.  That is essentially how the voluntarily
> yielding in pagezero works now.

Hmm, that could be beneficial.  Similar to how adaptive spinning optimizes
for short holds.
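
To spell out the analogy, here is a toy sketch (userland C, all toy_* names
hypothetical, nothing here is kernel code).  Adaptive spinning bets that a
lock owner that is running on a CPU will release soon, so the waiter spins
instead of blocking.  Inhibited preemption would make the same kind of bet
about a thread holding a mutex: put the switch off until the release rather
than taking it immediately.  The second function assumes the release itself
is the reconsideration point, which is only one possible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock owner and mutex records. */
struct toy_owner { bool on_cpu; };
struct toy_mtx   { const struct toy_owner *owner; };

enum toy_action { TOY_ACQUIRE, TOY_SPIN, TOY_BLOCK };

/* One step of an adaptive lock attempt. */
enum toy_action toy_adaptive_step(const struct toy_mtx *m)
{
    assert(m != NULL);
    if (m->owner == NULL)
        return TOY_ACQUIRE;  /* lock is free: take it */
    if (m->owner->on_cpu)
        return TOY_SPIN;     /* owner is running: expect a short hold */
    return TOY_BLOCK;        /* owner is off-CPU: spinning is wasted */
}

enum toy_preempt { TOY_PREEMPT_NOW, TOY_PREEMPT_AT_RELEASE };

/* Inhibit, but do not forbid, preemption while any mutex is held. */
enum toy_preempt toy_inhibited_preempt(int mutexes_held)
{
    return mutexes_held > 0 ? TOY_PREEMPT_AT_RELEASE : TOY_PREEMPT_NOW;
}
```

Both bets only pay off when holds are short, so a badly behaved holder would
still need some upper bound on the delay.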

-- 
John Baldwin