sched_userret priority adjustment patch for sched_4bsd

John Baldwin jhb at FreeBSD.org
Mon Sep 27 11:56:57 PDT 2004


On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote:
> On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > > When a thread is about to return to user space it resets its priority
> > > to the user level priority.
> > > However after lowering the permission its priority it needs to check if
> > > its priority is still better than all other runable threads.
> > > This is currently not implemented.
> > > Without the check the thread can block kernel or user threads with
> > > better priority until a switch is forced by by an interrupt.
> > >
> > > The attached patch checks the relevant runqueues and threads without
> > > slots in the same ksegrp and forces a thread switch if the currently
> > > running thread is no longer the best thread to run after it changed its
> > > priority.
> > >
> > > The patch should improve interactive response under heavy load
> > > somewhat. It needs a lot of testing.
> >
> > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED
> > based on on a comparison against user_pri rather than td_priority inside
> > of sched_add()?  Having the flag set by sched_add() is supposed to make
> > this sort of check unnecessary.  Even 4.x has the same bug I think as a
> > process can make another process runnable after it's priority has been
> > boosted by a tsleep() and need_resched() is only called based on a
> > comparison of p_pri. Ah, 4.x doesn't have the bug because it caches the
> > priority of curproc when it enters the kernel and compares against that. 
> > Thus, I think the correct fix is more like this:
> >
> > Index: sched_4bsd.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> > retrieving revision 1.63
> > diff -u -r1.63 sched_4bsd.c
> > --- sched_4bsd.c        11 Sep 2004 10:07:22 -0000      1.63
> > +++ sched_4bsd.c        27 Sep 2004 14:12:03 -0000
> > @@ -272,7 +272,7 @@
> >  {
> >
> >         mtx_assert(&sched_lock, MA_OWNED);
> > -       if (td->td_priority < curthread->td_priority)
> > +       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
> >                 curthread->td_flags |= TDF_NEEDRESCHED;
> >  }
> >
> > Index: sched_ule.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> > retrieving revision 1.129
> > diff -u -r1.129 sched_ule.c
> > --- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
> > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> > @@ -723,7 +723,7 @@
> >          */
> >         pcpu = pcpu_find(cpu);
> >         td = pcpu->pc_curthread;
> > -       if (ke->ke_thread->td_priority < td->td_priority ||
> > +       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
> >             td == pcpu->pc_idlethread) {
> >                 td->td_flags |= TDF_NEEDRESCHED;
> >                 ipi_selected(1 << cpu, IPI_AST);
> >
> > An even better fix might be to fix td_base_pri by having it be set on
> > kernel entry similar to how 4.x sets curpriority.  The above fix should
> > be sufficient for now, however.
>
> I don't think that this is enough since TDF_NEEDRESCHED is thread
> specific and not cpu specific.

Hmm, it is CPU specific in 4.x.  It could be changed back to being a per-cpu 
flag easily.

> However the thread marked with TDF_NEEDRESCHED might not be the next
> thread leaving the kernel.
> ( Can't really talk about ULE since I am trying to avoid looking at
> another shiny irresistible time sink this week ;-)
>
> I think we agree that that td_priority should be set to td_base_pri on
> kernel entry. Since td_base_pri is changed by sleep and condvar
> functions it should also be reset on kernel entry. (Probably from a new
> ksegrp field). Condvar waits should currently non cause the base
> priority to change to the current priority of the thread - otherwise
> td_base_pri could get stuck at a really bad user priority.
> ( td->td_base_pri might end up being worse than
> td->td_ksegrp->kg_user_pri when the ksegrp priority improves)

Well, I think instead that td_base_pri should be set to td_priority on kernel 
entry (rather than the other way around).  td_priority should be unchanged 
just because it enters the kernel.  I think the sleep functions could then 
leave td_base_pri alone.  (I think setting it there is wrong because 
td_base_pri is not quite the same as curpriority in 4.x.)  What td_base_pri 
is really supposed to provide, btw, is the priority that the thread should go 
back to once it has unlocked a mutex and had its priorty boosted while it 
held the mutex.  Arguably it should just be using kg_user_pri for this, but 
then you loose priority "boosts" from tsleep(), which is why td_base_pri is 
set in msleep().  I guess what should happen is something more like this:

kernel_entry()
{
	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
	td->td_base_pri = td->td_priority;
}

msleep()
{
	sched_prio(...);
	td_base_pri = td->td_priority;
}

The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous 
e-mail.  Also, in sched_prio(), if our priority is ever raised (numerically, 
logically less important), we should set TDF_NEEDRESCHED since we may need to 
switch (4.x does this in maybe_needresched()).  Then, TDF_NEEDRESCHED could 
become a per-cpu flag and have it not be cleared in mi_switch() but be 
cleared only in userret().  Hmm, I think all of the TDF_NEEDRESCHED handling 
actually beings in sched_userret() btw.

-- 
John Baldwin <jhb at FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


More information about the freebsd-arch mailing list