sched_userret priority adjustment patch for sched_4bsd
John Baldwin
jhb at FreeBSD.org
Mon Sep 27 11:56:57 PDT 2004
On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote:
> On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > > When a thread is about to return to user space it resets its priority
> > > to the user level priority.
> > > However after lowering the permission its priority it needs to check if
> > > its priority is still better than all other runable threads.
> > > This is currently not implemented.
> > > Without the check the thread can block kernel or user threads with
> > > better priority until a switch is forced by by an interrupt.
> > >
> > > The attached patch checks the relevant runqueues and threads without
> > > slots in the same ksegrp and forces a thread switch if the currently
> > > running thread is no longer the best thread to run after it changed its
> > > priority.
> > >
> > > The patch should improve interactive response under heavy load
> > > somewhat. It needs a lot of testing.
> >
> > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED
> > based on on a comparison against user_pri rather than td_priority inside
> > of sched_add()? Having the flag set by sched_add() is supposed to make
> > this sort of check unnecessary. Even 4.x has the same bug I think as a
> > process can make another process runnable after it's priority has been
> > boosted by a tsleep() and need_resched() is only called based on a
> > comparison of p_pri. Ah, 4.x doesn't have the bug because it caches the
> > priority of curproc when it enters the kernel and compares against that.
> > Thus, I think the correct fix is more like this:
> >
> > Index: sched_4bsd.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> > retrieving revision 1.63
> > diff -u -r1.63 sched_4bsd.c
> > --- sched_4bsd.c 11 Sep 2004 10:07:22 -0000 1.63
> > +++ sched_4bsd.c 27 Sep 2004 14:12:03 -0000
> > @@ -272,7 +272,7 @@
> > {
> >
> > mtx_assert(&sched_lock, MA_OWNED);
> > - if (td->td_priority < curthread->td_priority)
> > + if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
> > curthread->td_flags |= TDF_NEEDRESCHED;
> > }
> >
> > Index: sched_ule.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> > retrieving revision 1.129
> > diff -u -r1.129 sched_ule.c
> > --- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129
> > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> > @@ -723,7 +723,7 @@
> > */
> > pcpu = pcpu_find(cpu);
> > td = pcpu->pc_curthread;
> > - if (ke->ke_thread->td_priority < td->td_priority ||
> > + if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
> > td == pcpu->pc_idlethread) {
> > td->td_flags |= TDF_NEEDRESCHED;
> > ipi_selected(1 << cpu, IPI_AST);
> >
> > An even better fix might be to fix td_base_pri by having it be set on
> > kernel entry similar to how 4.x sets curpriority. The above fix should
> > be sufficient for now, however.
>
> I don't think that this is enough since TDF_NEEDRESCHED is thread
> specific and not cpu specific.
Hmm, it is CPU specific in 4.x. It could be changed back to being a per-cpu
flag easily.
> However the thread marked with TDF_NEEDRESCHED might not be the next
> thread leaving the kernel.
> ( Can't really talk about ULE since I am trying to avoid looking at
> another shiny irresistible time sink this week ;-)
>
> I think we agree that that td_priority should be set to td_base_pri on
> kernel entry. Since td_base_pri is changed by sleep and condvar
> functions it should also be reset on kernel entry. (Probably from a new
> ksegrp field). Condvar waits should currently non cause the base
> priority to change to the current priority of the thread - otherwise
> td_base_pri could get stuck at a really bad user priority.
> ( td->td_base_pri might end up being worse than
> td->td_ksegrp->kg_user_pri when the ksegrp priority improves)
Well, I think instead that td_base_pri should be set to td_priority on kernel
entry (rather than the other way around). td_priority should be unchanged
just because it enters the kernel. I think the sleep functions could then
leave td_base_pri alone. (I think setting it there is wrong because
td_base_pri is not quite the same as curpriority in 4.x.) What td_base_pri
is really supposed to provide, btw, is the priority that the thread should go
back to once it has unlocked a mutex and had its priorty boosted while it
held the mutex. Arguably it should just be using kg_user_pri for this, but
then you loose priority "boosts" from tsleep(), which is why td_base_pri is
set in msleep(). I guess what should happen is something more like this:
kernel_entry()
{
KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
td->td_base_pri = td->td_priority;
}
msleep()
{
sched_prio(...);
td_base_pri = td->td_priority;
}
The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous
e-mail. Also, in sched_prio(), if our priority is ever raised (numerically,
logically less important), we should set TDF_NEEDRESCHED since we may need to
switch (4.x does this in maybe_needresched()). Then, TDF_NEEDRESCHED could
become a per-cpu flag and have it not be cleared in mi_switch() but be
cleared only in userret(). Hmm, I think all of the TDF_NEEDRESCHED handling
actually beings in sched_userret() btw.
--
John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
More information about the freebsd-arch
mailing list