Getting rid of the static msleep priority boost

John Baldwin jhb at freebsd.org
Fri Mar 7 15:36:09 UTC 2008


On Friday 07 March 2008 08:42:37 am John Baldwin wrote:
> On Friday 07 March 2008 07:16:30 am Jeff Roberson wrote:
> > Hello,
> >
> > I've been studying some problems with recent scheduler improvements that
> > help a lot on some workloads and hurt on others.  I've tracked the
> > problem down to static priority boosts handed out by
> > msleep/cv_broadcastpri.  The basic problem is that a user thread will be
> > woken with a kernel priority thus allowing it to preempt a thread running
> > on any processor with a lesser priority.  The lesser priority thread may
> > in fact hold some resource that the higher priority thread requires. 
> > Thus we context switch several times and perhaps go through priority
> > propagation as well.
> >
> > I have verified that disabling these static priority boosts entirely
> > fixes the performance problem I've run into on at least one workload. 
> > There are probably others that it helps and hopefully we can discover
> > that.
> >
> > I'd like to know if anyone has a strong preference to keep this feature.
> > It is likely that it helps in some interactive situations.  I'm not sure
> > how much however.  I propose that we make a sysctl that disables it and
> > turn it off by default.  If we see complaints on current@ we can suggest
> > that they toggle the sysctl to see if it alleviates problems.
> >
> > Based on feedback from that experiment and some testing we can then
> > choose a few options:
> >
> > 1)  Disable the static boosts entirely.  Leave kernel priorities for
> > kernel threads and priority propagation.  Most other kernels do this.
> > Would make my life in ULE much easier as well.
> >
> > 2)  Leave the support for static boosts but remove it from all but a few
> > key locations.  Leaving it in the api would give some flexibility but
> > might confuse developers.
> >
> > 3)  Leave things as they are.  undesirable.
> >
> > I'm leaning towards #2 based on the information I have presently.  This
> > is almost a significant change to historic BSD behavior so we might want
> > to tread lightly.
>
> One thing to note is that we actually depend on the priority boost (evilly)
> to pick processes to swap out.  (I think we check for <= PSOCK and don't
> swap those out).  One thing that I've wanted to happen for a while is that
> the sleep priority for msleep() just be a parameter available to the
> scheduler that the scheduler can use to calculate the real internal
> priority rather than just being a set.  That is, I imagine having:
>
> void	sched_set_sleep_prio(struct thread *td, u_char pri);
> u_char	sched_get_sleep_prio(struct thread *td);
>
> (The swap check would use the get call).  The 4BSD scheduler's
> implementation of sched_set_sleep_prio would look like this:
>
> void
> sched_set_sleep_prio(struct thread *td, u_char pri)
> {
>
> 	td->td_sched->sleep_pri = pri;
> 	sched_prio(td, pri);
> }
>
> void
> sched_userret(..)
> {
>
> 	...
> 	td->td_sched->sleep_pri = 0;	/* not in the kernel anymore */
> }
>
> but other schedulers may just save it and recalculate the priority where
> the priority calculation just considers the sleep priority as one among
> many factors.  If nothing else, this allows it to be a scheduler decision
> to ignore it (so 4BSD could continue to do what it does now, but ULE may
> ignore it, or ignore certain levels, etc.)

One thing to clarify: I'm not opposed to replacing the PSOCK check with 
something more suitable in the swap code, (in fact, that would be desirable), 
but it might take a good bit of work to do that and is probably easier to 
work on that as a separate change.  I also think there can be some merit in 
having code paths hint to the scheduler the relative interactivity/priority 
of a sleep.

-- 
John Baldwin


More information about the freebsd-arch mailing list