Getting rid of the static msleep priority boost

John Baldwin jhb at freebsd.org
Fri Mar 7 14:36:09 UTC 2008


On Friday 07 March 2008 07:16:30 am Jeff Roberson wrote:
> Hello,
>
> I've been studying some problems with recent scheduler improvements that
> help a lot on some workloads and hurt on others.  I've tracked the problem
> down to static priority boosts handed out by msleep/cv_broadcastpri.  The
> basic problem is that a user thread will be woken with a kernel priority
> thus allowing it to preempt a thread running on any processor with a
> lesser priority.  The lesser priority thread may in fact hold some
> resource that the higher priority thread requires.  Thus we context switch
> several times and perhaps go through priority propagation as well.
>
> I have verified that disabling these static priority boosts entirely fixes
> the performance problem I've run into on at least one workload.  There are
> probably others that it helps and hopefully we can discover that.
>
> I'd like to know if anyone has a strong preference to keep this feature.
> It is likely that it helps in some interactive situations.  I'm not sure
> how much however.  I propose that we make a sysctl that disables it and
> turn it off by default.  If we see complaints on current@ we can suggest
> that they toggle the sysctl to see if it alleviates problems.
>
> Based on feedback from that experiment and some testing we can then choose
> a few options:
>
> 1)  Disable the static boosts entirely.  Leave kernel priorities for
> kernel threads and priority propagation.  Most other kernels do this.
> Would make my life in ULE much easier as well.
>
> 2)  Leave the support for static boosts but remove it from all but a few
> key locations.  Leaving it in the API would give some flexibility but
> might confuse developers.
>
> 3)  Leave things as they are.  Undesirable.
>
> I'm leaning towards #2 based on the information I have presently.  This is
> almost a significant change to historic BSD behavior so we might want to
> tread lightly.

One thing to note is that we actually depend on the priority boost (evilly) to
pick processes to swap out.  (I think we check for <= PSOCK and don't swap
those out.)  One thing I've wanted for a while is for the sleep priority
passed to msleep() to be just a parameter made available to the scheduler,
which the scheduler can use to calculate the real internal priority, rather
than a value that is set directly as the thread's priority.  That is, I
imagine having:

void	sched_set_sleep_prio(struct thread *td, u_char pri);
u_char	sched_get_sleep_prio(struct thread *td);

(The swap check would use the get call.)  The 4BSD scheduler's implementation
of sched_set_sleep_prio would look like this:

void
sched_set_sleep_prio(struct thread *td, u_char pri)
{

	td->td_sched->sleep_pri = pri;	/* remember what msleep() asked for */
	sched_prio(td, pri);		/* 4BSD applies it directly, as now */
}

void
sched_userret(struct thread *td)
{

	...
	td->td_sched->sleep_pri = 0;	/* not in the kernel anymore */
}

Other schedulers, however, may just save the value and recalculate the
priority, with the priority calculation treating the sleep priority as one
factor among many.  If nothing else, this makes ignoring it a scheduler
decision (so 4BSD could continue to do what it does now, but ULE may ignore
it, or ignore certain levels, etc.).
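
To make the split concrete, here is a minimal userland sketch of the idea,
not kernel code.  Everything prefixed mock_ or MOCK_ is hypothetical: the
struct layouts, the honor_sleep_boost flag, the numeric values, and the
convention that a sleep priority of 0 means "none recorded" are assumptions
made for illustration.  It shows one policy applying the sleep priority
directly (4BSD-like) and another only recording it (one possible ULE-like
choice), while the swap check keeps working through the get call either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values for illustration; lower number = higher priority. */
enum { MOCK_PSOCK = 88, MOCK_PUSER = 160 };

/* Hypothetical stand-in for the per-thread scheduler state. */
struct mock_thread {
	unsigned char prio;      /* effective priority the scheduler acts on */
	unsigned char user_prio; /* priority used when running in user mode */
	unsigned char sleep_pri; /* last sleep priority; 0 = none recorded */
};

/* Hypothetical per-scheduler policy knob. */
struct mock_sched {
	bool honor_sleep_boost;  /* true: 4BSD-like, false: record only */
};

/* Record the sleep priority; only a boosting policy applies it directly. */
void
mock_set_sleep_prio(const struct mock_sched *s, struct mock_thread *td,
    unsigned char pri)
{
	assert(s != NULL && td != NULL);
	td->sleep_pri = pri;
	if (s->honor_sleep_boost)
		td->prio = pri;  /* stands in for sched_prio(td, pri) */
}

unsigned char
mock_get_sleep_prio(const struct mock_thread *td)
{
	assert(td != NULL);
	return (td->sleep_pri);
}

/* On return to user mode, drop the hint and any boost (assumed behavior). */
void
mock_userret(struct mock_thread *td)
{
	assert(td != NULL);
	td->sleep_pri = 0;
	td->prio = td->user_prio;
}

/*
 * Swap check for a sleeping thread: a thread that went to sleep at
 * MOCK_PSOCK or better is not swapped out.  It reads the recorded sleep
 * priority, not the effective priority, so it works under both policies.
 */
bool
mock_swap_allowed(const struct mock_thread *td)
{
	unsigned char sp = mock_get_sleep_prio(td);

	return (sp == 0 || sp > MOCK_PSOCK);
}
```

With this shape, a thread sleeping at MOCK_PSOCK is exempt from swapping
under both policies, but only the boosting policy changes its effective
priority, so the preemption behavior Jeff describes becomes a per-scheduler
choice.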

-- 
John Baldwin


More information about the freebsd-arch mailing list