Realtime thread priorities

Tue Dec 14 04:02:08 UTC 2010

John Baldwin wrote:
> 
> On Sunday, December 12, 2010 3:06:20 pm Sergey Babkin wrote:
> > John Baldwin wrote:
> > >
> > > The current layout breaks up the global thread priority space (0 - 255) into a
> > > couple of bands:
> > >
> > >   0 -  63 : interrupt threads
> > >  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > 128 - 159 : real-time user threads (rtprio)
> > > 160 - 223 : time-sharing user threads
> > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > >
> > > If we decide to change the behavior I see two possible fixes:
> > >
> > > 1) (easy) just move the real-time priority range above the kernel sleep
> > > priority range
> >
> > Would not this cause a priority inversion when an RT process
> > enters the kernel mode?
> 
> How so?  Note that timesharing threads are not "bumped" to a kernel sleep
> priority when they enter the kernel either.  The kernel sleep priorities are
> purely a way for certain sleep channels to cause a thread to be treated as
> interactive and give it a priority boost to favor interactive threads.
> Threads in the kernel do not automatically have higher priority than threads
> not in the kernel.  Keep in mind that all stopped threads (threads not
> executing) are always in the kernel when they stop.
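
The band layout quoted above can be restated as a small lookup. This is a minimal sketch: the enum and function names are hypothetical (not FreeBSD's own identifiers), and it only encodes the numeric ranges listed in the table, where a numerically smaller value is the more important priority. It makes explicit that, in this layout, every kernel sleep priority outranks every rtprio priority.

```c
#include <assert.h>

/* Priority bands from the layout quoted above (0 = most important).
 * Names are hypothetical, for illustration only. */
enum pri_band {
    BAND_INTERRUPT,   /*   0 -  63 : interrupt threads */
    BAND_KERN_SLEEP,  /*  64 - 127 : kernel sleep priorities */
    BAND_REALTIME,    /* 128 - 159 : real-time user threads (rtprio) */
    BAND_TIMESHARE,   /* 160 - 223 : time-sharing user threads */
    BAND_IDLE         /* 224 - 255 : idle threads */
};

/* Map a global priority (0 - 255) to the band it falls in. */
static enum pri_band pri_to_band(int pri)
{
    assert(pri >= 0 && pri <= 255);
    if (pri < 64)
        return BAND_INTERRUPT;
    if (pri < 128)
        return BAND_KERN_SLEEP;
    if (pri < 160)
        return BAND_REALTIME;
    if (pri < 224)
        return BAND_TIMESHARE;
    return BAND_IDLE;
}

/* Nonzero if priority a is more important than b (numerically smaller). */
static int pri_is_higher(int a, int b)
{
    return a < b;
}
```

For example, `pri_to_band(140)` yields `BAND_REALTIME`, and `pri_is_higher(127, 128)` is true, since 127 is the least important kernel sleep priority and 128 the most important real-time one.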

I may be a bit behind the times here, but historically the "default"
process priority was the one a process had when it was preempted.
If it made a system call, its priority on wakeup would be the one
specified in the call to the sleep() kernel function (or its more
modern analog, such as a sleeplock or condition variable). This let
the kernel code react quickly; on return from the syscall the process
reverted to its original priority and could be preempted by another
process at that point.

If the user-mode priority is higher than the kernel-mode priority,
then once a high-priority process makes a system call (for example,
poll()), it would experience a priority inversion and sleep at a
lower priority than the one specified for it.
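
The historical model described above, and the inversion it produces once real-time priorities sit above the kernel sleep priorities, can be sketched as follows. This is a simplified illustration under stated assumptions, not kernel code: `struct thread_sketch`, `sketch_sleep()`, `sketch_syscall_return()` and `sleep_is_inverted()` are hypothetical names, and the actual blocking and wakeup are elided.

```c
#include <assert.h>

/* Hypothetical, simplified per-thread state for the classic model. */
struct thread_sketch {
    int user_pri;  /* "default" priority, used while running in user mode */
    int cur_pri;   /* priority the scheduler currently uses */
};

/* Classic sleep(): the caller passes a sleep priority, which the thread
 * keeps after wakeup so the kernel code can finish quickly. */
static void sketch_sleep(struct thread_sketch *td, int sleep_pri)
{
    assert(sleep_pri >= 0 && sleep_pri <= 255);
    td->cur_pri = sleep_pri;
    /* ... block until woken up (elided in this sketch) ... */
}

/* On return from the system call, revert to the user-mode priority;
 * the thread may be preempted at this point. */
static void sketch_syscall_return(struct thread_sketch *td)
{
    td->cur_pri = td->user_pri;
}

/* The inversion described above: the thread is sleeping at a priority
 * less important (numerically larger) than its own. */
static int sleep_is_inverted(const struct thread_sketch *td)
{
    return td->cur_pri > td->user_pri;
}
```

For instance, assuming a hypothetical layout in which an RT thread runs at 70 and poll() sleeps at 100, `sketch_sleep(&td, 100)` leaves `td.cur_pri` at 100 and `sleep_is_inverted(&td)` true until `sketch_syscall_return(&td)` restores 70.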

A fix for this should be fairly straightforward. The process structure
has the RT priority in it, so all sleep() needs to do is check
it and use that priority if it's higher than the one given
as an argument. Alternatively, for RT processes, bump the argument
by however much the process's RT priority is above the "RT baseline"
(well, "logically above", numerically below).
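
Both variants of the proposed fix can be sketched in a few lines. This assumes a hypothetical layout after option 1, with real-time priorities at 64 - 95 and kernel sleep priorities at 96 - 159, and reads the "RT baseline" as the least important RT priority (95 here). The constants, that interpretation, and all names (`struct rt_thread_sketch`, `sleep_pri_use_rt()`, `sleep_pri_bump()`) are assumptions for illustration.

```c
#include <assert.h>

/*
 * Hypothetical layout after option 1 (real-time moved above the kernel
 * sleep band); 0 is the most important priority. Illustrative values only.
 */
#define SK_PRI_MIN_REALTIME   64   /* most important RT priority */
#define SK_PRI_MAX_REALTIME   95   /* least important RT priority: the "RT baseline" */
#define SK_PRI_MIN_KSLEEP     96   /* kernel sleep band: 96 - 159 */
#define SK_PRI_MAX_KSLEEP    159

/* Hypothetical, simplified stand-in for the per-process RT information. */
struct rt_thread_sketch {
    int is_rt;     /* nonzero if the thread has a real-time class */
    int user_pri;  /* SK_PRI_MIN_REALTIME..SK_PRI_MAX_REALTIME when is_rt */
};

/* Variant A: sleep at the more important (numerically smaller) of the
 * priority passed to sleep() and the thread's own RT priority. */
static int sleep_pri_use_rt(const struct rt_thread_sketch *td, int pri)
{
    assert(pri >= 0 && pri <= 255);
    if (td->is_rt && td->user_pri < pri)
        return td->user_pri;
    return pri;
}

/* Variant B: for RT threads, make the argument more important by as many
 * steps as the thread's priority lies above the RT baseline. Non-RT
 * threads keep the argument unchanged. The result is clamped at 0. */
static int sleep_pri_bump(const struct rt_thread_sketch *td, int pri)
{
    int boost;

    assert(pri >= 0 && pri <= 255);
    if (!td->is_rt)
        return pri;
    boost = SK_PRI_MAX_REALTIME - td->user_pri;  /* 0 at the baseline */
    pri -= boost;
    return pri < 0 ? 0 : pri;
}
```

With an RT thread at 70 and a sleep priority of 120, variant A sleeps at 70, while variant B sleeps at 95 (a boost of 25). Variant A removes the inversion outright; variant B keeps the relative ordering of sleep channels but, in this illustrative layout, can still leave the thread below its own RT priority.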

This would not solve the more general, classic priority inversion
problem, where a low-priority process grabs some kernel resource
and sleeps at a low priority while an RT process waits for that
resource to be freed. I think the original idea of giving in-kernel
processes the higher priorities was in part an attempt to address
this problem. But I agree with you that letting RT processes have a
higher priority than TS processes in kernel mode is better than
nothing.

Um, a stupid question: does the signal() primitive on mutexes/condvars
(i.e. "wake up one sleeper") pick the thread with the highest
priority? I guess it should, since otherwise the classic
priority inversion can get pretty bad. But as you can tell from
what I've said, I haven't looked at that code in a long time, so I
don't know the answer.

-SB
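
For illustration only, a wake-one operation that picks the highest-priority sleeper could look like the sketch below. It does not describe what the FreeBSD code actually does; `struct waiter` and `wake_one_highest()` are hypothetical names, and the queue is assumed to be a plain singly linked list with new sleepers appended at the tail, so ties go to the longest waiter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sleep queue entry; sleepers are appended at
 * the tail, so earlier entries have been waiting longer. */
struct waiter {
    int pri;              /* 0 = most important */
    struct waiter *next;
};

/* Remove and return the most important waiter (numerically smallest pri).
 * The strict '<' means that on a tie the earliest (longest-waiting) entry
 * wins. Returns NULL if the queue is empty. */
static struct waiter *wake_one_highest(struct waiter **head)
{
    struct waiter **best = NULL;  /* link that points at the best waiter */
    struct waiter **pp;
    struct waiter *w;

    assert(head != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next)
        if (best == NULL || (*pp)->pri < (*best)->pri)
            best = pp;
    if (best == NULL)
        return NULL;
    w = *best;
    *best = w->next;  /* unlink the chosen waiter from the queue */
    w->next = NULL;
    return w;
}
```

With waiters at priorities 130, 70 and 130 queued in that order, the first call returns the one at 70, and the next call returns the first of the two at 130.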