Realtime thread priorities

Fri Dec 10 17:27:22 UTC 2010

On Fri, Dec 10, 2010 at 7:50 AM, John Baldwin <jhb at freebsd.org> wrote:
> So I finally had a case today where I wanted to use rtprio but it doesn't seem
> very useful in its current state.  Specifically, I want to be able to tag
> certain user processes as being more important than any other user processes
> even to the point that if one of my important processes blocks on a mutex, the
> owner of that mutex should be more important than sshd being woken up from
> sbwait by new data (for example).  This doesn't work currently with rtprio due
> to the way the priorities are laid out (and I believe I probably argued for
> the current layout back when it was proposed).
>
> The current layout breaks up the global thread priority space (0 - 255) into a
> couple of bands:
>
>   0 -  63 : interrupt threads
>  64 - 127 : kernel sleep priorities (PSOCK, etc.)
> 128 - 159 : real-time user threads (rtprio)
> 160 - 223 : time-sharing user threads
> 224 - 255 : idle threads (idprio and kernel idle procs)
>
> The problem I am running into is that when a time-sharing thread goes to sleep
> in the kernel (waiting on select, socket data, tty, etc.) it actually ends up
> in the kernel priorities range (64 - 127).  This means when it wakes up it
> will trump (and preempt) a real-time user thread even though these processes
> nominally have a priority down in the 160 - 223 range.  We do drop the kernel
> sleep priority during userret(), but we don't recheck the scheduler queues to
> see if we should preempt the thread during userret(), so it effectively runs
> with the kernel sleep priority for the rest of the quantum while it is in
> userland.
>
> My first question is whether this is the desired behavior.  Originally I
> think I preferred the current layout because I thought a thread in the kernel
> should always have priority so it can release locks, etc.  However, priority
> propagation should actually handle the case of some very important thread
> needing a lock.  In my use case today where I actually want to use rtprio I
> think I want different behavior where the rtprio thread is more important than
> the thread waking up with PSOCK, etc.
>
> If we decide to change the behavior I see two possible fixes:
>
> 1) (easy) just move the real-time priority range above the kernel sleep
> priority range
>
> 2) (harder) make sched_userret() check the run queue to see if it should
> preempt when dropping the kernel sleep priority.  I think bde@ has suggested
> that we should do this for correctness previously (and I've had some old,
> unfinished patches to do this in a branch in p4 for several years).

As a note on what other operating systems do, AIX has no notion of
user-level versus kernel-level priorities.  Real-time
threads have the highest prio (and they are scheduled in a round-robin
fashion), and everyone else is managed based on scheduling the thread
that's runnable with the highest priority.  The prios are adjusted
regularly based on interactivity so that a CPU hog, unless explicitly
nice'd or marked SCHED_RR, will get its prio marked down, and down,
and down again as it continues to hog the CPU, while other threads
that haven't been running will get their prio bumped up.

Unfortunately, I only know the architectural detail I just mentioned.
I do not have much knowledge of the 8000 lines of code that
implemented this. :-)

This is all just to say that it's perfectly possible to get a working
OS without fixed priority bands.

Cheers,
matthew