Realtime thread priorities
jhb at freebsd.org
Tue Dec 28 19:58:23 UTC 2010
On Friday, December 10, 2010 10:50:45 am John Baldwin wrote:
> So I finally had a case today where I wanted to use rtprio but it doesn't seem
> very useful in its current state. Specifically, I want to be able to tag
> certain user processes as being more important than any other user processes
> even to the point that if one of my important processes blocks on a mutex, the
> owner of that mutex should be more important than sshd being woken up from
> sbwait by new data (for example). This doesn't work currently with rtprio due
> to the way the priorities are laid out (and I believe I probably argued for
> the current layout back when it was proposed).
> The current layout breaks up the global thread priority space (0 - 255) into a
> couple of bands:
> 0 - 63 : interrupt threads
> 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> 128 - 159 : real-time user threads (rtprio)
> 160 - 223 : time-sharing user threads
> 224 - 255 : idle threads (idprio and kernel idle procs)
> The problem I am running into is that when a time-sharing thread goes to sleep
> in the kernel (waiting on select, socket data, tty, etc.) it actually ends up
> in the kernel priorities range (64 - 127). This means when it wakes up it
> will trump (and preempt) a real-time user thread even though these processes
> nominally have a priority down in the 160 - 223 range. We do drop the kernel
> sleep priority during userret(), but we don't recheck the scheduler queues to
> see if we should preempt the thread during userret(), so it effectively runs
> with the kernel sleep priority for the rest of the quantum while it is in
> My first question is if this behavior is the desired behavior? Originally I
> think I preferred the current layout because I thought a thread in the kernel
> should always have priority so it can release locks, etc. However, priority
> propagation should actually handle the case of some very important thread
> needing a lock. In my use case today where I actually want to use rtprio I
> think I want different behavior where the rtprio thread is more important than
> the thread waking up with PSOCK, etc.
> If we decide to change the behavior I see two possible fixes:
> 1) (easy) just move the real-time priority range above the kernel sleep
> priority range
I have forward-ported my original patch for 7 to 9 and fixed several other
nits I ran into along the way. The updated patch is at
I think it can probably be broken up into several pieces at least some of
which should be non-controversial. :)
This patch makes the following changes:
- Give the USB kthreads lower priority in the range of software interrupt
threads rather than hardware interrupt threads.
- Retire some unused ithread priorities: PI_TTYHIGH, PI_TAPE, and
PI_DISKLOW. While here, rename PI_TTYLOW to PI_TTY. Also, add a macro
PI_SWI() that takes a SWI_* constant as an argument and returns the
suitable thread priority.
- In sched_yield(), only drop the priority of timeshare threads to
PRI_MAX_TIMESHARE. Non-timeshare threads retain whatever priority they
- Only apply a kernel sleep priority from tsleep() to timeshare threads.
This is only relevant once realtime threads move to a new priority range
to avoid penalizing realtime threads for sleeping.
- Explicitly set a sane initial priority (of PVM) for kthreads. Right now
new kthreads inherit whatever priority thread0 happens to have when they
are created. Since kthreads can be created from threads other than thread0
this priority can be fairly random. In practice, I've seen many kthreads
created with an initial priority that is a hardware interrupt thread
priority due to thread0 being lent an ithread priority.
- Add some helper macros to ULE to define the ranges used for interactive
and non-interactive timeshare threads and fix some places that hardcoded
assumptions about the location of the realtime priority range.
- Add a new option (that should perhaps be on by default) for use in
conjunction with moving realtime priorities: ULE_INTERACTIVE_TIMESHARE.
When this new option is in effect, ULE does not abuse realtime priorities
for interactive timeshare threads. Instead, the timeshare range is split
into two ranges, one for interactive threads and one for non-interactive
threads. The non-interactive range is further divided into three ranges
to add bands at the top and bottom for nice levels. Combined with the
other changes, the net effect is that interactive threads will have the
same priority they have now (i.e. a band of 32 priorities in between
kernel sleep priorities and non-interactive timeshare priorities) and
that non-interactive threads now have a slightly larger band of
priorities (32 priorities in the "middle" instead of 24 with additional
bands of 20 above and below for nice values).
- Never boost the priority of a thread via tsleep() if the passed in
priority is zero. Zero means "don't change the priority", but ULE was
still giving a boost in certain cases. In practice I suspect this rarely,
if ever, triggered.
- Always apply the requested sleep priority to kthreads. Certain kernel
processes such as pagedaemon, etc. rely on tsleep() to lower the priority
of the kproc so that it is treated as a background task when it is idle.
The static_boost code in ULE would never lower the priority due to a sleep,
so once a kproc gained a higher priority via sleeping it would never be
treated as a background task again. This is especially problematic in the
case that a kthread starts off with an ithread priority as noted above.
- Retire the PCONFIG kernel sleep priority. We do not need a new priority
level for boot time config hooks. When PCONFIG was added, tsleep() did
not support leaving the priority alone via 0, but now it does support that
so use that instead.
- Restore dropping the syncer kthread down to PPAUSE when it is idle.
- Drop the flowtable cleaner kthread down to PPAUSE when it is idle.
- Move the realtime priority range in between the interrupt thread and
kernel sleep priority ranges. Currently there is a small bit of overlap
between SWI_TQ and SWI_TQ_GIANT and 'rtprio 0'. I hope to eliminate this
by retiring SWI_TQ_FAST once interrupt filters are in place as then
SWI_TQ and SWI_TQ_GIANT can move up a slot.