Realtime thread priorities
John Baldwin
jhb at freebsd.org
Fri Dec 10 21:41:54 UTC 2010
On Friday, December 10, 2010 2:57:16 pm Kostik Belousov wrote:
> On Fri, Dec 10, 2010 at 11:33:55AM -0500, John Baldwin wrote:
> > On Friday, December 10, 2010 11:26:31 am Kostik Belousov wrote:
> > > On Fri, Dec 10, 2010 at 10:50:45AM -0500, John Baldwin wrote:
> > > > So I finally had a case today where I wanted to use rtprio but it doesn't seem
> > > > very useful in its current state. Specifically, I want to be able to tag
> > > > certain user processes as being more important than any other user processes
> > > > even to the point that if one of my important processes blocks on a mutex, the
> > > > owner of that mutex should be more important than sshd being woken up from
> > > > sbwait by new data (for example). This doesn't work currently with rtprio due
> > > > to the way the priorities are laid out (and I believe I probably argued for
> > > > the current layout back when it was proposed).
> > > >
> > > > The current layout breaks up the global thread priority space (0 - 255) into a
> > > > couple of bands:
> > > >
> > > > 0 - 63 : interrupt threads
> > > > 64 - 127 : kernel sleep priorities (PSOCK, etc.)
> > > > 128 - 159 : real-time user threads (rtprio)
> > > > 160 - 223 : time-sharing user threads
> > > > 224 - 255 : idle threads (idprio and kernel idle procs)
> > > >
> > > > The problem I am running into is that when a time-sharing thread goes to sleep
> > > > in the kernel (waiting on select, socket data, tty, etc.) it actually ends up
> > > > in the kernel priorities range (64 - 127). This means when it wakes up it
> > > > will trump (and preempt) a real-time user thread even though these processes
> > > > nominally have a priority down in the 160 - 223 range. We do drop the kernel
> > > > sleep priority during userret(), but we don't recheck the scheduler queues to
> > > > see if we should preempt the thread during userret(), so it effectively runs
> > > > with the kernel sleep priority for the rest of the quantum while it is in
> > > > userland.
> > > >
> > > > My first question is if this behavior is the desired behavior? Originally I
> > > > think I preferred the current layout because I thought a thread in the kernel
> > > > should always have priority so it can release locks, etc. However, priority
> > > > propagation should actually handle the case of some very important thread
> > > > needing a lock. In my use case today where I actually want to use rtprio I
> > > > think I want different behavior where the rtprio thread is more important than
> > > > the thread waking up with PSOCK, etc.
> > > >
> > > > If we decide to change the behavior I see two possible fixes:
> > > >
> > > > 1) (easy) just move the real-time priority range above the kernel sleep
> > > > priority range
> > > >
> > > > 2) (harder) make sched_userret() check the run queue to see if it should
> > > > preempt when dropping the kernel sleep priority. I think bde@ has suggested
> > > > that we should do this for correctness previously (and I've had some old,
> > > > unfinished patches to do this in a branch in p4 for several years).
> > >
> > > Would not doing #2 allow e.g. two threads that perform ping-pong with
> > > a single byte read/write into a socket to usurp the CPU ? The threads
> > > could try to also do some CPU-intensive calculations for some time
> > > during the quantum too.
> > >
> > > Such threads are arguably "interactive", but I think that the gain is
> > > priority is too unfair.
> >
> > Err, I think that what you describe is the current case and is what #2 would
> > seek to change.
>
> Sorry, might be my language was not clear, but I said "Would not doing
> #2 allow ...", i.e. I specifically mean that we shall do #2 to avoid the
> situation I described.
Ah, yes, it does allow that. As bde@ said though, the overhead of extra
context switches in the common case might not be worth it.
I have a possible patch for 1), but it involves fixing a few places and is
only compile tested yet (will run test it soon). I also think that in my
case I almost always want 1) anyway (my realtime processes are always more
important than sshd, even while sshd is in the kernel):
Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c (revision 215592)
+++ kern/kern_synch.c (working copy)
@@ -214,7 +214,8 @@
* Adjust this thread's priority, if necessary.
*/
pri = priority & PRIMASK;
- if (pri != 0 && pri != td->td_priority) {
+ if (pri != 0 && pri != td->td_priority &&
+ td->td_pri_class == PRI_TIMESHARE) {
thread_lock(td);
sched_prio(td, pri);
thread_unlock(td);
@@ -552,7 +553,8 @@
{
thread_lock(td);
- sched_prio(td, PRI_MAX_TIMESHARE);
+ if (td->td_pri_class == PRI_TIMESHARE)
+ sched_prio(td, PRI_MAX_TIMESHARE);
mi_switch(SW_VOL, NULL);
thread_unlock(td);
td->td_retval[0] = 0;
Index: kern/subr_sleepqueue.c
===================================================================
--- kern/subr_sleepqueue.c (revision 215592)
+++ kern/subr_sleepqueue.c (working copy)
@@ -693,7 +720,8 @@
/* Adjust priority if requested. */
MPASS(pri == -1 || (pri >= PRI_MIN && pri <= PRI_MAX));
- if (pri != -1 && td->td_priority > pri)
+ if (pri != -1 && td->td_priority > pri &&
+ td->td_pri_class == PRI_TIMESHARE)
sched_prio(td, pri);
return (setrunnable(td));
}
Index: sys/priority.h
===================================================================
--- sys/priority.h (revision 215592)
+++ sys/priority.h (working copy)
@@ -68,8 +68,8 @@
* are insignificant. Ranges are as follows:
*
* Interrupt threads: 0 - 63
- * Top half kernel threads: 64 - 127
- * Realtime user threads: 128 - 159
+ * Realtime user threads: 64 - 95
+ * Top half kernel threads: 96 - 159
* Time sharing user threads: 160 - 223
* Idle user threads: 224 - 255
*
@@ -81,7 +81,7 @@
#define PRI_MAX (255) /* Lowest priority. */
#define PRI_MIN_ITHD (PRI_MIN)
-#define PRI_MAX_ITHD (PRI_MIN_KERN - 1)
+#define PRI_MAX_ITHD (PRI_MIN_REALTIME - 1)
#define PI_REALTIME (PRI_MIN_ITHD + 0)
#define PI_AV (PRI_MIN_ITHD + 4)
@@ -94,9 +94,12 @@
#define PI_DULL (PRI_MIN_ITHD + 32)
#define PI_SOFT (PRI_MIN_ITHD + 36)
-#define PRI_MIN_KERN (64)
-#define PRI_MAX_KERN (PRI_MIN_REALTIME - 1)
+#define PRI_MIN_REALTIME (64)
+#define PRI_MAX_REALTIME (PRI_MIN_KERN - 1)
+#define PRI_MIN_KERN (96)
+#define PRI_MAX_KERN (PRI_MIN_TIMESHARE - 1)
+
#define PSWP (PRI_MIN_KERN + 0)
#define PVM (PRI_MIN_KERN + 4)
#define PINOD (PRI_MIN_KERN + 8)
@@ -109,9 +112,6 @@
#define PLOCK (PRI_MIN_KERN + 36)
#define PPAUSE (PRI_MIN_KERN + 40)
-#define PRI_MIN_REALTIME (128)
-#define PRI_MAX_REALTIME (PRI_MIN_TIMESHARE - 1)
-
#define PRI_MIN_TIMESHARE (160)
#define PRI_MAX_TIMESHARE (PRI_MIN_IDLE - 1)
--
John Baldwin
More information about the freebsd-arch
mailing list