(patch) Re: Periodic rant about SCHED_ULE

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Sat, 31 Jul 2021 23:48:40 UTC
Hi all,

  let's hope I can post now. Somehow my subscriptions have been
deleted and I cannot post to the lists anymore, and *some* of
them (e.g. "stable") have also stopped delivering, since 3rd July.
Nor can I find a notification about such a change. Now I am trying
to subscribe anew... hth (Such unannounced changes are something that
makes me angry)

I just saw the discussion about SCHED_ULE.
I was once hit myself by a serious flaw in the behaviour of
SCHED_ULE, so I went into the source and fixed that specific matter.
It solves the issue that, of multiple compute jobs, the one with the
least I/O gets starved; I don't know if there are side effects or
downsides. I don't notice a performance impact, but I have no idea
how this might behave on e.g. network routers or the like.

While fixing this I found one or two other obvious bugs in the code. I
tried to talk to the developer, but that communication somehow ended
in mid-air. I don't know what further to do with this - a patch of
mine has been lingering in the sendbug facility for some 12 years
already without any reaction, so sending patches seems pointless. And
in this case I don't even know whether it suits all conditions beyond
solving my problem. (It's not about absolute performance, but about
making things run evenly.)

So here it is; take it and see what you make of it. The thing is
switchable at runtime via a new sysctl, kern.sched.resume_preempted:
"1" gives the default behaviour as distributed,
"0" activates the patch. Maybe it helps one person or another. Cheerio.

diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
index 50f037076f4..f0bb6b38db4 100644
--- a/sys/kern/sched_ule.c
+++ b/sys/kern/sched_ule.c
@@ -38,7 +38,7 @@
 #include <sys/cdefs.h>
+__FBSDID("$FreeBSD: releng/12.2/sys/kern/sched_ule.c 355610 2019-12-11 15:15:21Z mav $");
 #include "opt_hwpmc_hooks.h"
 #include "opt_sched.h"
@@ -223,6 +223,7 @@ static int __read_mostly preempt_thresh = 0;
 static int __read_mostly static_boost = PRI_MIN_BATCH;
 static int __read_mostly sched_idlespins = 10000;
 static int __read_mostly sched_idlespinthresh = -1;
+static int __read_mostly resume_preempted = 1;
  * tdq - per processor runqs and statistics.  All fields are protected by the
@@ -483,7 +484,10 @@ tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
                 * This queue contains only priorities between MIN and MAX
                 * realtime.  Use the whole queue to represent these values.
-               if ((flags & (SRQ_BORROWING|SRQ_PREEMPTED)) == 0) {
+               if (((flags & SRQ_PREEMPTED) && resume_preempted) ||
+                               (flags & SRQ_BORROWING))
+                       pri = tdq->tdq_ridx;
+               else {
                        pri = RQ_NQS * (pri - PRI_MIN_BATCH) / PRI_BATCH_RANGE;
                        pri = (pri + tdq->tdq_idx) % RQ_NQS;
@@ -494,8 +498,7 @@ tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
                        if (tdq->tdq_ridx != tdq->tdq_idx &&
                            pri == tdq->tdq_ridx)
                                pri = (unsigned char)(pri - 1) % RQ_NQS;
-               } else
-                       pri = tdq->tdq_ridx;
+               }
                runq_add_pri(ts->ts_runq, td, pri, flags);
        } else
@@ -3073,6 +3076,9 @@ SYSCTL_INT(_kern_sched, OID_AUTO, interact, CTLFLAG_RW, &sched_interact, 0,
 SYSCTL_INT(_kern_sched, OID_AUTO, preempt_thresh, CTLFLAG_RW,
     &preempt_thresh, 0,
     "Maximal (lowest) priority for preemption");
+SYSCTL_INT(_kern_sched, OID_AUTO, resume_preempted, CTLFLAG_RW,
+    &resume_preempted, 0,
+    "Reinsert preempted threads at queue head");
 SYSCTL_INT(_kern_sched, OID_AUTO, static_boost, CTLFLAG_RW, &static_boost, 0,
     "Assign static kernel priorities to sleeping threads");
 SYSCTL_INT(_kern_sched, OID_AUTO, idlespins, CTLFLAG_RW, &sched_idlespins, 0,