Re: Periodic rant about SCHED_ULE

From: RW via freebsd-hackers <>
Date: Mon, 19 Jul 2021 00:37:43 UTC

On Thu, 15 Jul 2021 11:03:04 +1000
Dewayne Geraghty wrote:

> On 15/07/2021 1:47 am, RW via freebsd-hackers wrote:

> > kern.sched.preempt_thresh=224
> >
> > I think the default only allows preemption by real-time and kernel
> > threads. 
> >  
> Hi RW,  Note the PRI(ority) column when you perform /usr/bin/top. 
> Processes with a PRI below the default kern.sched.preempt_thresh=80
> (ie nice -n 8) may pre-empt other processes or send interprocessor
> interrupts to others (CPUs).

I haven't got time to look into this in detail but from a cursory
examination it looks like there must be some kind of translation between
the PRI values seen in top and the priorities used in the scheduler.

A threshold of 80 would be sensible in the context of top. I just ran a
test and a cpu-bound process got a PRI of 85. But this seems to be a
pure coincidence.
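
A minimal sketch of what such a translation could look like, assuming top
simply subtracts a fixed offset (PZERO) from the scheduler's internal
priority, and assuming PZERO is PRI_MIN_KERN + 20 = 100. Neither assumption
comes from this thread, and both helper names are made up for illustration:

```c
/* Assumption: PZERO is PRI_MIN_KERN + 20, i.e. 100 with PRI_MIN_KERN at 80. */
#define PZERO_ASSUMED 100

/* Hypothetical helper: PRI as shown by top -> internal scheduler priority. */
int top_to_internal(int top_pri)
{
	return top_pri + PZERO_ASSUMED;
}

/* Hypothetical helper: internal scheduler priority -> PRI as shown by top. */
int internal_to_top(int pri)
{
	return pri - PZERO_ASSUMED;
}
```

If that offset is right, the PRI of 85 from my test corresponds to an
internal priority of 185, which is in the timeshare range and nowhere near
the threshold of 80.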

kern.sched.preempt_thresh is compared with the scheduler's internal
priorities and defaults to PRI_MIN_KERN, the highest priority in
the kernel range, one level below realtime.

From sys/sys/priority.h

#define PRI_MIN_REALTIME        (48)
#define PRI_MAX_REALTIME        (PRI_MIN_KERN - 1)

#define PRI_MIN_KERN            (80)
#define PRI_MAX_KERN            (PRI_MIN_TIMESHARE - 1)

#define PRI_MIN_TIMESHARE       (120)
#define PRI_MAX_TIMESHARE       (PRI_MIN_IDLE - 1)

#define PRI_MIN_IDLE            (224)
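
To make the comparison concrete, here is a simplified sketch of how a
threshold could be applied to the ranges above. It is loosely modelled on
the kind of check done in sched_ule.c but is not a copy of it;
range_name() and should_preempt() are hypothetical names, and any special
handling of remote CPUs is left out:

```c
#include <stdbool.h>

#define PRI_MIN_REALTIME   48
#define PRI_MIN_KERN       80
#define PRI_MIN_TIMESHARE  120
#define PRI_MIN_IDLE       224

/* Name the range an internal priority falls in (lower number = higher priority). */
const char *range_name(int pri)
{
	if (pri >= PRI_MIN_IDLE)
		return "idle";
	if (pri >= PRI_MIN_TIMESHARE)
		return "timeshare";
	if (pri >= PRI_MIN_KERN)
		return "kernel";
	if (pri >= PRI_MIN_REALTIME)
		return "realtime";
	return "interrupt";	/* assumption: everything below 48 is interrupt threads */
}

/*
 * Would a newly runnable thread at priority pri preempt the running
 * thread at priority cpri, given the threshold thresh?
 */
bool should_preempt(int pri, int cpri, int thresh)
{
	if (pri >= cpri)
		return false;	/* not a higher priority, never preempt */
	if (cpri >= PRI_MIN_IDLE)
		return true;	/* an idle-class thread always gives way */
	if (thresh == 0)
		return false;	/* preemption disabled */
	return pri <= thresh;	/* only priorities at or above the threshold preempt */
}
```

With the default of 80, should_preempt(150, 185, 80) is false, so one
timeshare thread doesn't preempt another; with the threshold at 224,
should_preempt(150, 185, 224) is true.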

> idprio 0 top
> is assigned a starting PRI of 124; so on SCHED_ULE, these processes
> will receive cpu time (even at idprio 31) but won't pre-empt others.
> If you really want all processes to pre-empt others, enabling
> FULL_PREEMPTION achieves the same goal as 224.  I don't have a use
> case for no pre-emption. Anyone?
>
> As for why kern.sched.preempt_thresh=224 helps desktop users, I can only
> speculate that with a high threshold, more IPIs are sent to other CPU
> cores so they can be busy (?).  Refer to
> /usr/src/sys/kern/sched_ule.c --
>
> Returning to the topic.  It's a very hard choice between schedulers.  I
> did a lot of testing between them and tuning to see if one excelled on
> my humble Xeon-E3.  I couldn't see a significant difference between
> workloads - though next time (and a hint for others) I'll disable SMT
> and set dev.cpu.0.freq to disable turbo behaviour.  For now,
> sched_4bsd appears to be more efficient in terms of code complexity
> and people with high CPU workloads have preferred sched_4bsd in the
> past, while sched_ule has a lot of things to tweak and is recommended
> by the FreeBSD project; otherwise it wouldn't be the default.
> Looking at
> their histories are tweaked a couple of times a year, so I wouldn't
> rule sched_4bsd out of contention just yet. 
>
> FWIW, my servers modify only:
> kern.sched.affinity=7
> kern.sched.interact=0
> kern.sched.slice=128
> while firewalls:
> kern.sched.balance=0
> kern.sched.interact=0
>
> A loadable scheduler has been discussed here a few times - I vaguely
> recall it being inefficient (complexity) and unnecessary (you'll
> settle on one scheduler and, unless testing, are unlikely to change).
> Further in the past, sched_4bsd was to be removed, but some
> demonstrated it had better performance for their workload.
>
> Cheerio.