cvs commit: src/sys/kern sched_ule.c

Jeff Roberson jroberson at
Sun Mar 2 08:27:44 UTC 2008

On Sun, 2 Mar 2008, Jeff Roberson wrote:

> jeff        2008-03-02 08:20:59 UTC
>  FreeBSD src repository
>  Modified files:
>    sys/kern             sched_ule.c
>  Log:
>  Add support for the new cpu topology api:
>   - When searching for affinity search backwards in the tree from the last
>     cpu we ran on while the thread still has affinity for the group.   This
>     can take advantage of knowledge of shared L2 or L3 caches among a
>     group of cores.
>   - When searching for the least loaded cpu find the least loaded cpu via
>     the least loaded path through the tree.  This load balances system bus
>     links, individual cache levels, and hyper-threaded/SMT cores.
>   - Make the periodic balancer recursively balance the highest and lowest
>     loaded cpu across each link.
>  Add support for cpusets:
>   - Convert the cpuset to a simple native cpumask_t while the kernel still
>     only supports cpumask.
>   - Pass the derived cpumask down through the cpu_search functions to
>     restrict the result cpus.
>   - Make the various steal functions resilient to failure since all threads
>     can not run on all cpus any longer.
>  General improvements:
>   - Precisely track the lowest priority thread on every runq with
>     tdq_setlowpri().  Before it was more advisory but this ended up having
>     pathological behaviors.
>   - Remove many #ifdef SMP conditions to simplify the code.
>   - Get rid of the old cumbersome tdq_group.  This is more naturally
>     expressed via the cpu_group tree.
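
To make the "least loaded path through the tree" idea and the cpumask restriction concrete, here is a minimal userland sketch. It is not the code in sched_ule.c: struct topo_node, mask_t, group_load() and find_least_loaded() are hypothetical names made up for illustration, the mask is a plain 32-bit word, and comparing groups by summed load is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel's struct cpu_group or cpumask_t. */
typedef uint32_t mask_t;                /* one bit per cpu, up to 32 cpus */

struct topo_node {
    mask_t                  cpus;       /* cpus covered by this subtree */
    int                     nchildren;  /* 0 for a leaf group */
    const struct topo_node *children;   /* array of nchildren child groups */
};

/* Sum the load of the cpus in 'cpus' that 'allowed' also permits. */
static int
group_load(mask_t cpus, mask_t allowed, const int *load)
{
    int cpu, total = 0;

    for (cpu = 0; cpu < 32; cpu++)
        if (((cpus & allowed) & ((mask_t)1 << cpu)) != 0)
            total += load[cpu];
    return (total);
}

/*
 * Follow the least loaded permitted child at each level, then pick the
 * least loaded permitted cpu in the leaf.  Ties go to the lowest index.
 * Returns -1 when the mask permits no cpu in this subtree.
 */
static int
find_least_loaded(const struct topo_node *node, mask_t allowed,
    const int *load)
{
    const struct topo_node *best = NULL;
    int i, l, cpu, bestload = 0, bestcpu = -1;

    assert(node != NULL && load != NULL);
    if ((node->cpus & allowed) == 0)
        return (-1);

    if (node->nchildren > 0) {
        for (i = 0; i < node->nchildren; i++) {
            const struct topo_node *c = &node->children[i];

            if ((c->cpus & allowed) == 0)
                continue;               /* the mask excludes this group */
            l = group_load(c->cpus, allowed, load);
            if (best == NULL || l < bestload) {
                best = c;
                bestload = l;
            }
        }
        if (best == NULL)
            return (-1);                /* children do not cover the mask */
        return (find_least_loaded(best, allowed, load));
    }

    for (cpu = 0; cpu < 32; cpu++) {
        if (((node->cpus & allowed) & ((mask_t)1 << cpu)) == 0)
            continue;
        if (bestcpu == -1 || load[cpu] < bestload) {
            bestcpu = cpu;
            bestload = load[cpu];
        }
    }
    return (bestcpu);
}
```

With two leaf groups {0,1} and {2,3} and per-cpu loads 3,1,1,1, an unrestricted search descends into the second group because its summed load is lower, even though cpu 1 is equally idle. Restricting the mask to cpus 0 and 1 forces the search into the first group. An empty mask yields -1, which is the kind of failure callers have to tolerate once not every thread can run on every cpu.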

With these changes ULE is the only scheduler that supports the new cpuset 
api.  The syscalls succeed on 4BSD but the scheduler doesn't obey the masks. 
I don't presently have a plan to implement it on 4BSD as it would 
potentially be very inefficient to search the runq for a compatible thread 
on every context switch.  I won't object if someone else wants to implement 
this; otherwise I'll make the syscalls return ENOSYS if 4BSD is compiled in.
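
To illustrate the cost concern, here is a rough sketch of picking the next thread for a cpu from one shared queue when each thread carries a mask. It is not 4BSD's actual runq code: struct fake_thread and runq_choose_for_cpu() are hypothetical, and the queue is reduced to a singly linked list. The point is only that the choice becomes a linear scan rather than "take the head".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified thread record; not the kernel's struct thread. */
struct fake_thread {
    int                  id;
    uint32_t             allowed;   /* bit n set: may run on cpu n */
    struct fake_thread  *next;      /* next thread on the shared queue */
};

/*
 * Remove and return the first queued thread that may run on 'cpu', or
 * NULL if there is none.  '*scanned' is incremented for every entry
 * examined so the caller can see how much work one pick took.
 */
static struct fake_thread *
runq_choose_for_cpu(struct fake_thread **head, int cpu, int *scanned)
{
    struct fake_thread **pp, *td;

    assert(cpu >= 0 && cpu < 32);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        (*scanned)++;
        if (((*pp)->allowed & ((uint32_t)1 << cpu)) != 0) {
            td = *pp;
            *pp = td->next;         /* unlink from the queue */
            td->next = NULL;
            return (td);
        }
    }
    return (NULL);                  /* nothing here may run on this cpu */
}
```

If the threads at the front of the queue are all pinned elsewhere, every context switch on this cpu walks past them again before finding a runnable thread or giving up.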

The improved cpu topology load balancing is a mixed bag.  On some 
workloads we see considerable improvements.  Right now mysql suffers when 
it has large numbers of threads, but other things seem much improved.  I 
will continue to tune this, however, and in most cases it's a win.

Kris has done some excellent benchmarking as usual.  Here you can see the 
improvement in postgres depending on various scheduler debug settings:

The horrible green line is 7.0 for reference.  The blue line is the same 
16-core machine with half of the cores disabled.


>  Sponsored by:   Nokia
>  Testing by:     kris
>  Revision  Changes    Path
>  1.226     +443 -501  src/sys/kern/sched_ule.c
