SCHED_ULE should not be the default
sgk at troutmask.apl.washington.edu
Thu Dec 22 16:31:09 UTC 2011
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
>> I have placed several files at
>> dmesg.txt --> dmesg for ULE kernel
>> summary --> A summary that includes top(1) output of all runs.
>> sysctl.ule.txt --> sysctl -a for the ULE kernel
>> Since time is executed on the master, only the 'real' time is of
>> interest (the summary file includes user and sys times). This
>> command is run 5 times for each N value and up to 10 times for
>> some N values with the ULE kernel. The following table records
>> the average 'real' time; the number in (...) is the mean
>> absolute deviation.
>> # N ULE 4BSD
>> # -------------------------------------
>> # 4 223.27 (0.502) 221.76 (0.551)
>> # 5 404.35 (73.82) 270.68 (0.866)
>> # 6 627.56 (173.0) 247.23 (1.442)
>> # 7 475.53 (84.07) 285.78 (1.421)
>> # 8 429.45 (134.9) 223.64 (1.316)
> One explanation for ULE taking 1.5-2x as long is that the
> threads are not migrated properly, so you end up with idle cores
> while runnable threads wait.
That's what I guessed back in 2008 when I first reported the problem.
The top(1) output at the above URL shows 10 completely independent
instances of the same numerically intensive application running
on a circa 2008 ULE kernel. Look at the PRI column. The high
PRI jobs are not only pinned to a cpu but are also running at
100% WCPU. The low PRI jobs seem to be pinned to a subset of the
available cpus and simply ping-pong in and out of the same cpus.
In this instance, there are 5 jobs competing for time on 3 cpus.
> Also, perhaps one could build a simple test process that replicates
> this workload (so one can run it as part of regression tests):
> 1. define a CPU-intensive function f(n) which issues no
> system calls, optionally touching
> a lot of memory, where n determines the number of iterations.
> 2. by trial and error (or let the program find it),
> pick a value N1 so that the minimum execution time
> of f(N1) is in the 10..100ms range
> 3. now run the function f() again from an outer loop so
> that the total execution time is large (10..100s)
> again with no intervening system calls.
> 4. use an external shell script to rerun a process
> when it terminates, and then run multiple instances
> in parallel. Instead of the external script one could
> fork new instances before terminating, but I am a bit
> unclear how CPU inheritance works when a process forks.
> Going through the shell possibly breaks the chain.
The tests at the above URL do essentially what you
propose, except in 2008 the kzk90 programs were doing
More information about the freebsd-stable mailing list