Improved ULE load balancing.
Jeff Roberson
jroberson at chesapeake.net
Fri Jan 19 22:07:08 UTC 2007
I'd like those of you who reported relatively poor SMP performance on ULE
to update to revision 1.179 of sched_ule.c. This revision improved
performance on my dual Xeon to about 10% better than 4BSD when running
super-smack. It is also highly tunable.
Some options of interest:
kern.sched.*:
pick_pri - The default is on. Turning this off will revert to the older
algorithm which is now called pickidle. pick_pri tries to always run the
highest priority threads. pickidle really just tries to balance cpu load
and doesn't take priority into consideration.
pick_pri_affinity - Number of ticks a thread has slept for before we stop
considering it as having affinity for a given cpu.
busy_thresh - Length of run queue allowed before idle cpus will try to
steal some of our work. This defaults to 4 but on some workloads I see
improvement with values as low as 2.
ipi_thresh - Priorities below this generate IPIs to preempt the target
cpu. Can decrease latency for some workloads but at the expense of extra
context switches and interrupt overhead.
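To make the thresholds above concrete, here is a rough sketch in C that
reads three of them as simple predicates. It is not code from sched_ule.c:
the struct, the function names, and the exact boundary comparisons (< versus
<=) are assumptions made for illustration. The only facts relied on are the
descriptions above and the FreeBSD convention that a numerically lower
priority value means a more important thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the kern.sched.* knobs; not a kernel structure. */
struct sched_tunables {
    bool pick_pri;          /* true: priority-based chooser, false: pickidle */
    int  pick_pri_affinity; /* ticks slept before cpu affinity is dropped */
    int  busy_thresh;       /* run queue length at which stealing is allowed */
    int  ipi_thresh;        /* priorities numerically below this send an IPI */
};

/* A thread keeps its affinity while it has slept fewer ticks than the limit.
 * The strict comparison is an assumption. */
static bool
thread_has_affinity(const struct sched_tunables *t, int slept_ticks)
{
    assert(slept_ticks >= 0);
    return (slept_ticks < t->pick_pri_affinity);
}

/* An idle cpu may steal from a queue once it reaches busy_thresh entries.
 * Treating the threshold as inclusive is an assumption. */
static bool
queue_is_stealable(const struct sched_tunables *t, int runq_len)
{
    assert(runq_len >= 0);
    return (runq_len >= t->busy_thresh);
}

/* Lower numeric value means more important, so "below" is a plain <. */
static bool
should_send_ipi(const struct sched_tunables *t, int pri)
{
    return (pri < t->ipi_thresh);
}
```

Read this way, lowering busy_thresh from 4 to 2 makes shorter queues
eligible for stealing, and raising ipi_thresh makes more wakeups pay for an
IPI in exchange for lower latency.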
The default configuration was fastest on most workloads on my 8-way
Opteron and 2x Xeon (+2x HTT). I tested parallel compiles and super-smack
with select-key.smack doing different workloads on both machines and with
different numbers of processors enabled on the 8-way Opteron. The Opteron
in 8-way mode shows about a 300% speedup compared to 4BSD on super-smack.
Compile times are nearly identical across all schedulers and platforms. On
my Xeon, super-smack is a more modest 5-10% faster, depending on the
configuration.
Please report back your findings. Hopefully with the tunables present I
can experiment and get the settings right for a wide array of machines.
Thanks,
Jeff
---------- Forwarded message ----------
Date: Fri, 19 Jan 2007 21:56:08 +0000 (UTC)
From: Jeff Roberson <jeff at FreeBSD.org>
To: src-committers at FreeBSD.org, cvs-src at FreeBSD.org, cvs-all at FreeBSD.org
Subject: cvs commit: src/sys/kern sched_ule.c
jeff 2007-01-19 21:56:08 UTC
FreeBSD src repository
Modified files:
sys/kern sched_ule.c
Log:
Major revamp of ULE's cpu load balancing:
- Switch back to direct modification of remote CPU run queues. The
indirect approach added a lot of complexity with questionable gain. It's
easy enough to reimplement if it's shown to help on huge machines.
- Re-implement the old tdq_transfer() call as tdq_pickidle(). Change
sched_add() so we have selectable cpu choosers and simplify the logic
a bit here.
- Implement tdq_pickpri() as the new default cpu chooser. This algorithm
is similar to Solaris's in that it tries to always run the threads with
the best priorities. It is actually slightly more complex than
Solaris's algorithm because we also tend to favor the local cpu over
other cpus, which improves latency and also potentially enables
cache sharing between the waking thread and the woken thread.
- Add a bunch of tunables that can be used to measure effects of different
load balancing strategies. Most of these will go away once the
algorithm is more definite.
- Add a new mechanism to steal threads from busy cpus when we idle. This
is enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The
threshold is the required length of a tdq's run queue before another
cpu will be able to steal runnable threads. This prevents most queue
imbalances that contribute to long latencies.
Revision Changes Path
1.179 +293 -240 src/sys/kern/sched_ule.c
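
The difference between the two cpu choosers described in the log can be
sketched in a few lines of C. This is a deliberately simplified model, not
the code from revision 1.179: struct cpu_state, pick_idle() and pick_pri()
are hypothetical names, and the real tdq_pickpri() also weighs affinity and
other state that is ignored here. Lower priority values are treated as more
important, as is conventional in FreeBSD.

```c
#include <assert.h>

/* Hypothetical per-cpu snapshot; not the real struct tdq. */
struct cpu_state {
    int load;    /* runnable threads queued on this cpu */
    int cur_pri; /* priority of the running thread, lower = more important */
};

/* pickidle-style: balance load only, ignoring priority entirely. */
static int
pick_idle(const struct cpu_state *cpus, int ncpus)
{
    int best = 0;

    assert(ncpus > 0);
    for (int i = 1; i < ncpus; i++)
        if (cpus[i].load < cpus[best].load)
            best = i;
    return (best);
}

/*
 * pickpri-style: favor the local cpu when the new thread outranks what is
 * running there; otherwise target the cpu running the least important
 * thread, so the best priorities are the ones that get to run.
 */
static int
pick_pri(const struct cpu_state *cpus, int ncpus, int self, int new_pri)
{
    int worst = 0;

    assert(ncpus > 0 && self >= 0 && self < ncpus);
    if (new_pri < cpus[self].cur_pri)
        return (self);
    for (int i = 1; i < ncpus; i++)
        if (cpus[i].cur_pri > cpus[worst].cur_pri)
            worst = i;
    return (worst);
}
```

With three cpus whose (load, cur_pri) pairs are (1, 100), (0, 50) and
(2, 200), pick_idle() selects cpu 1 because it has the shortest queue, while
pick_pri() for a new thread of priority 80 woken on cpu 1 selects cpu 2,
since that cpu is running the least important thread.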