[RFT][patch] Scheduling for HTT and not only
Jeff Roberson
jroberson at jroberson.net
Mon Feb 13 20:47:29 UTC 2012
On Mon, 13 Feb 2012, Alexander Motin wrote:
> On 02/11/12 16:21, Alexander Motin wrote:
>> I've heavily rewritten the patch already. So at least some of the ideas
>> are already addressed. :) At this moment I am mostly satisfied with
>> results and after final tests today I'll probably publish new version.
>
> It took more time, but finally I think I've put pieces together:
> http://people.freebsd.org/~mav/sched.htt23.patch
I need some time to read and digest this. However, at first glance, a
global pickcpu lock will not be acceptable. It is better to make an
occasionally imperfect decision than to cause contention too often.
>
> The patch is more complicated than the previous one, both logically and
> computationally, but with growing CPU power and complexity I think we can
> afford to spend some more time deciding how to spend time. :)
>
It is probably worth more cycles but we need to evaluate this much more
complex algorithm carefully to make sure that each of these new features
provides an advantage.
> The patch formalizes several ideas from the previous code about how to
> select a CPU for running a thread, and adds some new ones. Its main idea is
> that I've moved from comparing raw integer queue lengths to
> higher-resolution flexible values. The additional 8 bits of precision allow
> many factors affecting performance to be taken into account at the same
> time. Beyond just choosing the best of equally-loaded CPUs, with the new
> code it may even happen that, because of SMT, cache affinity, etc., a CPU
> with more threads in its queue is reported as less loaded, and vice versa.
>
> New code takes into account such factors:
> - SMT sharing penalty.
> - Cache sharing penalty.
> - Cache affinity (with separate coefficients for last-level and other-level
> caches) to the:
We already used separate affinity values for different cache levels. Keep
in mind that if something else has run on a core the cache affinity is
lost in very short order. Trying too hard to preserve it beyond a few ms
never seems to pan out.
> - other running threads of its process,
This is not really a great indicator of whether things should be scheduled
together or not. What workload are you targeting here?
> - previous CPU where it was running,
> - current CPU (usually where it was called from).
These two were also already used. Additionally:
+ * Hide part of the current thread's
+ * load, hoping that it or the newly
+ * scheduled one will complete soon.
+ * XXX: We need more stats for this.
I had something like this before. Unfortunately, interactive tasks are
allowed fairly aggressive bursts of CPU time to account for things like Xorg
and web browsers. I also tried this for ithreads, but they can be very
expensive in some workloads, so other CPUs will idle as you try to schedule
behind an ithread.
> All of these factors are configurable via sysctls, but I think reasonable
> defaults should fit most cases.
>
> Also, compared to the previous patch, I've resurrected the optimized
> shortcut in CPU selection for the SMT case. Unlike the original code, which
> had problems with this, I've added a check of the other logical cores' load
> that should make the shortcut safe while still very fast when there are
> fewer running threads than physical cores.
>
> I've tested it on Core i7 and Atom systems, but it would be more
> interesting to test it on a multi-socket system with properly detected
> topology to check the benefits from affinity.
>
> At this moment the main issue I see is that this patch affects only the
> moment when a thread starts. If a thread runs continuously, it will stay
> where it was, even if, as conditions change, that placement is no longer
> effective (it causes SMT sharing, etc.). I haven't looked much at the
> periodic load balancer yet, but it could probably also be improved somehow.
>
> What is your opinion: is it over-engineered, or is it the right way to go?
I think it's a little too much change all at once. I also believe that
the changes that try very hard to preserve affinity likely help a much
smaller number of cases than they hurt. I would prefer you do one piece
at a time and validate each step. There are a lot of good ideas in here
but good ideas don't always turn into results.
Thanks,
Jeff
>
> --
> Alexander Motin
>