[RFT][patch] Scheduling for HTT and not only
Jeff Roberson
jroberson at jroberson.net
Mon Feb 13 20:47:29 UTC 2012
On Mon, 13 Feb 2012, Alexander Motin wrote:
> On 02/11/12 16:21, Alexander Motin wrote:
>> I've heavily rewritten the patch already. So at least some of the ideas
>> are already addressed. :) At this moment I am mostly satisfied with
>> results and after final tests today I'll probably publish new version.
>
> It took more time, but finally I think I've put pieces together:
> http://people.freebsd.org/~mav/sched.htt23.patch
I need some time to read and digest this. However, at first glance, a
global pickcpu lock will not be acceptable. It is better to make an
occasionally imperfect decision than to cause contention too often.
>
> The patch is more complicated than the previous one, both logically and
> computationally, but with growing CPU power and complexity I think we can
> afford to spend some more time deciding how to spend time. :)
>
It is probably worth more cycles but we need to evaluate this much more
complex algorithm carefully to make sure that each of these new features
provides an advantage.
> The patch formalizes several ideas from the previous code about how to
> select a CPU for running a thread, and adds some new ones. Its main idea is
> that I've moved from comparing raw integer queue lengths to
> higher-resolution flexible values. The additional 8 bits of precision allow
> many factors affecting performance to be taken into account at the same
> time. Beyond just choosing the best of equally-loaded CPUs, with the new
> code it may even happen that, because of SMT, cache affinity, etc., a CPU
> with more threads in its queue is reported as less loaded, and vice versa.
>
> New code takes into account such factors:
> - SMT sharing penalty.
> - Cache sharing penalty.
> - Cache affinity (with separate coefficients for last-level and other-level
> caches) to the:
We already used separate affinity values for different cache levels. Keep
in mind that if something else has run on a core the cache affinity is
lost in very short order. Trying too hard to preserve it beyond a few ms
never seems to pan out.
> - other running threads of its process,
This is not really a great indicator of whether things should be scheduled
together or not. What workload are you targeting here?
> - previous CPU where it was running,
> - current CPU (usually where it was called from).
These two were also already used. Additionally:
+ * Hide part of the current thread's
+ * load, hoping that it or the newly
+ * scheduled one will complete soon.
+ * XXX: We need more stats for this.
I had something like this before. Unfortunately, interactive tasks are
allowed fairly aggressive bursts of CPU time to account for things like Xorg
and web browsers. I also tried this for ithreads, but they can be very
expensive in some workloads, so other CPUs will idle as you try to schedule
behind an ithread.
> All of these factors are configurable via sysctls, but I think reasonable
> defaults should fit most cases.
>
> Also, compared to the previous patch, I've resurrected the optimized
> shortcut in CPU selection for the SMT case. Unlike the original code, which
> had problems with this, I've added a check of the other logical cores' load
> that should make the shortcut safe while still very fast when there are
> fewer running threads than physical cores.
>
> I've tested it on Core i7 and Atom systems, but it would be more
> interesting to test it on a multi-socket system with properly detected
> topology to check the benefits from affinity.
>
> At this moment the main issue I see is that this patch affects only the
> moment when a thread starts. If a thread runs continuously, it will stay
> where it was, even if, as conditions change, that placement is no longer
> effective (it causes SMT sharing, etc.). I haven't looked much at the
> periodic load balancer yet, but it could probably also be improved somehow.
>
> What is your opinion: is it over-engineered, or is it the right way to go?
I think it's a little too much change all at once. I also believe that
the changes that try very hard to preserve affinity likely help a much
smaller number of cases than they hurt. I would prefer you do one piece
at a time and validate each step. There are a lot of good ideas in here
but good ideas don't always turn into results.
Thanks,
Jeff
>
> --
> Alexander Motin
>