RFC: powerd algorithms enhancements

Ian Smith smithi at nimnet.asn.au
Fri Nov 7 20:12:01 PST 2008


On Fri, 7 Nov 2008, Alexander Motin wrote:
 > Ian Smith wrote:
 > > Hi, sounds like sound's more or less under control, time on your hands? :)
 > 
 > There are many subsystems used in my laptop. :)

Just as well ..

 > >  > I would like to propose a patch for powerd that fixes some issues, makes it
 > >  > more universal and, in my opinion, more usable. My main ideas were:
 > >  > 
 > >  > 1. To make it more SMP polite. The previous version uses average CPU load, which
 > >  > often leads to load underestimation. It makes powerd with the default
 > >  > configuration unusable on systems with more than 2 CPUs. I propose to use
 > >  > summed load instead of the average one. IMO this is the best we can do without
 > >  > a specially tuned scheduler. Also, as measuring total load on SMP
 > >  > systems is more useful than total idle, I have switched to it.
 > > 
 > > Ok, very interesting.  First, is this against CURRENT, or to what CVS 
 > > version, so I can read patched version in full?  Any change to .h files?
 > 
 > It's against the HEAD, but applies to 7-STABLE as well.

Thanks; I've had a brief browse only so far, but like the approach.  
Not sure where you get these 7/8 and 31/32 factors from, and I've no 
hardware to test with lots of freqs, but I'm interested to see how 
things like hysteresis under constant but partial load operate (ie 
avoiding too much 'hunting' as freq change varies measured load ..)
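
If I had to guess, they are simply 1 - 1/8 and 1 - 1/32, i.e. the "fall 
by 1/8" step of item 3 and the "4 times slower" drop of item 4 below, 
but that is my reading and not something I've confirmed in the patch.  
Under that assumption, a quick sketch of how many polling intervals it 
takes to decay from one frequency to another (steps_to_reach and its 
arguments are made-up names, not powerd code):

```c
/*
 * Hypothetical helper, not taken from powerd: count how many polling
 * intervals it takes for a frequency to decay from 'freq' down to
 * 'target' or below when it is multiplied by num/den each interval
 * (integer arithmetic, so each step rounds down).
 * Assumes 0 < num < den and target >= 0; returns -1 otherwise.
 */
int
steps_to_reach(int freq, int target, int num, int den)
{
	int steps = 0;

	if (num <= 0 || den <= num || target < 0)
		return (-1);
	while (freq > target) {
		freq = freq * num / den;
		steps++;
	}
	return (steps);
}
```

With the 1/4 second polling from item 5, steps_to_reach(1000, 500, 7, 8) 
comes out at 6 intervals, so about 1.5 seconds to halve the frequency; 
the 31/32 factor takes several times longer.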

 > >  > 2. To make powerd's operation independent of the number and size of frequency
 > >  > levels, I have added an internal frequency counter which is translated into real
 > >  > frequencies only at the last stage, and only as well as it can be. Some systems may
 > >  > have only several power levels, while mine has 17 of them, so adaptation time
 > >  > is completely different. It would be good if the algorithm did not depend on
 > >  > it.

Just for reference, could you show the levels you're working with?

 > > There were some XXX comments re longterm allowance for running different 
 > > cpus at different freqs .. I don't know if that's anything to consider?
 > 
 > I don't understand which comments you mean. But I think that it is
 > now ineffective to run different CPUs at different frequencies. To do it
 > we should have a scheduler aware of CPU speeds, to avoid using powered-down
 > ones where possible. Now it will just lead to significant
 > performance degradation because of CPU load underestimation.

I'll probably get in trouble because I'm referring to older sources and
only have 5.5-STABLE to hand at the moment, but eg /sys/kern/kern_cpu.c:
        /*
         * Only initialize one set of sysctls for all CPUs.  In the future,
         * if multiple CPUs can have different settings, we can move these
         * sysctls to be under every CPU instead of just the first one.
         */
and
        /*
         * While we only call cpufreq_get() on one device (assuming all
         * CPUs have equal levels), we call cpufreq_set() on all CPUs.
         * This is needed for some MP systems.
         */
and
        /*
         * Add only one cpufreq device to each CPU.  Currently, all CPUs
         * must offer the same levels and be switched at the same time.
         */

I'm pretty sure I recall John Baldwin talking about this at some stage 
too, but it's been a year since I last tried figuring all this out.

 > >  > 3. As part of the previous item I have changed adaptive mode to raise frequency on
 > >  > demand by up to 2 times and to fall by 1/8 per time interval.
 > > 
 > > I'm wondering how the edge case with only 2 freqs would go?  Eg on my 
 > > T23, single cpu P3 Mobile at 1133 and 733MHz.  That is, I'm wondering if 
 > > your 1/8 factor might better be scaled to no. of cpus and/or no. of 
 > > freqs available?  I'd best say no more until studying your algorithm ..
 > 
 > I have no such case, but I think there should be no problem.

Ok .. just to confirm I'm reading it right: you wind up using the first 
table freq BELOW the calculated desired freq, right?  So with my case of 
1133 & 733, if at 1133 and coming down, 7/8 * 1133 = 991, so select 733?
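
To make the question concrete, here is the selection rule as I'm reading 
it, as a sketch only.  pick_freq and the table layout are my own 
invention, not from the patch, and I'm assuming the table is sorted 
highest first and that a level exactly equal to the desired value is 
acceptable:

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not from the patch: return the highest table
 * frequency that does not exceed 'desired'.  'table' must be sorted
 * from highest to lowest and have n >= 1 entries.  If every level is
 * above 'desired', fall back to the lowest one.
 */
int
pick_freq(const int *table, size_t n, int desired)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i] <= desired)
			return (table[i]);
	}
	return (table[n - 1]);
}
```

With the T23 table { 1133, 733 } and desired = 1133 * 7 / 8 = 991 this 
returns 733, which is the behaviour I'm asking about: a single 1/8 step 
down drops straight to the lowest level.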

 > >  > 4. For desktop (AC-powered) systems I have added one more mode -
 > >  > "hiadaptive". It raises frequency twice as fast, drops it 4 times slower,
 > >  > prefers half the CPU load and has an additional delay before leaving the
 > >  > highest frequency after a period of maximum load. This mode was specially
 > >  > made to improve interactivity of systems where operation capabilities are
 > >  > more significant than power consumption, but keeping maximum frequency all
 > >  > the time is not needed.
 > > 
 > > Great idea.  And one (not so) small step towards some proper profiles, 
 > > where various degrees of performance vs responsiveness vs power use can 
 > > be setup by the user .. extending now binary AC/battery power_profile 
 > > choices (starting freq, lowest Cx state), later perhaps tying in with 
 > > the shutdown/wakeup stuff for both system and individual devices (eg D 
 > > states).  Sorry, just musing aloud .. this has needed a kick for ages :)
 > 
 > Moving configuration from the command line into a configuration file will allow
 > more customized profiles to be written, so if somebody wants to, he may
 > do it. For trivial command line configuration this solution looks
 > appropriate.

Sure.  I do want to get to that, though C is mostly read-only for me, 
and there's still way too much I know way too little about :) but with 
modern laptops it's clear that just telling people to enable powerd 
isn't cutting it anymore, especially with the sort of issues people are 
seeing with systems dropping back to ridiculously low cpu freqs - like 
perhaps the default cpufreq.lowest ought to be initially set no lower 
than perhaps 1/8 of full speed, out of the box, to mitigate such issues?
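
Something along these lines is what I have in mind, purely as an 
illustration.  The function name and the idea of clamping a requested 
frequency this way are mine, not taken from powerd or cpufreq:

```c
/*
 * Hypothetical illustration, not existing code: never let the
 * requested frequency drop below 1/8 of the maximum frequency.
 */
int
clamp_to_floor(int requested, int max_freq)
{
	int floor_freq = max_freq / 8;

	return (requested < floor_freq ? floor_freq : requested);
}
```

So with a (made-up) 2400 MHz maximum, a request for 100 MHz would be 
held at 300 MHz, while anything at or above 300 MHz passes through 
unchanged.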

 > >  > 5. I have reduced the polling interval from 1/2 to 1/4 of a second. It is not
 > >  > important for the algorithm's math now, but gives better system interactivity.
 > > 
 > > You mean the default polling interval I guess, as it's tuneable at least 
 > > on powerd startup, as are the loaded/idle points, which as someone else 
 > > mentioned, might be more dynamically modified while powerd is running?
 > 
 > It's possible, but I don't see a real reason to do it. An increased polling
 > interval will lead to significant latency, while the savings will be
 > minimal. I think 2KHz of timer interrupts per CPU consumes much more
 > energy than powerd waking up 4 times per second.

Ok.  I'll do some testing when I get my 7-STABLE system back up.

 > > Then if you get really bored :) SMP suspend/resume and S4 suspend to 
 > > disk need a champion .. both of which have at least begun in acpi at .
 > 
 > If I do everything, there will be nothing left for you. I don't want you to
 > become upset. ;)

:)  We appreciate that!

cheers, Ian

