SMP and setrunnable()- scheduler 4bsd

John Baldwin jhb at
Thu Jul 10 12:21:58 PDT 2003

On 10-Jul-2003 Julian Elischer wrote:
> OK so I return with some numbers....
> On Tue, 8 Jul 2003, John Baldwin wrote:
>> On 08-Jul-2003 Julian Elischer wrote:
>> > It looks to me that if we make a thread runnable
>> > and there is a processor in the idle loop, the idle processor should be
>> > kicked in some way to make it go get the newly runnable thread.
>> > 
>> > If the processors are halting in the idle loop however, it may take
>> > quite a while for the new work to be noticed..
>> > (possibly up to milliseconds I think)
>> > 
>> > Is there a mechanism to send an IPI to particular processors?
>> > or is it just broadcast?  
>> > 
>> > 
>> > I think we would be better served to alter idle_proc(void *dummy)
>> > (or maybe choosethread()) to increment or decrement a count
>> > of idle processors (atomically of course) so that 
>> > setrunnable (or its lower parts) can send that IPI
>> > and get the idle processor into action as soon as a thread is
>> > available.
>> > 
>> > I have not seen any such code but maybe I'm wrong....
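The counting scheme described above can be sketched in plain C11 atomics. This is an illustrative userspace model, not the actual kernel patch; the names (idle_enter, setrunnable_should_ipi, etc.) are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch of the idea above: CPUs entering the idle
 * loop bump a shared counter, so setrunnable() can cheaply tell
 * whether a wakeup IPI is worth sending at all. */
static atomic_int idle_cpus;

static void idle_enter(void) { atomic_fetch_add(&idle_cpus, 1); }
static void idle_exit(void)  { atomic_fetch_sub(&idle_cpus, 1); }

/* Called from setrunnable(): returns true when at least one CPU
 * is idle and a directed IPI should be sent to wake it. */
static bool setrunnable_should_ipi(void)
{
        return atomic_load(&idle_cpus) > 0;
}
```

The atomic increment/decrement keeps the count consistent without a lock, which matters since it runs on every idle-loop entry and exit.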
>> This is why HLT is not enabled in SMP by default (or at least was,
>> it may be turned on now).  Given that the clock interrupts are
>> effectively broadcast to all CPU's one way or another for all
>> arch's (that I know of), you will never halt more than the interval
>> between clock ticks on any CPU.
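Since clock interrupts reach every CPU, a halted idle CPU is woken at the latest on the next tick, so the worst-case extra latency is one tick interval. A quick worked bound, assuming the then-common default of HZ=100 (an assumption; HZ is configurable):

```c
/* Worst-case wakeup latency for a CPU halted in the idle loop:
 * it sleeps at most until the next clock interrupt, i.e. one tick. */
static int max_halt_latency_us(int hz)
{
        return 1000000 / hz;    /* one tick, in microseconds */
}
```

At HZ=100 this caps the wakeup delay at 10 ms, consistent with the "up to milliseconds" estimate earlier in the thread.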
> So here are some figures..
> dual# sysctl machdep.cpu_idle_hlt
> machdep.cpu_idle_hlt: 1
> 307.773u 93.000s 4:22.17 152.8% 3055+5920k 51+1046io 284pf+0w
> 307.762u 93.082s 4:23.22 152.2% 3061+5925k 4+1012io 8pf+0w
> dual# sysctl machdep.cpu_idle_hlt=0
> machdep.cpu_idle_hlt: 1 -> 0
> 357.264u 115.377s 4:25.21 178.2%        3150+5982k 7+1021io 8pf+0w
> 356.193u 116.551s 4:24.70 178.5%        3145+5980k 5+991io 8pf+0w
> reboot to kernel with IPIs for idle processors.. (patch available)
> dual# sysctl machdep.cpu_idle_hlt
> machdep.cpu_idle_hlt: 1
> 308.113u 90.422s 4:19.46 153.5% 3061+5941k 13+989io 22pf+0w
> 308.430u 93.501s 4:22.86 152.9% 3045+5897k 70+1022io 8pf+0w
> dual# sysctl machdep.cpu_idle_hlt=0
> machdep.cpu_idle_hlt: 1 -> 0
> 357.809u 113.757s 4:24.12 178.5%        3148+6020k 31+1016io 8pf+0w
> 356.193u 115.195s 4:24.22 178.4%        3150+5983k 30+1029io 8pf+0w
> dual# sysctl machdep.cpu_idle_hlt=1
> machdep.cpu_idle_hlt: 0 -> 1
> 308.132u 92.196s 4:23.15 152.1% 3044+5910k 30+1033io 8pf+0w
> 307.504u 93.581s 4:23.22 152.3% 3047+5913k 29+1055io 8pf+0w
> What is so stunning is the massive increase in user time 
> for the case where the cpu is not being idled.
> I'm hoping this is a statistical artifact of some sort..

I don't think it is, but you'd need more samples to be truly confident.
One possible reason: having the CPU's not halt means that idle CPU's
bang on the runq state continuously.  That may penalize the non-idle
CPU's through cache interactions in two ways: it slows them down when
they manipulate the queues, and it keeps the cache lines holding the
queue state permanently contended, preventing their effective use by
the real code executing on the other CPUs.

> either way, the times are almost identical.
> Having the cpu halt during idle time seems to be 
> slightly faster (1 second out of 250? not too significant)
> It would however be good to see thread wakeup latency times.
> (I'll work on that)
> The patch to send an IPI when a thread becomes runnable and there 
> are idle CPUs seems to not hurt this case at least.
> it may however make a lot of difference in the case of
> KSE threads waking each other up..
> I'll do some tests.

Yes.  As it stands now, adding the IPI would just make things more
complex for no gain.  However, if this IPI is present, then we can
engage in perhaps more drastic measures like really putting a CPU
to sleep (perhaps disabling interrupts to it?) until it is needed
which might bring significant power and heat savings on idle SMP
systems.
> It seems however that having the halt on idle turned on is the 
> right thing these days. (which is the current default)
> but the odd user times are a worry.

I'm sure Terry is all torn up by that conclusion. :-P


John Baldwin <jhb at>  <><
"Power Users Use the Power to Serve!"  -
