ULE 2.0

Jeff Roberson jroberson at chesapeake.net
Thu Jan 4 16:45:06 PST 2007


On Thu, 4 Jan 2007, Jeff Roberson wrote:

> On Fri, 5 Jan 2007, David Xu wrote:
>
>> On Friday 05 January 2007 03:31, Kip Macy wrote:
>>> 4BSD is robust, but I wouldn't say it scales well.  The way its decay
>>> scheduling is implemented is very centered around having one queue.  The
>>> algorithm also has no way of taking more complex topologies
>>> (multi-package vs. multi-core vs. multi-thread) into account.  The scheduler
>>> isn't FreeBSD's biggest scaling problem, but it will definitely wear thin
>>> with time.
>>> 
>>>
>>>          -Kip
>> 
>> I am not sure 4BSD was designed for a single queue; I only know
>> it is just a dynamic priority algorithm for time-sharing processes.
>
> Well, you are right in that it can be retrofitted to be more multi-cpu aware. 
> Still, I think it will be important on big systems to get rid of the O(n) 
> traversal of all threads.
>
>> Within a large system, you may have to split resources into regions
>> and hierarchies; each region would have its own load average
>> (currently we only have a single load average), and threads would
>> have affinity with region resources.  The region may have a global
>> run queue, for example for real-time threads or kernel threads which
>> should be run as soon as possible; user or less important threads
>> can be on per-cpu queues.
>> Currently, the problem in the ULE and Linux schedulers is that they
>> don't care about priority balance: a CPU may have many high priority
>> threads while another CPU may only have a few of them.  Since all the
>> algorithms are designed around priority scheduling, this problem can
>
> In ULE we try to handle realtime threads on the cpu that dispatched them so 
> they get the lowest possible latency.  I have noticed this problem of 
> priority disparity myself.  Part of it was that with the two-queue system it 
> was not as practical to try to balance based on priority, because priority 
> meant less than queue position.  This is fixed in the new ULE.  ULE does try 
> to move higher priority threads to inactive CPUs when it does migrate, but 
> there can be periods of short-term imbalance.  It's hard to solve these 
> without cache thrashing the run queues.
>
>> not be ignored.  I have a two-CPU priority balance algorithm based on
>> the Solaris idea; the algorithm is also cache-affinity aware.  I must
>> say it has the best performance when running mysql super-smack;
>> normally I get 8% better performance on a 2-cpu system.  It is still
>> 4BSD; I don't have an algorithm for 4 CPUs or more yet.  Since we use
>> blockable mutexes and priority propagation, I think priority is
>> important.  If spinlocks are widely used,
>
> I agree with your statement that priority is very important and that ULE did 
> not honor it as well before.  Could you tell me what the Solaris algorithm 
> is?  I looked at it once and remember thinking it was quite simple and 
> preferred to run the highest priority thread at all times, but I don't 
> remember exactly what the algorithm was.

I just looked it up.  The Solaris algorithm only does direct preemption if 
the priority is in the kernel or realtime priority range; otherwise it 
simply sets NEEDRESCHED.  When assigning a thread to a cpu, it checks 
whether the last cpu the thread executed on is running a lower priority 
thread (numerically higher for us, numerically lower for Solaris where 
higher values are better), and if so puts it there.  Otherwise it finds 
the cpu which is executing the lowest priority thread and assigns it there.
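
Roughly, the placement decision looks like the sketch below.  This is not
actual Solaris (or ULE) code; the names cpu_curpri[], td_lastcpu and ncpus
are made up for illustration, and it assumes our convention where a lower
numeric value means a higher priority.

    #define NCPUS_STUB 4

    struct thread_stub {
            int td_priority;        /* lower numeric value = higher priority */
            int td_lastcpu;         /* cpu this thread last ran on */
    };

    /* Hypothetical snapshot of the priority running on each cpu. */
    static int cpu_curpri[NCPUS_STUB];
    static int ncpus = NCPUS_STUB;

    /*
     * Sketch of the Solaris-style placement described above.  Prefer the
     * thread's last cpu if it is running something less important, which
     * preserves cache affinity; otherwise pick the cpu running the least
     * important thread overall.
     */
    static int
    pick_cpu(struct thread_stub *td)
    {
            int cpu, best;

            cpu = td->td_lastcpu;
            if (cpu_curpri[cpu] > td->td_priority)
                    return (cpu);

            best = 0;
            for (cpu = 1; cpu < ncpus; cpu++)
                    if (cpu_curpri[cpu] > cpu_curpri[best])
                            best = cpu;
            return (best);
    }

The fallback scan at the end is exactly the per-cpu inspection of remote
cpus on wakeup that I mention below.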

This basically balances the cpus by priority rather than by load. 
Balancing by priority indirectly balances by load because threads which 
have slept for a long time will eventually have a higher priority.  This 
is pretty smart, although it ends up inspecting per-cpu state of remote 
cpus on almost every wakeup.  I would expect this to perform badly but 
perhaps the improvement in user-space speed outweighs the kernel time.

I'll investigate this further.  One slight problem with this approach for 
ULE is that sitting on the run-queue for a long period of time improves 
your position on the run-queue but not directly your priority.  I think I 
can solve this, though, by resetting the priority as soon as you run, 
which will take into account the ticks that you were waiting.
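
To illustrate what I mean, here is a hypothetical sketch; none of these
names or formulas exist in ULE, it is only the shape of the idea: recompute
the priority at switch-in time so that ticks spent waiting on the run queue
are credited much like sleep time is.

    struct td_stub {
            int priority;           /* lower numeric value = higher priority */
            int enqueue_tick;       /* tick count when the thread was queued */
            int wait_ticks;         /* accumulated ticks spent waiting to run */
    };

    /* Toy calculation: waiting longer earns a better (lower) priority. */
    static int
    compute_priority_stub(struct td_stub *td)
    {
            int pri = 120 - td->wait_ticks / 10;

            return (pri < 64 ? 64 : pri);
    }

    static void
    sched_switchin_stub(struct td_stub *td, int now_tick)
    {
            /* Credit the time this thread sat on the run queue. */
            td->wait_ticks += now_tick - td->enqueue_tick;

            /* Reset the priority with that waiting time accounted for. */
            td->priority = compute_priority_stub(td);
    }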

Anyway, thanks for the pointer Xu.  I may hack this up and compare it to 
ULE's current balancing.

Cheers,
Jeff


>
>> each spinlock looks like a critical region, and then priority is less 
>> important.  Linux widely uses spinlocks; it might be the only reason 
>> why it still works.
>
> I believe you are right here as well.
>
> Thanks,
> Jeff
>
>> 
>> Regards,
>> David Xu
>

