ULE vs. 4BSD in RELENG_7
jroberson at chesapeake.net
Wed Oct 24 11:19:29 PDT 2007
On Tue, 23 Oct 2007, Josh Carroll wrote:
> I posted this to the stable mailing list, as I thought it was
> pertinent there, but I think it will get better attention here. So I
> apologize in advance for cross-posting if this is a faux pas. :)
> Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
> workloads that I am sensitive to: building world with -j X, and ffmpeg
> -threads X. Other benchmarks seem to indicate relatively equal
> performance between the two. MySQL, on the other hand, is
> significantly faster in ULE.
> I'm trying to understand why ffmpeg and buildworld are slower in ULE
> than 4BSD, since it seems to me that ULE was supposed to be the better
> scaling scheduler.
> Here is a link to the original thread on the stable mailing list:
> Remy replied with some interesting results for building world between
> the two schedulers on an 8-way system. It seems that ULE suffers as
> more threads/processes are thrown at it, at least it appears that way
> from Remy's data.
> Does anyone have any additional performance tests I can run that might
> help indicate where the deficiency is in the ULE scheduler? MySQL
> performance is excellent, so I'm wondering if it was tuned to that
> particular workload?
> I'm not sure if Remy subscribes to this list, so I am CC'ing him. Hope
> you don't mind Remy :)
Thanks for your emails. First, as gnn mentioned, I'm without most of my
things at the moment. I have some patches which might improve your
workload but I need to test and tune them more myself before I give them
out. I doubt any of the sysctls other than steal_thresh and balance_ticks
will help your situation.
ULE is tuned for workloads that benefit from improved affinity, not for
mysql in particular. Tests with other workloads that benefit from
improved affinity have verified that the tuning is not mysql specific.
The problem with buildworld for ULE is that 4BSD gets basically
perfect affinity and perfect balancing because the compiler runs,
typically uninterrupted, until it does a blocking disk transaction. At
that point it doesn't matter which CPU it is resumed on.
Your tests with ffmpeg threads vs processes are probably triggering more
context switches in the threads case, due to lock contention in the kernel.
This is also likely the problem with some super-smack tests. On each
context switch 4BSD has an opportunity to perfectly balance the CPUs. ULE
does not because it's too costly and hinders other workloads.
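
To make the balancing difference concrete, here is a toy model in Python.
It is only a sketch under loud assumptions: it is not the kernel code, the
function names are made up for illustration, and steal_thresh here merely
borrows the sysctl's name for "how many waiting threads a remote queue
must hold before an idle CPU takes one". It ignores affinity benefits
entirely, so it shows only the balancing side of the trade-off.

```python
from collections import deque


def run_global_queue(task_lengths, ncpus):
    """Toy 4BSD-like model: one shared run queue.

    Any idle CPU takes the next waiting task, so no CPU sits idle
    while work is queued. Returns the number of ticks to finish.
    """
    queue = deque(task_lengths)
    remaining = [0] * ncpus  # ticks left for the task on each CPU
    ticks = 0
    while queue or any(remaining):
        for cpu in range(ncpus):
            if remaining[cpu] == 0 and queue:
                remaining[cpu] = queue.popleft()
        remaining = [max(0, r - 1) for r in remaining]
        ticks += 1
    return ticks


def run_per_cpu_queues(task_lengths, ncpus, steal_thresh):
    """Toy ULE-like model: one run queue per CPU.

    Tasks stay on the queue they were placed on. An idle CPU steals
    only from a queue holding at least steal_thresh waiting tasks
    (a hypothetical simplification of the real heuristic).
    """
    queues = [deque() for _ in range(ncpus)]
    for i, length in enumerate(task_lengths):
        queues[i % ncpus].append(length)  # round-robin placement
    remaining = [0] * ncpus
    ticks = 0
    while any(queues) or any(remaining):
        for cpu in range(ncpus):
            if remaining[cpu]:
                continue  # still busy
            if queues[cpu]:
                remaining[cpu] = queues[cpu].popleft()
                continue
            victim = max(queues, key=len)  # longest queue
            if victim and len(victim) >= steal_thresh:
                remaining[cpu] = victim.pop()
        remaining = [max(0, r - 1) for r in remaining]
        ticks += 1
    return ticks


if __name__ == "__main__":
    tasks = [8, 1, 8, 1, 8, 1, 8, 1]  # alternating long and short jobs
    print("global queue:", run_global_queue(tasks, 2))
    for thresh in (100, 2):
        print("per-CPU, steal_thresh =", thresh, ":",
              run_per_cpu_queues(tasks, 2, thresh))
```

With two CPUs and alternating long and short jobs, the shared queue keeps
both CPUs busy, while the per-CPU model leaves one CPU idle unless the
threshold lets it steal. Real ULE gains back time through cache affinity,
which this sketch does not model.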
I don't doubt that we can improve things further. It will just have to
wait for another few weeks before I'm able to do much about it.