ULE vs. 4BSD in RELENG_7

Remy Nonnenmacher remy.nonnenmacher at activnetworks.com
Fri Oct 19 03:05:18 PDT 2007


Josh Carroll wrote:
>> I have noticed some performance discrepancies with ULE and 4BSD in
>> RELENG_7, specifically with ffmpeg. I have all the kernel debugging
>> options disabled, and as I understand it, the userland debugging is
>> all off by default in RELENG_7.
> 
> Here are a couple of additional benchmarks comparing the schedulers on
> my system:
> 
> make -j8 -DNOCLEAN buildkernel
> 4BSD: 3:25.56
> ULE: 3:39.20
> Difference: -6.6 %
> 
> ubench (CPU):
> 4BSD: 1705258
> ULE: 1713510
> Difference: +0.48 %
> 
> super-smack (select-key 10 10000):
> 4BSD: 55044.38
> ULE: 68085.21
> Difference: +23.69 %
> 
> super-smack (update-select 10 10000):
> 4BSD: 16734.15
> ULE: 17631.43
> Difference: +5.36 %
> 
> So at least for the MySQL super-smack benchmark (I know it's a rather
> contrived benchmark), ULE is significantly faster for select-key and a
> decent improvement for update-select. ubench is about the same, but
> building a kernel is also slower with ULE.
> 
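The "Difference" lines in the quoted results appear to be computed relative to 4BSD, with the sign chosen so that a positive value favours ULE. For the buildkernel elapsed time the formula is (4BSD - ULE) / 4BSD, and for the ubench and super-smack scores it is (ULE - 4BSD) / 4BSD. That convention is inferred from the numbers and is not stated in the mail. The small Python sketch below, which assumes that convention, reproduces the quoted figures:

```python
def mmss_to_seconds(s: str) -> float:
    """Convert a 'M:SS.ss' wall-clock time (as printed in the mail) to seconds."""
    minutes, seconds = s.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_diff(bsd4: float, ule: float, lower_is_better: bool) -> float:
    """Percentage difference of ULE relative to 4BSD; positive means ULE is better.

    Assumption: this sign convention is inferred from the quoted numbers.
    """
    if lower_is_better:  # elapsed time: a smaller value is better
        return (bsd4 - ule) / bsd4 * 100
    return (ule - bsd4) / bsd4 * 100  # benchmark score: a larger value is better


results = {
    "buildkernel": pct_diff(mmss_to_seconds("3:25.56"),
                            mmss_to_seconds("3:39.20"), lower_is_better=True),
    "ubench": pct_diff(1705258, 1713510, lower_is_better=False),
    "select-key": pct_diff(55044.38, 68085.21, lower_is_better=False),
    "update-select": pct_diff(16734.15, 17631.43, lower_is_better=False),
}

for name, diff in results.items():
    print(f"{name:14s} {diff:+.2f} %")
```

It prints -6.64 %, +0.48 %, +23.69 % and +5.36 %, which match the quoted figures after rounding.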

Here are some buildworld measurements (extracted from a buildathon):

i386: 6.2-RELEASE

-j4:    746.08 real      1996.38 user       468.91 sys        1535.1 RSA
-j6:    595.31 real      1957.31 user       539.24 sys        2304.9 RSA
-j8:    534.21 real      1957.76 user       567.06 sys        3068.5 RSA
  M1000: 492.64 real      1956.58 user       587.41 sys
  100:   526.22 real      1936.98 user       559.49 sys        3073.8 RSA
  M100:  474.26 real      1947.09 user       563.95 sys
-j10:   550.18 real      1975.54 user       588.33 sys
-j12:   550.23 real      1976.88 user       602.65 sys
-j16:   559.22 real      1972.19 user       634.29 sys


i386: 7.0-current (as of 10/16/2007) - SCHED_4BSD

-j4:   1072.64 real      2880.29 user       561.13 sys        1495.7 RSA
-j6:    842.91 real      2813.75 user       638.91 sys        2244.8 RSA
-j8:    758.48 real      2824.23 user       704.00 sys        2990.1 RSA
  M1000: 696.12 real      2820.53 user       706.97 sys
  100:   752.58 real      2809.97 user       685.35 sys        2993.2 RSA
  M100:  666.58 real      2804.72 user       714.01 sys
-j10:   763.82 real      2843.44 user       743.77 sys
-j12:   785.12 real      2845.11 user       770.31 sys
-j16:   805.02 real      2848.06 user       819.53 sys

i386: 7.0-current (as of 10/16/2007) - SCHED_ULE

-j4:   1047.00 real      2857.59 user       486.93 sys        1494.2 RSA
-j6:    831.10 real      2793.94 user       524.58 sys        2242.6 RSA
-j8:    803.34 real      2796.46 user       552.56 sys        2991.0 RSA
  M1000: 709.77 real      2793.20 user       572.27 sys
  100:   785.18 real      2765.14 user       545.57 sys        2991.4 RSA
  M100:  707.09 real      2769.88 user       572.92 sys
-j10:   813.36 real      2808.13 user       587.51 sys
-j12:   824.23 real      2817.60 user       618.00 sys
-j16:   856.11 real      2847.68 user       721.97 sys
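The result lines above follow a regular layout, so they can be parsed for further analysis. The minimal sketch below assumes that the real/user/sys columns are elapsed, user-CPU and system-CPU seconds as printed by time(1), and that the RSA column is the openssl speed -multi <N> rsa1024 figure described under Conditions. The Run class, the parse_runs() helper and the "average busy CPUs" metric are hypothetical additions for illustration and are not part of the original measurements.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One result line, e.g. "-j8:    758.48 real  2824.23 user  704.00 sys  2990.1 RSA".
# The RSA column is present only on some lines.
LINE_RE = re.compile(
    r"^\s*(?P<label>\S+):\s+"
    r"(?P<real>[\d.]+) real\s+(?P<user>[\d.]+) user\s+(?P<sys>[\d.]+) sys"
    r"(?:\s+(?P<rsa>[\d.]+) RSA)?\s*$"
)


@dataclass
class Run:
    label: str            # "-j8", or a variant label such as "M1000", "100", "M100"
    real: float           # assumption: elapsed wall-clock seconds (time(1) style)
    user: float           # assumption: user CPU seconds summed over all processes
    sys: float            # assumption: system CPU seconds summed over all processes
    rsa: Optional[float]  # openssl speed figure, None when not measured

    @property
    def parallelism(self) -> float:
        """Hypothetical metric: total CPU time divided by elapsed time."""
        return (self.user + self.sys) / self.real


def parse_runs(text: str) -> list[Run]:
    """Return one Run per matching result line; other lines are ignored."""
    runs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rsa = float(m["rsa"]) if m["rsa"] else None
            runs.append(Run(m["label"], float(m["real"]), float(m["user"]),
                            float(m["sys"]), rsa))
    return runs


sched_4bsd = parse_runs("""
-j4:   1072.64 real      2880.29 user       561.13 sys        1495.7 RSA
-j8:    758.48 real      2824.23 user       704.00 sys        2990.1 RSA
  M1000: 696.12 real      2820.53 user       706.97 sys
""")
for run in sched_4bsd:
    print(f"{run.label:6s} real={run.real:8.2f}s  avg busy CPUs={run.parallelism:.2f}")
```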

---------
Conditions:

Machine: Intel SR2520 (S5000PAL board), 2x Xeon E5345 (8 cores, 2.33 GHz), 8 GB RAM, 1 SATA disk.

GENERIC kernel; /usr/obj/* removed after each run; runs done after a
reboot; soft updates on all file systems; RSA = openssl speed -multi <N> rsa1024.

M1000 = /usr mounted noatime, kern.hz=1000, tar cf /dev/null /usr/src before the run.
M100  = same as M1000, but with kern.hz=100.
100   = standard test with kern.hz=100 (to be compared with the -j8 line).

(The M variants minimize disk contention while reading the sources. The variants
were measured only at the core-count sweet spot, -j8 here.)

---------

Remarks:

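- There seems to be a loss of efficiency in the OpenSSL code: the RSA figures
are roughly 2.5 to 2.7 % lower on 7.0 than on 6.2 under both schedulers, so
this is not scheduler related. Perhaps an indication of the compiler change?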

- The 6.2 results are given for information only. RSA aside, the buildworld
workload, its duration, and its level of parallelism are not directly
comparable between 6.2 and 7.0. Also, Amdahl's law limits efficiency as the
number of cores increases.

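- ULE tends to be more efficient than 4BSD when there are spare cores
(4BSD/ULE real-time ratios of 1.0244 and 1.0142 at -j4 and -j6), but less
efficient as the load increases (0.9807 down to 0.9403 from -j8 to -j16; the
0.9807 figure matches the -j8 M1000 variant, while the plain -j8 run gives
0.9442).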

- ULE seems to be less sensitive to kern.hz than 4BSD (M1000/M100 real-time
ratio of 1.0037 for ULE versus 1.0443 for 4BSD). (Beware of side effects of
changing kern.hz on time/delay-based bandwidth estimators at the network level.)
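You can recompute the ratios in the remarks from the tables. The short Python sketch below assumes the following definitions, which the numbers suggest: the scheduler ratio is 4BSD real time divided by ULE real time, the Hz ratio is M1000 real time divided by M100 real time, and the RSA ratio is the 7.0 figure divided by the 6.2 figure. It reproduces the quoted ratios to within rounding. The -j8 entry is printed both for the plain run and for the M1000 variant, because the quoted 0.9807 matches the latter.

```python
# Figures copied from the tables above (7.0-CURRENT of 10/16/2007 unless noted).
real = {  # elapsed "real" seconds per run
    "SCHED_4BSD": {"-j4": 1072.64, "-j6": 842.91, "-j8": 758.48,
                   "-j8 M1000": 696.12, "-j8 M100": 666.58,
                   "-j10": 763.82, "-j12": 785.12, "-j16": 805.02},
    "SCHED_ULE": {"-j4": 1047.00, "-j6": 831.10, "-j8": 803.34,
                  "-j8 M1000": 709.77, "-j8 M100": 707.09,
                  "-j10": 813.36, "-j12": 824.23, "-j16": 856.11},
}
rsa_62 = {"-j4": 1535.1, "-j6": 2304.9, "-j8": 3068.5}  # 6.2-RELEASE
rsa_70 = {
    "SCHED_4BSD": {"-j4": 1495.7, "-j6": 2244.8, "-j8": 2990.1},
    "SCHED_ULE": {"-j4": 1494.2, "-j6": 2242.6, "-j8": 2991.0},
}

# OpenSSL: 7.0 RSA figure divided by the 6.2 one (< 1 means 7.0 is slower).
rsa_ratio = {(s, j): rsa_70[s][j] / rsa_62[j] for s in rsa_70 for j in rsa_62}

# Scheduler: 4BSD time divided by ULE time (> 1 means ULE finished first).
sched_ratio = {j: real["SCHED_4BSD"][j] / real["SCHED_ULE"][j]
               for j in real["SCHED_4BSD"]}

# Timer frequency: kern.hz=1000 time divided by kern.hz=100 time (-j8, M variants).
hz_ratio = {s: real[s]["-j8 M1000"] / real[s]["-j8 M100"] for s in real}

for (s, j), r in rsa_ratio.items():
    print(f"RSA {s:10s} {j:4s} 7.0/6.2 = {r:.4f}")
for j, r in sched_ratio.items():
    print(f"{j:10s} 4BSD/ULE = {r:.4f}")
for s, r in hz_ratio.items():
    print(f"{s:10s} M1000/M100 = {r:.4f}")
```

The RSA ratios are nearly identical under both schedulers, which is consistent with the remark that the OpenSSL slowdown is not scheduler related.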
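The Amdahl's-law remark can be made concrete. If a fraction p of the work parallelizes perfectly and the rest stays serial, the speedup on N cores is 1 / ((1 - p) + p / N), so per-core efficiency (speedup / N) falls as N grows. The value p = 0.9 below is purely illustrative and was not measured from these buildworld runs.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` CPUs when only `parallel_fraction`
    of the work can run in parallel and the remainder stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical parallel fraction, chosen only for illustration.
p = 0.9
for n in (1, 2, 4, 8, 16):
    s = amdahl_speedup(p, n)
    print(f"{n:2d} cores: speedup {s:.2f}, efficiency {s / n:.2f}")
```

With p = 0.9, going from 8 to 16 cores raises the speedup only from about 4.71 to 6.4, while efficiency drops from about 0.59 to 0.40.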

--
RN.
IeM





More information about the freebsd-stable mailing list