ULE vs. 4BSD in RELENG_7
Remy Nonnenmacher
remy.nonnenmacher at activnetworks.com
Fri Oct 19 03:05:18 PDT 2007
Josh Carroll wrote:
>> I have noticed some performance discrepancies with ULE and 4BSD in
>> RELENG_7, specifically with ffmpeg. I have all the kernel debugging
>> options disabled, and as I understand it, the userland debugging is
>> all off by default in RELENG_7.
>
> Here are a couple of additional benchmarks comparing the schedulers on
> my system:
>
> make -j8 -DNOCLEAN buildkernel
> 4BSD: 3:25.56
> ULE: 3:39.20
> Difference: -6.6 %
>
> ubench (CPU):
> 4BSD: 1705258
> ULE: 1713510
> Difference: +0.48 %
>
> super-smack (select-key 10 10000):
> 4BSD: 55044.38
> ULE: 68085.21
> Difference: +23.69 %
>
> super-smack (update-select 10 10000):
> 4BSD: 16734.15
> ULE: 17631.43
> Difference: +5.36 %
>
> So at least for the MySQL super-smack benchmark (I know it's a rather
> contrived benchmark), ULE is significantly faster for select-key and a
> decent improvement for update-select. ubench is about the same, but
> building a kernel is also slower with ULE.
>
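For reference, the "Difference" figures quoted above can be recomputed
with a few lines of Python. This is only a sketch: the helper names
(to_seconds, pct_gain) are made up for illustration, and it assumes the
buildkernel times are m:ss.ff wall-clock values (lower is better) while
the ubench and super-smack numbers are scores (higher is better).

```python
def to_seconds(clock: str) -> float:
    """Convert 'm:ss.ff' (or plain seconds) to seconds."""
    minutes, _, seconds = clock.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


def pct_gain(base: float, new: float, lower_is_better: bool = False) -> float:
    """Percent change of `new` (ULE) relative to `base` (4BSD).

    The sign is flipped for timings so that a positive value always
    means ULE did better.
    """
    change = (new - base) / base * 100.0
    return -change if lower_is_better else change


# Figures copied from the quoted message (4BSD first, ULE second).
results = {
    "buildkernel": pct_gain(to_seconds("3:25.56"), to_seconds("3:39.20"),
                            lower_is_better=True),
    "ubench": pct_gain(1705258, 1713510),
    "select-key": pct_gain(55044.38, 68085.21),
    "update-select": pct_gain(16734.15, 17631.43),
}

for name, pct in results.items():
    print(f"{name}: {pct:+.2f} %")
```

The signs follow the convention used in the quoted message: negative
means ULE was slower or scored lower than 4BSD.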
Here are some buildworld measurements (extracted from a build-a-thon):
i386: 6.2-RELEASE
-j4:       746.08 real  1996.38 user  468.91 sys  1535.1 RSA
-j6:       595.31 real  1957.31 user  539.24 sys  2304.9 RSA
-j8:       534.21 real  1957.76 user  567.06 sys  3068.5 RSA
  M1000:   492.64 real  1956.58 user  587.41 sys
  100:     526.22 real  1936.98 user  559.49 sys  3073.8 RSA
  M100:    474.26 real  1947.09 user  563.95 sys
-j10:      550.18 real  1975.54 user  588.33 sys
-j12:      550.23 real  1976.88 user  602.65 sys
-j16:      559.22 real  1972.19 user  634.29 sys
i386: 7.0-current (as of 10/16/2007) - SCHED_4BSD
-j4:      1072.64 real  2880.29 user  561.13 sys  1495.7 RSA
-j6:       842.91 real  2813.75 user  638.91 sys  2244.8 RSA
-j8:       758.48 real  2824.23 user  704.00 sys  2990.1 RSA
  M1000:   696.12 real  2820.53 user  706.97 sys
  100:     752.58 real  2809.97 user  685.35 sys  2993.2 RSA
  M100:    666.58 real  2804.72 user  714.01 sys
-j10:      763.82 real  2843.44 user  743.77 sys
-j12:      785.12 real  2845.11 user  770.31 sys
-j16:      805.02 real  2848.06 user  819.53 sys
i386: 7.0-current (as of 10/16/2007) - SCHED_ULE
-j4:      1047.00 real  2857.59 user  486.93 sys  1494.2 RSA
-j6:       831.10 real  2793.94 user  524.58 sys  2242.6 RSA
-j8:       803.34 real  2796.46 user  552.56 sys  2991.0 RSA
  M1000:   709.77 real  2793.20 user  572.27 sys
  100:     785.18 real  2765.14 user  545.57 sys  2991.4 RSA
  M100:    707.09 real  2769.88 user  572.92 sys
-j10:      813.36 real  2808.13 user  587.51 sys
-j12:      824.23 real  2817.60 user  618.00 sys
-j16:      856.11 real  2847.68 user  721.97 sys
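To make the rows above easier to compare, here is a small Python sketch
that parses one result line and derives (user + sys) / real, i.e. the
average number of cores kept busy during the run. Note that this metric
and the helper names (parse_row, busy_cores) are my own illustration and
not part of the measurements themselves; the only assumption about the
input is the "<label>: N real N user N sys" layout shown in the tables.

```python
import re

# Matches e.g. "-j4: 1047.00 real 2857.59 user 486.93 sys 1494.2 RSA".
# The trailing RSA column is optional and ignored here.
ROW = re.compile(
    r"(?P<label>\S+):\s+(?P<real>[\d.]+)\s+real"
    r"\s+(?P<user>[\d.]+)\s+user\s+(?P<sys>[\d.]+)\s+sys"
)


def parse_row(line: str):
    """Return (label, real, user, sys) from one table row, times in seconds."""
    m = ROW.search(line)
    if m is None:
        raise ValueError(f"not a result row: {line!r}")
    return m["label"], float(m["real"]), float(m["user"]), float(m["sys"])


def busy_cores(real: float, user: float, sys_: float) -> float:
    """Average number of CPUs kept busy: total CPU time over wall time."""
    return (user + sys_) / real


# Three SCHED_ULE rows copied from the table.
rows = [
    "-j4: 1047.00 real 2857.59 user 486.93 sys 1494.2 RSA",
    "-j8: 803.34 real 2796.46 user 552.56 sys 2991.0 RSA",
    "-j16: 856.11 real 2847.68 user 721.97 sys",
]

for row in rows:
    label, real, user, sys_ = parse_row(row)
    print(f"{label}: {busy_cores(real, user, sys_):.2f} cores busy on average")
```

On an 8-core machine this value can never exceed 8, so it gives a rough
idea of how much of the box each -j level actually uses.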
---------
Conditions:
Machine: Intel SR2520 - S5000PAL, 2 x E5345 (8 cores, 2.33 GHz), 8 GB
RAM, 1 x SATA disk.
Generic kernel; /usr/obj/* removed after each run; runs done after a
reboot; softupdates on all filesystems.
RSA = openssl speed -multi <N> rsa1024.
M1000 = noatime /usr, kern.hz=1000, tar cf /dev/null /usr/src before run.
M100 = same as M1000, but with kern.hz=100.
100 = generic test with kern.hz=100 (to be compared with the -j8 line).
(The M variants minimize disk contention when reading src; variants were
measured only at the core-count sweet spot, 8 here.)
---------
Remarks:
- There seems to be a loss of efficiency in the openssl code. It is not
scheduler related. An indication of a compiler change?
- 6.2 results are for information only. RSA aside, neither the
buildworld workload nor its duration or level of parallelism is directly
comparable between 6.2 and 7.0. Also, Amdahl's law limits the efficiency
gained as the number of cores increases.
- ULE tends to be more efficient than 4BSD when there are idle cores
available (4BSD/ULE real-time ratios of 1.0244 and 1.0142 at -j4 and
-j6) but less efficient as the load increases (0.9807 to 0.9403 from
-j8 to -j16).
- ULE seems to be less sensitive to Hz than 4BSD (M1000/M100 real-time
ratio of 1.0037 versus 1.0443). (Beware of side effects on the
time/delay bandwidth estimator at the network level.)
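The ratios in the last two remarks can be recomputed from the "real"
columns of the two 7.0 tables. The Python sketch below does that; the
dictionary names are just for illustration. With these inputs the 0.9807
figure appears to correspond to the M1000 row (the tuned -j8 run), while
the plain -j8 row comes out at about 0.944.

```python
# Wall-clock ("real") seconds copied from the 7.0-current tables.
real_4bsd = {
    "-j4": 1072.64, "-j6": 842.91, "-j8": 758.48,
    "M1000": 696.12, "100": 752.58, "M100": 666.58,
    "-j10": 763.82, "-j12": 785.12, "-j16": 805.02,
}
real_ule = {
    "-j4": 1047.00, "-j6": 831.10, "-j8": 803.34,
    "M1000": 709.77, "100": 785.18, "M100": 707.09,
    "-j10": 813.36, "-j12": 824.23, "-j16": 856.11,
}

# 4BSD time divided by ULE time: above 1.0 means ULE finished sooner.
ratio = {run: real_4bsd[run] / real_ule[run] for run in real_4bsd}

# Sensitivity to kern.hz: time at hz=1000 over time at hz=100 (M variants).
hz_sensitivity = {
    "4BSD": real_4bsd["M1000"] / real_4bsd["M100"],
    "ULE": real_ule["M1000"] / real_ule["M100"],
}

for run, value in ratio.items():
    print(f"{run:>6}: 4BSD/ULE = {value:.4f}")
for sched, value in hz_sensitivity.items():
    print(f"{sched}: M1000/M100 = {value:.4f}")
```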
--
RN.
IeM
More information about the freebsd-stable mailing list