Compiler performance tests on FreeBSD 10.0-CURRENT
Dimitry Andric
dimitry at andric.com
Sat Sep 15 22:34:49 UTC 2012
Hi all,
By request, I performed a series of kernel performance tests on FreeBSD
10.0-CURRENT, particularly comparing the runtime performance of GENERIC
kernels compiled by gcc 4.2.1 and by clang 3.2.
The attached text file[1] contains more information about the tests,
some semi-cooked performance data, and my conclusions. Any errors and
omissions are also my fault, so if you notice them, please let me know.
The executive summary: GENERIC kernels compiled with clang 3.2 are
slightly faster than those compiled by gcc 4.2.1, though the difference
will not be very noticeable in practice.
Last but not least, thanks to Gavin Atkinson for providing the required
hardware.
-Dimitry
[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-kernel-2012-09-14a.txt>
-------------- next part --------------
KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012
================================================================
INTRODUCTION
------------
These tests aim to give an indication of the runtime performance of FreeBSD
kernels compiled with different compilers. The compilers tested were:
- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD
10.0-CURRENT, after r239462.
All tests were run on a machine graciously provided by Gavin Atkinson, which is
a Dell PowerEdge 2850, with two 2.80 GHz Xeon-class CPUs (id=0xf41), and 4 GB
RAM. It runs FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00 UTC 2012.
With each compiler, a stock GENERIC kernel for amd64 was built from head as of
r240384, with the default optimization flags for this architecture, which are
for gcc:
-O2 -frename-registers -pipe -fno-strict-aliasing
and for clang:
-O2 -pipe -fno-strict-aliasing
Each kernel was installed into /boot/kernel.${compilername}. The system was
then booted with each of these kernels, without modifying anything else, and
multiple runs of "make buildworld" were done; first in single-threaded mode,
next in multi-threaded mode, using the -j8 flag. Between each run, the /usr/obj
directory was fully cleaned out, and filesystems were synced.
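The report does not say which tool collected the per-run figures. The row
names in the tables below (maxrss, ixrss, ..., nivcsw) match the fields of
struct rusage from getrusage(2), so one way to gather comparable numbers for
a single command is sketched here in Python. This is an assumption for
illustration, not the method actually used; measure() and RUSAGE_FIELDS are
hypothetical names, and the trivial child command stands in for a real build.

```python
import os
import subprocess
import sys
import time

# Row names as used in the tables; each maps to ru_<name> in struct rusage.
RUSAGE_FIELDS = (
    "maxrss", "ixrss", "idrss", "isrss", "minflt", "majflt", "nswap",
    "inblock", "oublock", "msgsnd", "msgrcv", "nsignals", "nvcsw", "nivcsw",
)


def measure(argv):
    """Run argv as a child process and return its wall-clock time and rusage."""
    start = time.monotonic()
    proc = subprocess.Popen(argv)
    # os.wait4() reaps the child and returns its resource usage.
    _, status, ru = os.wait4(proc.pid, 0)
    real = time.monotonic() - start
    # Tell the Popen object the child is already reaped (Python 3.9+).
    proc.returncode = os.waitstatus_to_exitcode(status)

    sample = {"real": real, "user": ru.ru_utime, "sys": ru.ru_stime}
    for name in RUSAGE_FIELDS:
        sample[name] = getattr(ru, "ru_" + name)
    return sample


# Stand-in command; a real run would pass something like
# ["make", "-j8", "buildworld"] from the source tree instead.
sample = measure([sys.executable, "-c", "pass"])
for key, value in sample.items():
    print(key, value)
```

Repeating such a measurement several times per kernel, with /usr/obj cleaned
in between, would give the per-field samples that the tables summarize.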
The timing results are below.
Building world, single-threaded, on a GENERIC kernel compiled by clang 3.2
--------------------------------------------------------------------------
         N            Min            Max         Median            Avg         Stddev
real     3       26589.27       26680.48       26653.58       26641.11      46.866211
user     3       20449.52       20472.88        20463.4      20461.933      11.748861
sys      3        7809.87        7837.94        7830.35      7826.0533      14.519891
maxrss   3         759420         759420         759420         759420              0
ixrss    3           4923           4926           4924      4924.3333      1.5275252
idrss    3            584            584            584            584              0
isrss    3            131            131            131            131              0
minflt   3  6.5828088e+08  6.5855089e+08  6.5828258e+08  6.5837145e+08       155402.8
majflt   3              0           2573           2568      1713.6667       1484.081
nswap    3              0              0              0              0              0
inblock  3           2176          30252          30170          20866      16186.067
oublock  3          28370          28377          28375          28374      3.6055513
msgsnd   3              0              5              2      2.3333333      2.5166115
msgrcv   3              0              3              2      1.6666667      1.5275252
nsignals 3          74107          74107          74107          74107              0
nvcsw    3        1086164        1107104        1106650      1099972.7       11960.81
nivcsw   3         604641         658906         616307         626618       28564.14
Building world, single-threaded, on a GENERIC kernel compiled by gcc 4.2.1
--------------------------------------------------------------------------
         N            Min            Max         Median            Avg         Stddev
real     3       26986.71       27080.38       26992.54      27019.877      52.478445
user     3       20506.89        20516.1       20511.66       20511.55      4.6059852
sys      3        8245.69        8285.79        8253.04      8261.5067      21.348673
maxrss   3         759420         759420         759420         759420              0
ixrss    3           4894           4900           4898      4897.3333      3.0550505
idrss    3            581            581            581            581              0
isrss    3            131            131            131            131              0
minflt   3  6.5855245e+08  6.5855409e+08  6.5855253e+08  6.5855302e+08       922.2581
majflt   3              0           2566              0      855.33333      1481.4808
nswap    3              0              0              0              0              0
inblock  3           1619          29805           2008          11144       16162.07
oublock  3          28652          28747          28662          28687      52.201533
msgsnd   3              0              2              0     0.66666667      1.1547005
msgrcv   3              0              2              0     0.66666667      1.1547005
nsignals 3          74107          74107          74107          74107              0
nvcsw    3        1088827        1110096        1089758        1096227      12019.924
nivcsw   3         631463         668779         638421         646221      19843.159
Summary:
--------
On a kernel compiled with gcc 4.2.1, building world in single-threaded mode is
~1.4% slower in real time than on a kernel compiled with clang 3.2, about
equally fast in user time (~0.2% slower), and ~5.6% slower in system time.
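The percentages in this summary follow from the Avg column of the two
single-threaded tables. As a worked check, here is a minimal Python sketch
of that arithmetic; slowdown_pct() is a hypothetical helper name, and the
inputs are copied from the tables above.

```python
def slowdown_pct(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return (other / baseline - 1.0) * 100.0


# Avg values from the single-threaded tables.
clang = {"real": 26641.11, "user": 20461.933, "sys": 7826.0533}
gcc = {"real": 27019.877, "user": 20511.55, "sys": 8261.5067}

# How much longer the build takes on the gcc-built kernel, per timer.
single = {key: slowdown_pct(clang[key], gcc[key]) for key in clang}

for key, pct in single.items():
    print(f"{key}: gcc-built kernel is {pct:+.1f}% relative to clang-built")
```

The same helper applies unchanged to the Avg values of the multi-threaded
tables.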
Conclusion:
-----------
The difference in real time is rather minimal, and even negligible in user time,
but in system time it is much more pronounced.
Since system time can be attributed to the kernel proper, a kernel compiled with
clang 3.2 is clearly faster than a kernel compiled with gcc 4.2.1, by a margin
of just over 5 percent.
Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2
-------------------------------------------------------------------------
         N            Min            Max         Median            Avg         Stddev
real     3       13832.75       13875.24       13871.47       13859.82      23.518969
user     3       33658.54       33743.43       33730.26      33710.743      45.686467
sys      3       14704.76       14775.59       14744.45        14741.6      35.500903
maxrss   3         758256         758256         758256         758256              0
ixrss    3           4829           4831           4830           4830              1
idrss    3            573            574            574      573.66667     0.57735027
isrss    3            130            130            130            130              0
minflt   3  6.6259374e+08  6.6304066e+08  6.6288552e+08  6.6283997e+08      226911.43
majflt   3           3160           4003           3801      3654.6667      440.13899
nswap    3             40             40             40             40              0
inblock  3          27763          28008          27853      27874.667      123.92874
oublock  3          55003          58725          57061      56929.667      1864.4724
msgsnd   3              0              0              0              0              0
msgrcv   3              0              0              0              0              0
nsignals 3          60496          60506          60499      60500.333      5.1316014
nvcsw    3        1891074        1894870        1893148      1893030.7      1900.7181
nivcsw   3        3095468        3126475        3116877        3112940      15873.988
Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1
-------------------------------------------------------------------------
         N            Min            Max         Median            Avg         Stddev
real     3       14017.65       14046.35       14042.26       14035.42      15.524552
user     3       33596.19       33687.03        33661.9      33648.373      46.906337
sys      3       15347.75       15438.63       15436.98      15407.787      51.999823
maxrss   3         758228         758248         758244         758240      10.583005
ixrss    3           4808           4809           4809      4808.6667     0.57735027
idrss    3            571            571            571            571              0
isrss    3            130            130            130            130              0
minflt   3  6.6301232e+08  6.6339175e+08  6.6312437e+08  6.6317615e+08      194941.64
majflt   3           3715           5509           3812      4345.3333      1008.9313
nswap    3             40             40             40             40              0
inblock  3          28327          43672          28374      33457.667      8845.9034
oublock  3          50661          57892          56870          55141      3913.3005
msgsnd   3              0              0              0              0              0
msgrcv   3              0              0              0              0              0
nsignals 3          60501          60506          60504      60503.667      2.5166114
nvcsw    3        1882397        1910610        1895448      1896151.7      14119.657
nivcsw   3        2747620        2856552        2788778        2797650      55005.267
Summary:
--------
On a kernel compiled with gcc 4.2.1, building world in multi-threaded mode is
~1.3% slower in real time than on a kernel compiled with clang 3.2, about
equally fast in user time (~0.2% faster), and ~4.5% slower in system time.
Conclusion:
-----------
As with single-threaded mode, the difference in real time is rather minimal, and
even negligible in user time, but in system time it is much more pronounced.
Since system time can be attributed to the kernel proper, a kernel compiled with
clang 3.2 is clearly faster than a kernel compiled with gcc 4.2.1, by a margin
of just over 4 percent.
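Whether the system-time gap is "clearly" outside the run-to-run noise can be
gauged by comparing the difference of the averages with the reported standard
deviations. The Python sketch below uses Welch's t statistic for that; this
statistic is my choice for illustration and is not something the report
itself computes, and welch_t() is a hypothetical helper name. With only three
runs per kernel, the result is a rough indication at best.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference mean_b - mean_a."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / se


# "sys" rows (Avg, Stddev, N) from the tables: clang first, then gcc.
t_single = welch_t(7826.0533, 14.519891, 3, 8261.5067, 21.348673, 3)
t_multi = welch_t(14741.6, 35.500903, 3, 15407.787, 51.999823, 3)

print(f"single-threaded sys: t = {t_single:.1f}")
print(f"multi-threaded sys:  t = {t_multi:.1f}")
```

Both values come out far above 1, meaning the difference in system time is
many times larger than the scatter between runs, which is consistent with the
conclusions drawn above.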
================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry at andric.com>
Verbatim copying and redistribution of this entire text are permitted, provided
this notice is preserved.
================================================================================