FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
Mehmet Erol Sanliturk
m.e.sanliturk at gmail.com
Sat Mar 12 12:43:10 UTC 2011
2011/3/12 Martin Matuska <mm at freebsd.org>
> Hi Poul-Henning,
> I have redone the test for majority of the processors, this time taking
> 5 samples of each whole testrun, calculating the average, standard
> deviation, relative standard deviation, standard error and relative
> standard error.
> The relative standard error is below 0.25% for ~91%, between 0.25% and
> 0.5% for ~7%, 0.5%-1.0% for ~1% and between 1.0%-2.0% for <1% of the
> Under a "test" I mean 5 runs for the same setting of the same
> compiler on the same processor.
To have VALID test results , it is NECESSARY to obtain the results by using
DIFFERENT computers .
( This point is NOT mentioned in your message . I am assuming that the SAME
computer is used to get the results . )
If you repeat the same computations on the SAME computer , the values are
CORRELATED , and the t-test is NOT valid , because you are computing the mean
and standard deviation of CORRELATED values , where the correlation is
introduced by the SAME computer .
To obtain a proper set of test values , you may use the following set up :
( Clang and GCC versions and compilation parameters will be the same on all
of the computers ; v(i,j) is the result on computer i for compiler j )
Computer 1    v(1,1)    v(1,2)
Computer 2    v(2,1)    v(2,2)
...
Computer n    v(n,1)    v(n,2)
If you do NOT have that many computers , you may obtain test results from
other reliable sources , provided they use the same compilation parameters .
With data in this form it is possible to use a t-test on PAIRED values .
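The paired comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name paired_t_statistic and the run times are hypothetical , not measurements from this thread , and the resulting t value still has to be compared against a t distribution with n - 1 degrees of freedom ( for example in R ) .

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a[i], b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # One difference per computer: the pairing removes the per-machine effect.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical run times in seconds, one entry per computer (NOT real data).
clang = [10.2, 11.5, 9.8, 12.1, 10.9]
gcc = [10.0, 11.1, 9.9, 11.6, 10.5]
t, df = paired_t_statistic(clang, gcc)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Each row of the table corresponds to one pair ( v(i,1) , v(i,2) ) , so the two lists must be in the same computer order .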
To determine the sample size , it is necessary to make power computations
BEFORE running the experiment , specifying the required values ( significance
level , power and the smallest difference of interest ) a priori .
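As a rough illustration of such a power computation , the sketch below estimates how many computers ( pairs ) a two-sided paired t-test would need. It assumes the normal approximation n = ( ( z_(1-alpha/2) + z_power ) / d )^2 , where d is the expected mean difference divided by the standard deviation of the differences ; the function name and the default values are assumptions chosen for the example , and an exact t-based computation ( as R provides ) gives a slightly larger n for small samples .

```python
import math
from statistics import NormalDist

def pairs_needed(effect_size, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired t-test.

    effect_size: expected mean difference divided by the standard
    deviation of the paired differences (must be chosen a priori).
    Uses the normal approximation, so it slightly underestimates n
    when n is small.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

# Example with assumed inputs: a medium-sized standardized difference.
print(pairs_needed(0.5))
```

Smaller expected differences drive the required number of computers up quickly , which is why the values have to be fixed before the experiment is run .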
If you want to compare MORE than TWO compilers at the same time , for example
( Clang version x ) ... ( Clang version y ) ( GCC version x ) ... ( GCC
version y ) ... etc. , it is necessary to use MULTIPLE COMPARISONS
procedures .
Running pairwise t-tests in isolation from the rest of the results ( with the
compilers as the variables ) will give distorted results unless the
differences are significant at the 0.001 level ( in which case the actual
significance level will be greater than 0.001 , but very likely less than
0.05 ) .
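The effect of running several pairwise tests can be illustrated with a short Python sketch. The message does not name a specific procedure ; Holm's step-down correction is used here only as one common choice , and the family-wise figure treats the tests as independent , which is a simplification for pairwise compiler comparisons. Both function names are hypothetical .

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def holm_adjust(p_values):
    """Return Holm-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Four compilers give 6 pairwise comparisons.
print(familywise_error_rate(0.05, 6))   # per-test level 0.05
print(familywise_error_rate(0.001, 6))  # per-test level 0.001
```

With a per-test level of 0.001 the overall error rate stays well under 0.05 for a handful of comparisons , which matches the remark above ; with a per-test level of 0.05 it does not .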
Such computations ( paired t-test , power , multiple comparisons and others )
are available in the R statistical package , which is in the Ports .
It is my opinion that using different processor models with approximately
equal speeds will not distort the results very much . Personally I prefer
such a mixed-processor set up , because it makes it possible to test the
performance of the compilers on a mixture of processors ( and so , most
likely , independently of the processor model ) .
Thank you very much .
Mehmet Erol Sanliturk