optimization of the system by recompilation

Alexander Portnoy my-subs at mail.ru
Sun Apr 11 07:55:13 PDT 2004


On Sun, 11 Apr 2004 01:29:22 +0200
Patrick Proniewski <patpro at patpro.net> wrote:

> Hello,
> 
> I wonder if tuning the make.conf and the kernel to take into account 
> specific CPU features/options does really make a difference in terms of 
> performance on FreeBSD.
> I've googled for benches but can't find any interesting papers.
> Any info greatly appreciated.
> 
> patpro
> -- 
> looking for a Mac/UNIX sysadmin position
> (or a young, pretty, rich woman)
> http://patpro.net/cv.php
> 

Yes, it makes a difference. But the difference is not always positive.

I tested two things, 'make buildworld' and openssl, under FreeBSD-5.2.1-RELEASE
installed on a notebook, an "HP Pavilion ze4523ea" (AMD Athlon-XP Mobile 2400+ 1.8GHz,
512KB L2 cache, 256MB DDR266 RAM, ALi chipset based MB, Seagate ST93012A HDD),
and on a desktop system, a "Compaq Evo D310" (Pentium-IV 2400MHz, 512KB L2 cache, 533MHz
FSB w/o HTT, 256MB DDR266 RAM, Intel-845 chipset based MB, Maxtor-2F040L0 HDD).
The Pentium-IV-based system is presented only for a rough "AMD vs Intel" comparison.
So, when I talk about "two systems", I mean the two Athlon-based installations on
the HP notebook: the optimized one and the not-optimized one.

On the notebook I ran 'make buildworld' twice: the first time on an absolutely
clean system with the GENERIC kernel and a binary-installed world, and the second time
on a system with a recompiled world and kernel. Each test was performed without make.conf,
in single-user mode. In the second case the installed kernel had support for SSE and
AMD-specific processor instructions, and the world had been built with the following options:

CPUTYPE?=athlon-xp
CFLAGS= -O -pipe
CXXFLAGS+= -fmemoize-lookups -fsave-memoized
COPTFLAGS= -O -pipe
WANT_FORCE_OPTIMIZATION_DOWNGRADE=1
MAKE_IDEA=      YES

Of course, these options were used only for compiling the test system, not during the tests themselves.

So, we have two systems (optimized and not) running on the same hardware and executing
the same task. The results are as follows:

make buildworld on not-optimized Athlon-based system:  real:2730.52  user:1980.90  system:459.79
make buildworld on     optimized Athlon-based system:  real:2840.27  user:2074.22  system:456.88

As you can see, this specific task runs slower as a result of the optimization:
about 4% slower in real time (2840.27 vs 2730.52).
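
The size of the difference can be read off the two result lines above. A minimal
Python sketch, using only the timings quoted in this message (the helper name
pct_change is just an illustrative choice):

```python
# Timings copied from the two 'make buildworld' runs quoted above.
baseline = {"real": 2730.52, "user": 1980.90, "system": 459.79}   # not optimized
optimized = {"real": 2840.27, "user": 2074.22, "system": 456.88}  # optimized


def pct_change(before, after):
    """Relative change in percent; positive means the run took longer."""
    return (after - before) / before * 100.0


# One percentage per timing category (real, user, system).
changes = {key: pct_change(baseline[key], optimized[key]) for key in baseline}

for key, value in changes.items():
    print(f"{key:>6}: {value:+.1f}%")
```

On these numbers it prints +4.0% for real, +4.7% for user and -0.6% for system,
so the extra time is spent in user mode, not in the kernel.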


On the Pentium-IV with a generic binary installation I got this result:

make buildworld on not-optimized P-IV system:  real:2591.67  user:1855.77  system:512.19

Currently, this system runs a recompiled, optimized world and kernel (built in the same
manner as on the Athlon-based system, but with the corresponding CPU type), but I have not
yet tested the effect of the optimization on 'make buildworld'. I'll do it in three days.

================================================================================

Here are the results of the 'openssl speed' test:

--------------------------------------------------------------------------------

The not-optimized Athlon-based system:

OpenSSL 0.9.7c 30 Sep 2003
built on: Mon Feb 23 18:18:12 GMT 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx) 
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2               1353.84k     2883.85k     4037.49k     4498.69k     4637.52k
mdc2              3099.95k     3513.05k     3630.29k     3667.98k     3668.41k
md4              11665.58k    41273.75k   121422.56k   236540.37k   328188.46k
md5               9178.08k    30801.73k    83847.38k   148018.63k   191651.13k
hmac(md5)         5219.81k    18714.10k    58407.60k   124940.65k   187243.86k
sha1              9021.76k    27620.49k    65920.32k   100429.75k   118665.95k
rmd160            7595.82k    21845.25k    46756.02k    66160.23k    75117.55k
rc4              91594.44k    97533.81k    99011.52k    99675.56k    99841.45k
des cbc          18603.86k    19156.64k    19291.56k    19336.36k    19348.79k
des ede3          6752.80k     6823.83k     6847.10k     6852.97k     6854.59k
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc          16364.82k    16954.69k    17169.81k    17204.85k    17165.59k
rc5-32/12 cbc    76003.59k    91265.62k    95365.92k    96674.31k    97055.51k
blowfish cbc     45465.01k    50084.68k    51324.54k    51718.48k    51965.94k
cast cbc         31093.46k    33235.03k    33800.51k    33974.97k    34024.42k
aes-128 cbc      38303.47k    39168.76k    39528.18k    39782.53k    39713.81k
aes-192 cbc      33541.45k    34119.22k    34379.99k    34494.11k    34610.47k
aes-256 cbc      29583.16k    30149.21k    30502.23k    30512.58k    30532.75k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0014s   0.0001s    725.1   8380.1
rsa 1024 bits   0.0072s   0.0004s    138.0   2743.8
rsa 2048 bits   0.0437s   0.0012s     22.9    826.9
rsa 4096 bits   0.2889s   0.0041s      3.5    242.7
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0012s   0.0015s    847.8    688.5
dsa 1024 bits   0.0036s   0.0045s    275.7    223.2
dsa 2048 bits   0.0119s   0.0146s     84.4     68.7

--------------------------------------------------------------------------------

The optimized Athlon-based system:

OpenSSL 0.9.7c-p1 30 Sep 2003
built on: Sat Apr  3 19:27:44 IST 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx) 
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2               1469.85k     3141.10k     4364.08k     4845.00k     5018.59k
mdc2              3422.22k     3961.57k     4106.14k     4148.13k     4168.24k
md4              13293.53k    46403.23k   133057.80k   247860.40k   332391.22k
md5               9923.45k    32653.90k    87393.52k   150725.62k   191716.54k
hmac(md5)         5886.66k    21173.88k    64159.59k   131836.68k   188889.57k
sha1             10234.21k    30260.14k    69060.28k   100388.71k   115805.02k
rmd160            8126.79k    22809.78k    48072.18k    66651.00k    75161.24k
rc4              90439.87k    95741.84k    97001.42k    97528.56k    97796.82k
des cbc          18663.73k    19158.23k    19290.86k    19385.42k    19346.18k
des ede3          6763.25k     6833.31k     6856.65k     6862.52k     6863.96k
idea cbc         25537.24k    26803.52k    27140.75k    27241.67k    27318.54k
rc2 cbc          15347.39k    15948.06k    16081.37k    16116.49k    16115.75k
rc5-32/12 cbc    76203.31k    90366.69k    94297.41k    95552.81k    95914.76k
blowfish cbc     45606.12k    50249.45k    51453.63k    51857.83k    51971.32k
cast cbc         31290.53k    33480.59k    34130.29k    34222.75k    34260.36k
aes-128 cbc      39364.21k    39919.03k    40532.64k    40556.81k    40592.93k
aes-192 cbc      34185.24k    34811.82k    35197.58k    35295.92k    35323.30k
aes-256 cbc      30500.45k    30863.06k    31166.10k    31243.22k    31264.33k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0011s   0.0001s    870.6   9641.5
rsa 1024 bits   0.0062s   0.0003s    161.3   3127.8
rsa 2048 bits   0.0384s   0.0011s     26.1    935.0
rsa 4096 bits   0.2530s   0.0037s      4.0    272.4
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0010s   0.0012s    993.3    852.1
dsa 1024 bits   0.0031s   0.0037s    319.1    267.1
dsa 2048 bits   0.0103s   0.0124s     97.2     80.5

--------------------------------------------------------------------------------

Optimized Pentium-IV-based system:

OpenSSL 0.9.7c-p1 30 Sep 2003
built on: Fri Apr  2 08:27:27 IST 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx) 
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2               1726.48k     3776.53k     5370.03k     6055.36k     6271.25k
mdc2              3273.95k     3776.35k     3931.20k     3968.70k     3991.80k
md4              11715.58k    38997.36k   103675.62k   178255.61k   238275.60k
md5               8934.14k    28045.88k    69794.77k   106627.55k   124504.85k
hmac(md5)         4645.41k    15848.47k    47652.73k    90719.45k   121526.49k
sha1              8252.23k    23922.96k    52550.74k    74687.05k    85330.16k
rmd160            6495.77k    17329.79k    34379.13k    45735.61k    50501.56k
rc4              77374.43k    85999.31k    88387.11k    89063.26k    88959.69k
des cbc          12892.70k    13269.26k    13355.82k    13448.57k    13487.01k
des ede3          4735.53k     4773.85k     4785.02k     4799.68k     4794.03k
idea cbc         13666.19k    14120.88k    14239.32k    14305.44k    14278.27k
rc2 cbc          11083.37k    11796.76k    11756.10k    11896.34k    11602.77k
rc5-32/12 cbc    59411.37k    64298.25k    65320.48k    66806.59k    66736.33k
blowfish cbc     39247.42k    42614.92k    43648.10k    43835.60k    43885.58k
cast cbc         25203.76k    26688.89k    27010.57k    27590.74k    27322.58k
aes-128 cbc      65918.18k    66334.91k    68613.90k    70164.26k    69141.19k
aes-192 cbc      56768.29k    58742.12k    61008.16k    61665.29k    60882.01k
aes-256 cbc      52524.87k    53551.62k    54792.55k    54951.13k    54106.28k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0015s   0.0001s    667.3   7133.5
rsa 1024 bits   0.0084s   0.0004s    119.5   2324.9
rsa 2048 bits   0.0514s   0.0015s     19.5    679.1
rsa 4096 bits   0.3451s   0.0052s      2.9    193.5
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0013s   0.0016s    752.3    632.4
dsa 1024 bits   0.0042s   0.0050s    238.6    199.0
dsa 2048 bits   0.0142s   0.0172s     70.2     58.3


--------------------------------------------------------------------------------

In openssl the results are not so bad: some algorithms run faster and others
slower on the optimized system (rc4, rc2 and rc5, for example, are slightly slower).
Take a look at the difference in performance between the Athlon and the Pentium on
the "idea cbc" test. Another example is "aes-256 cbc".
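
Comparing such tables by eye is error-prone, so here is a small Python sketch that
parses the throughput rows of 'openssl speed' output and prints ratios for the
8192-byte column. The rows are copied from the tables above; the names parse_speed
and ratios are hypothetical helpers, and the parser assumes the five-column layout
shown in this message.

```python
ATHLON_PLAIN = """\
rc4              91594.44k    97533.81k    99011.52k    99675.56k    99841.45k
idea cbc             0.00         0.00         0.00         0.00         0.00
rc2 cbc          16364.82k    16954.69k    17169.81k    17204.85k    17165.59k
aes-256 cbc      29583.16k    30149.21k    30502.23k    30512.58k    30532.75k
"""
ATHLON_OPT = """\
rc4              90439.87k    95741.84k    97001.42k    97528.56k    97796.82k
idea cbc         25537.24k    26803.52k    27140.75k    27241.67k    27318.54k
rc2 cbc          15347.39k    15948.06k    16081.37k    16116.49k    16115.75k
aes-256 cbc      30500.45k    30863.06k    31166.10k    31243.22k    31264.33k
"""
P4_OPT = """\
rc4              77374.43k    85999.31k    88387.11k    89063.26k    88959.69k
idea cbc         13666.19k    14120.88k    14239.32k    14305.44k    14278.27k
rc2 cbc          11083.37k    11796.76k    11756.10k    11896.34k    11602.77k
aes-256 cbc      52524.87k    53551.62k    54792.55k    54951.13k    54106.28k
"""


def parse_speed(text):
    """Return {algorithm: [throughput per block size]} for the throughput rows."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            # The last five fields are the 16..8192 byte columns.
            values = [float(field.rstrip("k")) for field in fields[-5:]]
        except ValueError:
            continue  # header, rsa/dsa or other non-throughput line
        table[" ".join(fields[:-5])] = values
    return table


def ratios(base, other, column=-1):
    """other/base per algorithm; rows that are zero in base are skipped."""
    return {
        name: other[name][column] / base[name][column]
        for name in base
        if name in other and base[name][column] > 0
    }


opt_vs_plain = ratios(parse_speed(ATHLON_PLAIN), parse_speed(ATHLON_OPT))
p4_vs_athlon = ratios(parse_speed(ATHLON_OPT), parse_speed(P4_OPT))

for label, result in (("optimized/plain Athlon", opt_vs_plain),
                      ("P4/Athlon, both optimized", p4_vs_athlon)):
    for name, value in result.items():
        print(f"{label:<26} {name:<12} {value:.2f}")
```

For the 8192-byte column this gives roughly 0.98 (rc4), 0.94 (rc2 cbc) and 1.02
(aes-256 cbc) for the optimized vs not-optimized Athlon, and roughly 0.52 (idea cbc)
and 1.73 (aes-256 cbc) for the Pentium vs the Athlon. The "idea cbc" row is skipped in
the first comparison because it is 0.00 in the not-optimized table.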

So, as you can see, there is no simple answer to your question in the general case.
If you are interested in high performance for some specific task, you must test that task
on different hardware platforms with different optimizations (or without any ;-)
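
One possible way to time a task identically on each installation is sketched below
in Python. It assumes a Unix-like system (the resource module is not available
elsewhere), and the workload is only a placeholder. This is an illustration, not the
procedure used for the numbers in this message.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (real, user, system) in seconds, like time(1)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time consumed by the child process that was just waited for.
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return real, user, system


# Placeholder workload; replace it with the task you actually care about.
command = [sys.executable, "-c", "sum(range(10**6))"]
real, user, system = time_command(command)
print(f"real:{real:.2f}  user:{user:.2f}  system:{system:.2f}")
```

Running the same command several times on each system and comparing the spread
helps to tell a real difference from run-to-run noise.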




More information about the freebsd-performance mailing list