optimization of the system by recompilation
Alexander Portnoy
my-subs at mail.ru
Sun Apr 11 07:55:13 PDT 2004
On Sun, 11 Apr 2004 01:29:22 +0200
Patrick Proniewski <patpro at patpro.net> wrote:
> Hello,
>
> I wonder if tuning the make.conf and the kernel to take into account
> specific CPU features/options does really make a difference in terms of
> performance on FreeBSD.
> I've googled for benches but can't find any interesting papers.
> Any info greatly appreciated.
>
> patpro
> --
> I'm looking for a Mac/UNIX sysadmin position
> (or a young, pretty, rich woman)
> http://patpro.net/cv.php
>
Yes, it makes a difference, but the difference is not always positive.
I tested two things, 'make buildworld' and openssl, under FreeBSD-5.2.1-RELEASE
installed on an "HP Pavilion ze4523ea" notebook (AMD Athlon-XP Mobile 2400+ 1.8GHz,
512KB L2 cache, 256MB DDR266 RAM, ALi chipset based MB, Seagate ST93012A HDD)
and a "Compaq Evo D310" desktop system (Pentium-IV 2400MHz, 512KB L2 cache, 533MHz
FSB w/o HTT, 256MB DDR266 RAM, Intel-845 chipset based MB, Maxtor-2F040L0 HDD).
The Pentium-IV-based system is included only for a rough "AMD vs Intel" comparison.
So, when I talk about "two systems", I mean two Athlon-based installations on
the HP notebook: an optimized one and a non-optimized one.
On the notebook I executed 'make buildworld' twice: the first time on an absolutely
clean system with the GENERIC kernel and the binary-installed world, and the second time
on the recompiled system and kernel. Each test was performed in single-user mode
without make.conf. In the second case the installed kernel has support for SSE and
AMD-specific processor instructions, and the world was built with the following options:
CPUTYPE?=athlon-xp
CFLAGS= -O -pipe
CXXFLAGS+= -fmemoize-lookups -fsave-memoized
COPTFLAGS= -O -pipe
WANT_FORCE_OPTIMIZATION_DOWNGRADE=1
MAKE_IDEA= YES
Of course, these options were used only to compile the system under test, not during
the tests themselves. So, we have two systems (optimized and not optimized) running on
the same hardware and executing the same task. The results are as follows:
make buildworld on not-optimized Athlon-based system: real:2730.52 user:1980.90 system:459.79
make buildworld on optimized Athlon-based system: real:2840.27 user:2074.22 system:456.88
As you can see, this particular task runs slower as a result of the optimization.
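To express the buildworld figures as relative changes, here is a small Python sketch. It only does arithmetic on the real/user/system values quoted above; the variable and function names are illustrative and not part of any tool.

```python
# 'make buildworld' times quoted above for the same Athlon notebook,
# as reported by the timing tool (real = wall clock).
not_optimized = {"real": 2730.52, "user": 1980.90, "system": 459.79}
optimized = {"real": 2840.27, "user": 2074.22, "system": 456.88}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` in percent.

    For elapsed times a positive value means the optimized run was slower.
    """
    return (after - before) / before * 100.0


changes = {key: percent_change(not_optimized[key], optimized[key])
           for key in not_optimized}

for key, pct in changes.items():
    print(f"{key:>6}: {pct:+.1f}%")
```

With these inputs the wall-clock and user times grow by roughly 4-5%, while system time is essentially unchanged (it drops by well under 1%).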
On the Pentium-IV with a generic binary installation I got these results:
make buildworld on not-optimized P-IV system: real:2591.67 user:1855.77 system:512.19
Currently, this system runs a world and kernel recompiled with optimizations (in the same
manner as the Athlon-based system, but using the corresponding CPU type), but I have not yet
tested the effect of the optimization on 'make buildworld'. I'll do it in three days.
================================================================================
Here are the results of the 'openssl speed' test:
--------------------------------------------------------------------------------
The not-optimized Athlon-based system:
OpenSSL 0.9.7c 30 Sep 2003
built on: Mon Feb 23 18:18:12 GMT 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2               1353.84k    2883.85k    4037.49k    4498.69k    4637.52k
mdc2              3099.95k    3513.05k    3630.29k    3667.98k    3668.41k
md4              11665.58k   41273.75k  121422.56k  236540.37k  328188.46k
md5               9178.08k   30801.73k   83847.38k  148018.63k  191651.13k
hmac(md5)         5219.81k   18714.10k   58407.60k  124940.65k  187243.86k
sha1              9021.76k   27620.49k   65920.32k  100429.75k  118665.95k
rmd160            7595.82k   21845.25k   46756.02k   66160.23k   75117.55k
rc4              91594.44k   97533.81k   99011.52k   99675.56k   99841.45k
des cbc          18603.86k   19156.64k   19291.56k   19336.36k   19348.79k
des ede3          6752.80k    6823.83k    6847.10k    6852.97k    6854.59k
idea cbc              0.00        0.00        0.00        0.00        0.00
rc2 cbc          16364.82k   16954.69k   17169.81k   17204.85k   17165.59k
rc5-32/12 cbc    76003.59k   91265.62k   95365.92k   96674.31k   97055.51k
blowfish cbc     45465.01k   50084.68k   51324.54k   51718.48k   51965.94k
cast cbc         31093.46k   33235.03k   33800.51k   33974.97k   34024.42k
aes-128 cbc      38303.47k   39168.76k   39528.18k   39782.53k   39713.81k
aes-192 cbc      33541.45k   34119.22k   34379.99k   34494.11k   34610.47k
aes-256 cbc      29583.16k   30149.21k   30502.23k   30512.58k   30532.75k
                    sign    verify    sign/s  verify/s
rsa 512 bits     0.0014s   0.0001s     725.1    8380.1
rsa 1024 bits    0.0072s   0.0004s     138.0    2743.8
rsa 2048 bits    0.0437s   0.0012s      22.9     826.9
rsa 4096 bits    0.2889s   0.0041s       3.5     242.7
                    sign    verify    sign/s  verify/s
dsa 512 bits     0.0012s   0.0015s     847.8     688.5
dsa 1024 bits    0.0036s   0.0045s     275.7     223.2
dsa 2048 bits    0.0119s   0.0146s      84.4      68.7
--------------------------------------------------------------------------------
The optimized Athlon-based system:
OpenSSL 0.9.7c-p1 30 Sep 2003
built on: Sat Apr 3 19:27:44 IST 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2               1469.85k    3141.10k    4364.08k    4845.00k    5018.59k
mdc2              3422.22k    3961.57k    4106.14k    4148.13k    4168.24k
md4              13293.53k   46403.23k  133057.80k  247860.40k  332391.22k
md5               9923.45k   32653.90k   87393.52k  150725.62k  191716.54k
hmac(md5)         5886.66k   21173.88k   64159.59k  131836.68k  188889.57k
sha1             10234.21k   30260.14k   69060.28k  100388.71k  115805.02k
rmd160            8126.79k   22809.78k   48072.18k   66651.00k   75161.24k
rc4              90439.87k   95741.84k   97001.42k   97528.56k   97796.82k
des cbc          18663.73k   19158.23k   19290.86k   19385.42k   19346.18k
des ede3          6763.25k    6833.31k    6856.65k    6862.52k    6863.96k
idea cbc         25537.24k   26803.52k   27140.75k   27241.67k   27318.54k
rc2 cbc          15347.39k   15948.06k   16081.37k   16116.49k   16115.75k
rc5-32/12 cbc    76203.31k   90366.69k   94297.41k   95552.81k   95914.76k
blowfish cbc     45606.12k   50249.45k   51453.63k   51857.83k   51971.32k
cast cbc         31290.53k   33480.59k   34130.29k   34222.75k   34260.36k
aes-128 cbc      39364.21k   39919.03k   40532.64k   40556.81k   40592.93k
aes-192 cbc      34185.24k   34811.82k   35197.58k   35295.92k   35323.30k
aes-256 cbc      30500.45k   30863.06k   31166.10k   31243.22k   31264.33k
                    sign    verify    sign/s  verify/s
rsa 512 bits     0.0011s   0.0001s     870.6    9641.5
rsa 1024 bits    0.0062s   0.0003s     161.3    3127.8
rsa 2048 bits    0.0384s   0.0011s      26.1     935.0
rsa 4096 bits    0.2530s   0.0037s       4.0     272.4
                    sign    verify    sign/s  verify/s
dsa 512 bits     0.0010s   0.0012s     993.3     852.1
dsa 1024 bits    0.0031s   0.0037s     319.1     267.1
dsa 2048 bits    0.0103s   0.0124s      97.2      80.5
--------------------------------------------------------------------------------
Optimized Pentium-IV-based system:
OpenSSL 0.9.7c-p1 30 Sep 2003
built on: Fri Apr 2 08:27:27 IST 2004
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2               1726.48k    3776.53k    5370.03k    6055.36k    6271.25k
mdc2              3273.95k    3776.35k    3931.20k    3968.70k    3991.80k
md4              11715.58k   38997.36k  103675.62k  178255.61k  238275.60k
md5               8934.14k   28045.88k   69794.77k  106627.55k  124504.85k
hmac(md5)         4645.41k   15848.47k   47652.73k   90719.45k  121526.49k
sha1              8252.23k   23922.96k   52550.74k   74687.05k   85330.16k
rmd160            6495.77k   17329.79k   34379.13k   45735.61k   50501.56k
rc4              77374.43k   85999.31k   88387.11k   89063.26k   88959.69k
des cbc          12892.70k   13269.26k   13355.82k   13448.57k   13487.01k
des ede3          4735.53k    4773.85k    4785.02k    4799.68k    4794.03k
idea cbc         13666.19k   14120.88k   14239.32k   14305.44k   14278.27k
rc2 cbc          11083.37k   11796.76k   11756.10k   11896.34k   11602.77k
rc5-32/12 cbc    59411.37k   64298.25k   65320.48k   66806.59k   66736.33k
blowfish cbc     39247.42k   42614.92k   43648.10k   43835.60k   43885.58k
cast cbc         25203.76k   26688.89k   27010.57k   27590.74k   27322.58k
aes-128 cbc      65918.18k   66334.91k   68613.90k   70164.26k   69141.19k
aes-192 cbc      56768.29k   58742.12k   61008.16k   61665.29k   60882.01k
aes-256 cbc      52524.87k   53551.62k   54792.55k   54951.13k   54106.28k
                    sign    verify    sign/s  verify/s
rsa 512 bits     0.0015s   0.0001s     667.3    7133.5
rsa 1024 bits    0.0084s   0.0004s     119.5    2324.9
rsa 2048 bits    0.0514s   0.0015s      19.5     679.1
rsa 4096 bits    0.3451s   0.0052s       2.9     193.5
                    sign    verify    sign/s  verify/s
dsa 512 bits     0.0013s   0.0016s     752.3     632.4
dsa 1024 bits    0.0042s   0.0050s     238.6     199.0
dsa 2048 bits    0.0142s   0.0172s      70.2      58.3
--------------------------------------------------------------------------------
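Comparing the tables row by row is easier with ratios. The Python sketch below copies a few 8192-byte throughput values from the tables above (the choice of rows is arbitrary) and divides one system's numbers by another's; a ratio above 1.0 means the first system processed more data per second. The dictionary and function names are hypothetical.

```python
# Selected 8192-byte throughputs from the 'openssl speed' tables above,
# in 1000s of bytes per second.
athlon_generic = {
    "md2": 4637.52,
    "md4": 328188.46,
    "sha1": 118665.95,
    "rc4": 99841.45,
    "rc2 cbc": 17165.59,
    "aes-256 cbc": 30532.75,
    "idea cbc": 0.00,  # IDEA was not built into this OpenSSL
}
athlon_optimized = {
    "md2": 5018.59,
    "md4": 332391.22,
    "sha1": 115805.02,
    "rc4": 97796.82,
    "rc2 cbc": 16115.75,
    "aes-256 cbc": 31264.33,
    "idea cbc": 27318.54,
}
p4_optimized = {"aes-256 cbc": 54106.28, "idea cbc": 14278.27}


def throughput_ratio(new: dict[str, float], old: dict[str, float]) -> dict[str, float]:
    """Return new/old for every algorithm present in both tables.

    Rows with a zero baseline (an algorithm that was not built) are
    skipped instead of dividing by zero.
    """
    return {algo: new[algo] / old[algo]
            for algo in new
            if algo in old and old[algo] > 0}


opt_vs_generic = throughput_ratio(athlon_optimized, athlon_generic)
p4_vs_athlon = throughput_ratio(p4_optimized, athlon_optimized)

for label, ratios in (("optimized/generic Athlon", opt_vs_generic),
                      ("optimized P-IV/optimized Athlon", p4_vs_athlon)):
    print(label)
    for algo, r in sorted(ratios.items()):
        print(f"  {algo:<12} {r:.2f}")
```

Keeping the rows in dictionaries keyed by the 'type' column makes it simple to add more rows or another system.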
In openssl the results look better: most algorithms run faster on the optimized system
(for example, the RSA and DSA operations are about 14-24% faster), while a few, such as
rc4 and rc2 cbc, run slightly slower. Also take a look at the difference between the
Athlon and the Pentium-IV on the "idea cbc" test, where the Athlon is nearly twice as fast,
and on "aes-256 cbc", where the Pentium-IV is about 1.7 times as fast.
So, as you can see, there is no simple answer to your question in the general case.
If you are interested in high performance for a specific task, you must test that task
on different hardware platforms with different optimizations (or without any ;-)
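Following that advice means measuring the task itself, ideally more than once. Below is a minimal Python sketch, assuming a Unix-like system such as FreeBSD (it relies on Python's `resource` module), that runs a command several times and reports the median real, user and system time, similar to the figures quoted for 'make buildworld'. The helper name `time_command` is hypothetical, and the placeholder workload should be replaced with the real task.

```python
import resource
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], runs: int = 3) -> list[tuple[float, float, float]]:
    """Run `cmd` `runs` times; return (real, user, system) seconds per run.

    User and system time come from RUSAGE_CHILDREN, which accumulates the
    CPU time of child processes that have finished and been waited for,
    so the difference before/after each run isolates that run.
    """
    results = []
    for _ in range(runs):
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        real = time.perf_counter() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        results.append((real,
                        after.ru_utime - before.ru_utime,
                        after.ru_stime - before.ru_stime))
    return results


if __name__ == "__main__":
    # Placeholder workload (starting an empty Python interpreter); replace it
    # with the task you care about, e.g. a hypothetical
    # ["make", "-C", "/usr/src", "buildworld"].
    samples = time_command([sys.executable, "-c", "pass"], runs=5)
    for i, name in enumerate(("real", "user", "system")):
        print(f"{name:>6}: median {statistics.median(s[i] for s in samples):.3f}s")
```

Taking the median of several runs makes it easier to tell a real difference of a few percent from run-to-run noise.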