FreeBSD 5.2 v/s FreeBSD 4.9 MFLOPS performance (gcc 3.3.3 v/s gcc 2.95)

Wes Peters wes at softweyr.com
Tue Feb 10 11:29:50 PST 2004


On Monday 09 February 2004 13:20, Juan Tumani wrote:
> I have an Intel D845GE m/b w/ a P4 1.7 CPU and I have the box setup
> to dual boot to either 4.9 or 5.2.  Both OS are right off the latest
> posted iso CD image, i.e., no updates, no kernel tweaks, everything
> vanilla right out of the box.  I compiled flops.c on both 4.9 and
> 5.2 and the 5.2 performance is less than half that of 4.9: 760
> MFLOPS on 4.9 v/s 340 MFLOPS on 5.2.
>
> I tried turning off the SMP and other kernel tweaks and no
> improvement in 5.2.  I then downloaded and installed gcc295 on the
> 5.2 machine and that fixed the problem.  So now all I have to do is
> figure out the gcc 3.3.3 switches to make it run like gcc 2.95 or
> figure out how to rebuild 5.2 w/ gcc 2.95 :-).

I doubt that kernel tweaks will make much difference on a
single-threaded floating-point benchmark.  Compiler optimizations sure
do, though.  (Note: I couldn't find version 1.2 of flops.c, so this is
based on version 2.0.)  On a 2.0GHz P4, I see:

wpeters@salty> cc -o flops -O -DUNIX flops.c
flops.c: In function `main':
flops.c:174: warning: return type of `main' is not `int'
wpeters@salty> ./flops

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0301    465.4460
     2     -1.4166e-13      0.0619    113.0049
     3      4.7184e-14      0.0365    465.3564
     4     -1.2557e-13      0.0327    458.7438
     5     -1.3800e-13      0.0482    601.5539
     6      3.2380e-13      0.0470    617.2479
     7     -8.4583e-11      0.1692     70.9097
     8      3.4867e-13      0.0510    587.8699

   Iterations      =  512000000
   NullTime (usec) =     0.0008
   MFLOPS(1)       =   150.1795
   MFLOPS(2)       =   174.4286
   MFLOPS(3)       =   352.0107
   MFLOPS(4)       =   544.1166

wpeters@salty> cc -o flops3 -O3 -mcpu=pentium4 -msse2 -DUNIX flops.c
flops.c: In function `main':
flops.c:174: warning: return type of `main' is not `int'
wpeters@salty> ./flops3

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0202    692.2121
     2     -1.4166e-13      0.0199    351.9018
     3      4.7184e-14      0.0251    676.9230
     4     -1.2557e-13      0.0235    637.0627
     5     -1.3800e-13      0.0446    650.2407
     6      3.2380e-13      0.0436    665.0579
     7     -8.4583e-11      0.0567    211.8219
     8      3.4867e-13      0.0436    687.5249

   Iterations      =  512000000
   NullTime (usec) =     0.0006
   MFLOPS(1)       =   417.4252
   MFLOPS(2)       =   396.1492
   MFLOPS(3)       =   567.2668
   MFLOPS(4)       =   669.6139

Pretty good increases across the board.  Slightly off-topic, the same 
test on my Athlon XP 2000+ at home yields:

-bash-2.05b$ cc -o flops3 -O3 -mcpu=athlon-xp -msse2 -DUNIX flops.c
flops.c: In function `main':
flops.c:174: warning: return type of `main' is not `int'
-bash-2.05b$ ./flops3 

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
Illegal instruction (core dumped)

Oh, duh, the Athlon XP doesn't have SSE2, only SSE.  Try again:

-bash-2.05b$ cc -o flops3 -O3 -mcpu=athlon-xp -msse -DUNIX flops.c
flops.c: In function `main':
flops.c:174: warning: return type of `main' is not `int'
-bash-2.05b$ ./flops3 

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0145    965.8007
     2     -1.4166e-13      0.0108    649.9764
     3      4.7184e-14      0.0146   1162.1140
     4     -1.2557e-13      0.0120   1250.0460
     5     -1.3800e-13      0.0259   1118.8725
     6      3.2380e-13      0.0209   1390.5740
     7     -8.4583e-11      0.0310    387.7082
     8      3.4867e-13      0.0277   1082.6515

   Iterations      =  512000000
   NullTime (usec) =     0.0012
   MFLOPS(1)       =   759.3833
   MFLOPS(2)       =   717.9906
   MFLOPS(3)       =   996.1904
   MFLOPS(4)       =  1210.2268

Wowsers.  The summary MFLOPS figures are about 1.8 times those of the
2.0GHz P4 at -O3.  Looks like if you're doing floating point, or at
least floating-point loops that fit in the Athlon's cache, you're a lot
better off with an Athlon than a P4.

You might want to try -funroll-loops, but that's enough effort for a 
decade-old benchmark.  For me, at least.

-- 
         "Where am I, and what am I doing in this handbasket?"

Wes Peters                                              wes at softweyr.com



