amd64 slower than i386 on identical AMD 64 system? / How is hyperthreading handled on amd64?

Peter Wemm peter at
Thu Mar 16 17:17:32 UTC 2006

On Thursday 16 March 2006 02:46 am, JoaoBR wrote:
> On Wednesday 15 March 2006 18:56, Peter Wemm wrote:
> > I tend to agree with this.  ubench is not a useful benchmark for
> > comparing 32 bit vs 64 bit systems.
> >
> > However, what might be interesting is to compile a 32 bit binary
> > (and statically link it) on the i386 system, and compare the
> > runtime on the 64 bit kernel, using the same identical binary. 
> > That way you are measuring the same math operations on both
> > platforms.  Comparing 64 bit operations vs 32 bit operations is
> > apples vs oranges.
> >
> > Of course, it may still be slower, but at least the results would
> > be more meaningful.  Don't assume the OS is slower because the
> > compiler makes the application do twice the work.
> good point
> what do you think of unixbench since it does some real-life tasks?

In general, I don't like synthetic benchmarks at all.  What we do at
work is put the machines under real workloads alongside a comparison
system, and measure idle cpu trends over a day or so.  A comparison
where one machine has a 30% idle cpu and the other has a 40% idle cpu
under the same *real* workload tells us the most.
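
A minimal sketch of that kind of comparison, assuming the idle
percentages have already been collected at regular intervals on both
machines.  The function names and the sample values are hypothetical
and only illustrate the arithmetic; they are not measurements.

```python
from statistics import mean


def mean_idle(samples):
    """Average idle CPU percentage over a run of periodic samples."""
    return mean(samples)


def idle_headroom(candidate, baseline):
    """Percentage points of idle CPU the candidate has over the baseline."""
    return mean_idle(candidate) - mean_idle(baseline)


# Made-up idle percentages for two machines under the same workload.
machine_a = [31.0, 29.5, 30.5, 29.0]
machine_b = [41.0, 39.0, 40.5, 39.5]

print(f"machine_b idles {idle_headroom(machine_b, machine_a):.1f} points more")
```

A longer sampling window (a day or so, as above) smooths out short
bursts that would otherwise dominate the average.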

Unfortunately, we have some folks here that like to push the machines to
the wall.  The problem is that FreeBSD 5 and later tend to not "hit the
wall gracefully", and the results of those runs are more often a test
of how badly the kernel suffers from lock contention than of how it
runs under real load.  Still, the max workload numbers are useful
because they tell you what the worst case is.

BTW: don't compare 'make buildworld' of i386 vs amd64, because amd64 not 
only builds things differently, but builds all the libraries twice.  
amd64 has 5 stages, i386 has 4.  Even a 'make TARGET_ARCH=i386' isn't 
entirely a fair comparison because one has to build a 64 bit host 
compiler in one stage, the other has to build a 32 bit host compiler. 
gcc even turns off some optimizations when operating as a cross 
compiler.  An actual 32 bit buildworld in a 32 bit chroot on both 
machines is a fair comparison of buildworld times from an OS 
perspective because they are building exactly the same thing.  But that 
doesn't make it meaningful if you're interested in 'buildworld' times 
as a FreeBSD developer who does a buildworld umpteen times per day as 
part of compile testing.

Anyway, one has to keep in mind whether a given test is of the operating 
system port, or the cpu architecture, or application performance.  
ubench in particular is strongly affected by 32 vs 64 bit because it 
generates a very different workload for itself depending on the size of 
the machine.
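
As a small illustration of how the "size of the machine" differs, the
sketch below reports the width of the C `long` and pointer types that
the running interpreter was built with.  On FreeBSD these are 32 bits
on i386 and 64 bits on amd64.  This is not ubench's code; the function
name is hypothetical, and it only shows why a benchmark written in
terms of `long` ends up doing different arithmetic on each platform.

```python
import ctypes


def native_widths():
    """Return the bit widths of C long and void* for this process."""
    return {
        "long_bits": 8 * ctypes.sizeof(ctypes.c_long),
        "pointer_bits": 8 * ctypes.sizeof(ctypes.c_void_p),
    }


print(native_widths())
```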

There are a number of weaknesses in the amd64 port too.  In particular, 
the math library does not yet use the generally superior SSE2 
instructions.  This is a real setback because the ABI uses SSE2 
floating point parameter passing.  The effect is that some random libm 
function is given a SSE2 register, which we convert to an x87 fp stack 
register, do the x87 operation, then convert the x87 stack register 
back to a SSE2 register then return the SSE2 result.  This is 
especially unfortunate when the native SSE2 instruction that would 
operate on the SSE2 registers directly is faster.  But, I don't know 
SSE2 nor x87 fpu assembler code very well, so I've done "just enough" 
to get things to work.

It is worth reiterating that I do NOT expect the amd64 port to be better 
than i386 across the board.  Nor even in most tests.  But the 
difference should be minimal, except in some specific cases where the
64 bit nature really helps, e.g. if you want to mmap a 3GB file.  You
can't do that on a machine running an i386 kernel.  I think of the
advantages of using the amd64 port in terms of functionality rather
than performance.
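
The mmap case can be tried with a sketch like the one below.  The
helper `try_map` is hypothetical; it attempts a read-only mapping of a
given length and reports whether it succeeded.  In a 32 bit process a
3GB length is expected to fail (in a 32 bit Python build it does not
even fit in the signed size type), while a 64 bit process has the
address space for it.

```python
import mmap
import os
import sys


def try_map(path, length):
    """Return True if `length` bytes of `path` can be mapped read-only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            mapping = mmap.mmap(fd, length, access=mmap.ACCESS_READ)
        except (OSError, OverflowError, ValueError):
            # OSError: the kernel refused the mapping (e.g. no address space).
            # OverflowError: length does not fit in this build's size type.
            # ValueError: length is larger than the file itself.
            return False
        mapping.close()
        return True
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 2:
    size = os.path.getsize(sys.argv[1])
    print(sys.argv[1], size, "bytes, mappable:", try_map(sys.argv[1], size))
```

Run it with the path of a large file as the only argument to see
whether the whole file can be mapped in one piece.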

You definitely have to consider functionality if you want a desktop
though.  Flash plugins for browsers are right out, for example, unless
you use the Linux browser builds.  Most of the time though, no Flash is
a good thing because you get fewer annoying ads. :-)

Peter Wemm
"All of this is for nothing if we don't go to the stars" - JMS/B5

More information about the freebsd-amd64 mailing list