amd64 slower than i386 on identical AMD 64 system? / How is hyperthreading handled on amd64?

JoaoBR joao at
Fri Mar 17 12:17:19 UTC 2006

On Thursday 16 March 2006 18:30, Bruce Evans wrote:
> On Thu, 16 Mar 2006, Peter Wemm wrote:
> > There are a number of weaknesses in the amd64 port too.  In particular,
> > the math library does not yet use the generally superior SSE2
> > instructions.  This is a real setback because the ABI uses SSE2
> > floating point parameter passing.  The effect is that some random libm
> > function is given a SSE2 register, which we convert to an x87 fp stack
> > register, do the x87 operation, then convert the x87 stack register
> > back to a SSE2 register then return the SSE2 result.  This is
> > especially unfortunate when the native SSE2 instruction that would
> > operate on the SSE2 registers directly is faster.  But, I don't know
> > SSE2 nor x87 fpu assembler code very well, so I've done "just enough"
> > to get things to work.

Does SSE influence "normal" operations such as disk I/O, memory access and
networking?

> My benchmarks in libm indicate that 64-bitness + SSE2 end up being a
> tiny improvement for single precision and a significant improvement for
> double and long double precision (even for long double where SSE2
> cannot be used!), but this is only for versions that don't use the
> FPU for transcendental functions, and I think it is mainly from foot
> shooting in the 32-bit versions.  The improvement in double precision
> is needed to be competitive with the hardware transcendental functions,
> and the foot shooting is from heavy use of the GET/SET macros -- these
> macros force things to memory and thus tend to cause pipeline stalls.
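
As an illustration of the GET/SET pattern mentioned above, here is a
minimal C sketch.  The macro and function names are hypothetical, made
up for this example, and are not the actual libm definitions; the code
only assumes that double and uint64_t are both 64 bits wide and share
the same byte order.  Each macro copies the value through a union.  On
a target that keeps doubles in x87 registers there is no direct move
between an FPU register and an integer register, so each such copy
typically goes through memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union for viewing a double as raw bits (illustration only). */
typedef union {
    double   value;
    uint64_t bits;
} double_shape_sketch;

/* GET-style macro: fetch the high 32 bits of a double via the union. */
#define GET_HIGH_WORD_SKETCH(i, d) do {          \
    double_shape_sketch gh_u;                    \
    gh_u.value = (d);                            \
    (i) = (uint32_t)(gh_u.bits >> 32);           \
} while (0)

/* SET-style macro: replace the high 32 bits of a double via the union. */
#define SET_HIGH_WORD_SKETCH(d, v) do {          \
    double_shape_sketch sh_u;                    \
    sh_u.value = (d);                            \
    sh_u.bits = (sh_u.bits & 0xffffffffULL) |    \
                ((uint64_t)(v) << 32);           \
    (d) = sh_u.value;                            \
} while (0)

/* fabs built from the macros: clear the sign bit in the high word. */
double fabs_sketch(double x)
{
    uint32_t hi;

    GET_HIGH_WORD_SKETCH(hi, x);              /* FP value -> integer bits */
    SET_HIGH_WORD_SKETCH(x, hi & 0x7fffffffu); /* integer bits -> FP value */
    return x;
}

/* Small self-check showing the intended behaviour. */
void fabs_sketch_check(void)
{
    assert(fabs_sketch(-2.5) == 2.5);
    assert(fabs_sketch(3.0) == 3.0);
}
```

A function that uses several such macros in a row moves the same value
between the floating-point and integer sides more than once, which is
where the stalls described above could come from.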

Sorry, would you mind explaining what you mean by "foot shooting" here?



More information about the freebsd-amd64 mailing list