Bad performance on alpha? (make buildworld)
Peter Jeremy
peter.jeremy at alcatel.com.au
Tue Feb 24 19:00:02 PST 2004
On 2004-Feb-24 20:17:07 -0500, Charles Swiger <cswiger at mac.com> wrote:
>On Feb 24, 2004, at 3:26 PM, Nikos Ntarmos wrote:
>>IIRC the 600MHz EV56's performance wrt integer operations (such as
>>compiling) is somewhere in the vicinity of a 400MHz P-II, so the
>>difference you see in turn-around times when buildworld'ing isn't
>>quite that big. If the operations were identical, you should see
>>better times when building on the alpha. However, also take into
>>account that compiling (and optimizing) for a RISC CPU, apart from
>>generating larger binaries, is AFAIK supposedly more difficult than
>>compiling (and optimizing) for a CISC CPU.
>
>I'm afraid you've got this backwards. :-)
Maybe in theory, but not necessarily in practice.
>The primary attributes of RISC architectures, namely lots of registers,
>a relatively simple but orthogonal instruction set, and a relatively
>fast clock rate / CPI ~= 1.0 / a short pipeline make it far easier for
>the compiler to generate and optimize code.
Alpha pipelines are only short in a relative sense - the EV5 pipeline
is 7 (integer) or 9 (FP) stages and I suspect the EV56 pipeline is the
same. In theory, it is 4-way superscalar but the different execution
units aren't equivalent and the compiler has to understand which
instructions will be allocated to which execution units in order to
minimise stalls.
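
To illustrate why non-equivalent execution units matter to the compiler, here is a minimal sketch of an in-order dual-issue machine. The pipe names (P0, P1), the instruction classes, and which pipe accepts which class are all hypothetical and are not the real EV5 issue rules; data dependencies are ignored so that only the unit-assignment effect is visible.

```python
# Hypothetical pipe capabilities, chosen for illustration only.
# This is NOT the actual EV5 unit assignment.
PIPES = {"P0": {"alu", "shift"}, "P1": {"alu", "branch"}}


def can_pair(a, b):
    """True if instruction classes a and b can issue together on distinct pipes."""
    return any(
        a in PIPES[x] and b in PIPES[y]
        for x in PIPES
        for y in PIPES
        if x != y
    )


def issue_cycles(stream):
    """Cycles needed to issue the stream in order, at most two per cycle."""
    cycles, i = 0, 0
    while i < len(stream):
        # Issue two adjacent instructions together only if the pipes allow it.
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


# Same four instructions, two different orderings.
print(issue_cycles(["shift", "shift", "alu", "alu"]))
print(issue_cycles(["shift", "alu", "shift", "alu"]))
```

In this toy model the second ordering issues in fewer cycles than the first, although both contain the same instructions. Finding that ordering is the compiler's job on an in-order machine.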
>CISC architectures make the compiler's job much harder because they tend
>to require lots of register spills, and they tend to have very long
>pipelines which involve hazards and require a lot of instruction
>reordering to avoid stalling the pipeline too often. The number of CPU
>clocks it takes per instruction (CPI) often varies on CISC and is
>generally much larger than ~1.0, and sometimes varies from CPU model to
>CPU model, making it far more difficult to determine the "fastest"
>instruction sequence.
Recent iA32 implementations (basically anything more recent than a
PII) are RISC cores which directly execute a subset of the iA32
instruction set with the remainder handled by microcode. You get
quite respectable results by treating it as a load/store RISC
architecture and relying on the L1 cache to handle the register spills
in a timely fashion. The pipelines and super-scalar execution
abilities are all handled in hardware. Register renaming allows
the implementation to have more physical registers than the programmer's
view supports - allowing multiple instructions to simultaneously see
different values in the same visible register.
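
The effect described above (more physical registers than visible ones) is usually called register renaming. A minimal sketch of the idea follows. It assumes an unlimited supply of physical registers and a made-up three-operand instruction format (dest, src1, src2); real hardware recycles physical registers and is considerably more involved.

```python
def rename(instructions, num_arch_regs=4):
    """Rewrite architectural register numbers as physical register numbers.

    Each instruction is a (dest, src1, src2) tuple of architectural
    register numbers. Assumes physical registers never run out.
    """
    # Initially architectural register i lives in physical register i.
    mapping = list(range(num_arch_regs))
    next_free = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read whichever physical register currently holds the value.
        p1, p2 = mapping[src1], mapping[src2]
        # Every write gets a fresh physical register, so the previous value
        # of the same visible register stays available to earlier readers.
        mapping[dest] = next_free
        next_free += 1
        renamed.append((mapping[dest], p1, p2))
    return renamed


program = [
    (0, 1, 2),  # r0 = r1 op r2
    (3, 0, 1),  # r3 = r0 op r1  (reads the first r0)
    (0, 2, 2),  # r0 = r2 op r2  (reuses r0 for an unrelated value)
    (1, 0, 3),  # r1 = r0 op r3  (reads the second r0)
]
for before, after in zip(program, rename(program)):
    print(before, "->", after)
```

The two writes to r0 land in different physical registers, so the third instruction does not have to wait for the second one to finish reading the old r0.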
The compiler has to expend a lot of effort on instruction scheduling
to get decent performance out of a typical RISC architecture. Much of
this is automatically handled by the hardware on an iA32 and you can
get equivalent results with a much simpler compiler.
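
As a rough sketch of what instruction scheduling buys on an in-order machine, the toy model below counts cycles for a single-issue pipeline that stalls until an instruction's sources are ready, then applies a greedy list scheduler. The instruction names and latencies are hypothetical. The same greedy choice can be made by the compiler ahead of time or by out-of-order hardware at run time.

```python
def cycles(order, instrs):
    """Cycles until the last result is available on an in-order pipeline.

    instrs maps name -> (source names, latency in cycles).
    """
    ready = {}  # name -> cycle at which its result is available
    cycle = 0
    for name in order:
        sources, latency = instrs[name]
        # Stall until every source operand has been produced.
        start = max([cycle] + [ready[s] for s in sources])
        ready[name] = start + latency
        cycle = start + 1
    return max(ready.values())


def schedule(instrs):
    """Greedy list scheduling: each cycle, issue the first ready instruction."""
    ready, order, cycle = {}, [], 0
    pending = list(instrs)  # program order is the tie-breaker
    while pending:
        for name in pending:
            sources, latency = instrs[name]
            if all(s in ready and ready[s] <= cycle for s in sources):
                ready[name] = cycle + latency
                order.append(name)
                pending.remove(name)
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return order


# Hypothetical latencies: loads take 3 cycles, everything else 1.
instrs = {
    "load_a": ((), 3),
    "add_a": (("load_a",), 1),
    "load_b": ((), 3),
    "add_b": (("load_b",), 1),
    "sum": (("add_a", "add_b"), 1),
}
print(cycles(list(instrs), instrs))      # naive program order
print(cycles(schedule(instrs), instrs))  # after reordering
```

Hoisting the second load above the first add hides most of the load latency, which is exactly the kind of reordering a compiler for an in-order RISC has to find by itself.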
Peter