Bad performance on alpha? (make buildworld)
Chuck Swiger
cswiger at mac.com
Wed Feb 25 12:36:01 PST 2004
David O'Brien wrote:
> On Wed, Feb 25, 2004 at 12:19:15AM -0500, Chuck Swiger wrote:
>>>Maybe in theory, but not necessarily in practice.
>>
>>It's been a few years since I'd written a compiler, but my viewpoint isn't
>>based entirely on theory.
[ ... ]
>> Your technical description is accurate, but the points you are making here
>> seem to support my argument, rather than contradict what I said. :-)
>
> You're assuming you're writing a compiler targeting _1_ specific
> architecture.
No, sir, I certainly do not make such an assumption.
Most optimization techniques are architecture-independent: liveness analysis,
CSE, dead code elimination, moving invariants out of loops, branch threading,
algebraic identities and strength reduction. These optimizations are most
commonly done on the 3-argument (three-address) intermediate code that
portable compilers (PCC, GCC) typically use, before code generation for the
target platform is actually performed.
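
To make that concrete, here is a minimal sketch in C of two of the
transformations named above, loop-invariant code motion and strength
reduction, written out by hand. The function names are made up for
illustration, and the second function only approximates what an optimizer
might produce; it is not taken from GCC's actual output.

```c
#include <stddef.h>

/* As a programmer might write it: scale * bias is recomputed on every
 * iteration, and each element is multiplied by 2. */
long sum_naive(const long *a, size_t n, long scale, long bias)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i] * 2 + scale * bias;
    }
    return total;
}

/* Hand-applied versions of the same architecture-independent rewrites. */
long sum_optimized(const long *a, size_t n, long scale, long bias)
{
    long total = 0;
    long k = scale * bias;        /* invariant moved out of the loop */
    const long *end = a + n;      /* index arithmetic replaced by a pointer walk */
    for (const long *p = a; p != end; p++) {
        total += (*p + *p) + k;   /* multiply by 2 reduced to an add */
    }
    return total;
}
```

Nothing in the rewritten function depends on the target CPU, which is why
this kind of work can be done on the intermediate code.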
There are a few additional optimizations which are architecture-specific, such
as instruction scheduling and peephole/template optimizations, but these
generally make much less difference to performance than the
architecture-independent optimizations mentioned above. On some platforms,
though, they can make enough difference that a second pass of CSE or
instruction rescheduling against the target assembly code is worth doing.
> It doesn't matter what is possible, what matters is what
> GCC does. Please go analyze GCC and report the deficiencies. I
> personally would love to know what they are, and how to make GCC do
> better on non-x86 platforms.
I agree that what GCC does matters, not theories.
I don't have access to Alpha hardware, which is a barrier although not an
insuperable one. I'd do better considering SPARC or PPC hardware, which I
actually have available to me. Still, I won't use this as an excuse:
A quick look suggests that Alpha code generation is weak when dealing with
unsigned integers, because the architecture keeps 32-bit values (aka
"longwords"), unsigned ones included, in sign-extended form in its 64-bit
(aka "quadword") registers, so widening an unsigned int takes extra work.
Dealing with unsigned ints smaller than 32 bits is very probably also slow,
because the Alpha requires every memory access to be naturally aligned to the
operand size.
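
A small C model of the sign-extension issue, as a sketch only: the helper
names are hypothetical, and the functions merely imitate what the register
contents would look like. They are not Alpha code.

```c
#include <stdint.h>

/* Register image of a 32-bit value held in sign-extended form:
 * bits 63..32 are copies of bit 31, even if the C type was unsigned. */
uint64_t canonical32(uint32_t v)
{
    uint64_t x = v;
    return (x ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* Widening that register image to an unsigned 64-bit value needs an
 * explicit step to clear the upper half (on Alpha this would be something
 * like a zapnot); on the signed path no such step is needed. */
uint64_t widen_unsigned(uint64_t reg)
{
    return reg & UINT64_C(0xFFFFFFFF);
}
```

For an unsigned value with the top bit set, such as 0x80000000, the register
image is 0xFFFFFFFF80000000, so any use of it as a 64-bit quantity (an array
index, say) has to pay for the masking first.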
[ "The Alpha does not directly support byte-level operations such as
transferring single bytes between memory and registers. In principle, we could
use the instructions already presented to realize byte-level manipulations, but
a large amount of shifting and masking would be required. For example,
consider the C operation *dest = *src, where both dest and src are of type
(char *). This operation must read the single byte pointed to by src and
update the single byte pointed to by dest. Without special byte manipulation
instructions, this simple operation requires 17 Alpha instructions!" ]
Supposedly, the ldq_u and stq_u instructions are the right way to handle
byte-level memory access, and it would be worth looking at how well GCC
uses these opcodes when dealing with chars and shorts.
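
For reference, the *dest = *src byte copy from the quotation can be modelled
in C roughly as follows. This is a sketch under stated assumptions: memory is
modelled as an array of 64-bit quadwords with little-endian byte numbering,
the ldq_u/stq_u functions are C stand-ins for the instructions of the same
name, and the shift/mask steps stand in for the extbl, insbl and mskbl byte
manipulation instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for ldq_u / stq_u: access the aligned quadword that contains
 * addr, ignoring the low three address bits. */
uint64_t ldq_u(const uint64_t *mem, size_t addr)
{
    return mem[addr >> 3];
}

void stq_u(uint64_t *mem, size_t addr, uint64_t q)
{
    mem[addr >> 3] = q;
}

uint8_t load_byte(const uint64_t *mem, size_t addr)
{
    unsigned shift = (unsigned)(addr & 7) * 8;
    uint64_t q = ldq_u(mem, addr);
    return (uint8_t)((q >> shift) & 0xFF);            /* extbl-like extract */
}

void store_byte(uint64_t *mem, size_t addr, uint8_t b)
{
    unsigned shift = (unsigned)(addr & 7) * 8;
    uint64_t q = ldq_u(mem, addr);                    /* read-modify-write */
    uint64_t ins = (uint64_t)b << shift;              /* insbl-like insert */
    uint64_t kept = q & ~(UINT64_C(0xFF) << shift);   /* mskbl-like mask */
    stq_u(mem, addr, kept | ins);
}

/* Model of *dest = *src for two char pointers. */
void copy_byte(uint64_t *mem, size_t dest, size_t src)
{
    store_byte(mem, dest, load_byte(mem, src));
}
```

Even in this form a one-byte store is a load, a mask, an insert, an OR and a
store, which is the overhead worth checking for in GCC's output.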
Some of these issues cannot be addressed by changes to the compiler: I suspect
that FreeBSD's x86 heritage and focus means it uses a lot of int8 or int16
values, which are fast on Intel hardware, whereas int32 or int64
representations would actually prove much faster on the Alpha than
smaller-sized quantities.
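
As a rough illustration of the difference (hypothetical function names, and
only the C-level semantics, not measured Alpha behaviour):

```c
#include <stdint.h>

/* Narrow counter: count is promoted to int, incremented, and then has to
 * be truncated back to 8 bits, which is an extra step on a machine whose
 * registers and ALU only work on wider quantities. */
uint8_t bump_narrow(uint8_t count)
{
    return (uint8_t)(count + 1);
}

/* Register-width counter: a single add, with no truncation step. */
uint64_t bump_wide(uint64_t count)
{
    return count + 1;
}
```

Whether that extra step matters in practice would need to be measured on
real hardware.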
--
-Chuck