Bad performance on alpha? (make buildworld)

Wed Feb 25 12:36:01 PST 2004

David O'Brien wrote:
> On Wed, Feb 25, 2004 at 12:19:15AM -0500, Chuck Swiger wrote:
>>>Maybe in theory, but not necessarily in practice.
>>
>>It's been a few years since I've written a compiler, but my viewpoint isn't 
>>based entirely on theory.
[ ... ]
>> Your technical description is accurate, but the points you are making here 
>> seem to support my argument, rather than contradict what I said.  :-)
> 
> You're assuming you're writing a compiler targeting _1_ specific
> architecture.

No, sir, I certainly do not make such an assumption.

Most optimization techniques are architecture-independent: liveness analysis, 
CSE, dead code elimination, moving invariants out of loops, branch threading, 
algebraic identities and strength reduction.  These optimizations are most 
commonly done on the 3-argument intermediate code that portable compilers 
(PCC, GCC) typically use before target platform code generation is actually 
performed.
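
To make that concrete, here is a hand-written sketch in C of two of the 
transformations named above (hoisting a loop invariant and strength 
reduction).  The function names are made up for illustration, and this is 
only the general shape of what an optimizer can do at the intermediate-code 
level, not a claim about what any particular GCC version emits:

```c
#include <stddef.h>

/* Hypothetical example: the loop as a programmer might write it. */
long sum_scaled_naive(const long *a, size_t n, long x, long y)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* x * y never changes inside the loop; i * 8 is a multiply
           that could be replaced by a running addition. */
        total += a[i] * (x * y) + (long)i * 8;
    }
    return total;
}

/* Roughly the same loop after the two optimizations, done by hand. */
long sum_scaled_opt(const long *a, size_t n, long x, long y)
{
    long total = 0;
    long xy = x * y;   /* loop invariant moved out of the loop */
    long step = 0;     /* strength reduction: i * 8 becomes step += 8 */
    for (size_t i = 0; i < n; i++) {
        total += a[i] * xy + step;
        step += 8;
    }
    return total;
}
```

Nothing in either transformation depends on the target CPU, which is the 
point: the rewrite is valid on x86, Alpha, SPARC or PPC alike.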

There are a few additional optimizations which are architecture-specific, such 
as instruction scheduling and peephole/template optimizations, but these 
generally make much less difference to performance than the 
architecture-independent optimizations mentioned above.  On some platforms, 
though, they make enough difference that a second pass at CSE or instruction 
rescheduling against the target assembly code can be worth doing.

> It doesn't matter what is possible, what matters is what
> GCC does.  Please go analyze GCC and report the deficiencies.  I
> personally would love to know what they are, and how to make GCC do
> better on non-x86 platforms.

I agree that what GCC does matters, not theories.

I don't have access to Alpha hardware, which is a barrier although not an 
insuperable one.  I'd do better considering SPARC or PPC hardware, which I 
actually have available to me.  Still, I won't use this as an excuse:

A quick look suggests that Alpha code generation is deficient when dealing 
with unsigned integers, because the architecture uses a "sign-extended" format 
to hold 32-bit ints (aka "longwords"), unsigned ones included, in its 64-bit 
(aka "quadword") registers, so widening an unsigned value costs extra work.  
Dealing with unsigned ints smaller than 32 bits is very probably also slow, 
because the Alpha requires operand-size alignment for all memory access.
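
A small C model of the register format may help show where the extra work 
comes from.  This is only a sketch: the function names are invented, and it 
imitates the register contents in portable C rather than showing real Alpha 
instructions:

```c
#include <stdint.h>

/* Model of a 32-bit longword sitting in a 64-bit register: bit 31 is
   copied into bits 32..63, whether or not the C type is unsigned.
   (The uint32_t -> int32_t conversion is implementation-defined before
   C23; common compilers wrap in two's complement.) */
int64_t longword_in_register(uint32_t bits)
{
    return (int64_t)(int32_t)bits;
}

/* Widening an unsigned 32-bit value to 64 bits: the copied sign bits
   have to be cleared first, which is an extra operation that a signed
   widening does not need. */
uint64_t widen_unsigned(int64_t reg)
{
    return (uint64_t)reg & 0xFFFFFFFFu;
}
```

For a value such as 0x80000000 the register holds 0xFFFFFFFF80000000, so any 
use of it as a 64-bit unsigned quantity needs the masking step first.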

[ "The Alpha does not directly support byte-level operations such as 
transferring single bytes between memory and registers. In principle, we could 
use the instructions already presented to realize byte-level manipulations, 
but a large amount of shifting and masking would be required. For example, 
consider the C operation *dest = *src, where both dest and src are of type 
(char *). This operation must read the single byte pointed to by src and 
update the single byte pointed to by dest. Without special byte manipulation 
instructions, this simple operation requires 17 Alpha instructions!" ]

Supposedly, the ldq_u and stq_u instructions (unaligned quadword load and 
store) are the right way to handle byte-level memory access, and it would be 
worth looking at how well GCC uses these opcodes when dealing with chars and 
shorts.
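
As I understand it, the idea is to fetch the aligned quadword that contains 
the byte and then extract or merge the byte within a register.  Here is a 
rough C model of that sequence.  The helper names and the array-of-quadwords 
view of memory are my own assumptions for illustration, and the comments 
name the Alpha instructions each step loosely corresponds to:

```c
#include <stddef.h>
#include <stdint.h>

/* mem models memory as aligned 64-bit quadwords; addr is a byte address.
   Byte 0 is the least significant byte, as on a little-endian Alpha. */
uint8_t load_byte(const uint64_t *mem, size_t addr)
{
    uint64_t q = mem[addr >> 3];                /* ldq_u: enclosing quadword */
    unsigned shift = (unsigned)(addr & 7) * 8;  /* byte position within it */
    return (uint8_t)(q >> shift);               /* extbl: pull the byte out */
}

void store_byte(uint64_t *mem, size_t addr, uint8_t value)
{
    unsigned shift = (unsigned)(addr & 7) * 8;
    uint64_t q = mem[addr >> 3];                /* ldq_u: read old quadword */
    q &= ~((uint64_t)0xFF << shift);            /* mskbl: clear target byte */
    q |= (uint64_t)value << shift;              /* insbl + bis: merge new byte */
    mem[addr >> 3] = q;                         /* stq_u: write quadword back */
}
```

With these, the *dest = *src case becomes store_byte(mem, dest, 
load_byte(mem, src)): a read-modify-write of a whole quadword just to move 
one byte, which is why chars and shorts are comparatively expensive.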

Some of these issues cannot be addressed by changes to the compiler: I suspect 
that FreeBSD's derivation from, and focus on, the x86 architecture means it 
uses a lot of int8 or int16 values, which are fast on Intel hardware, whereas 
int32 or int64 representations would actually prove much faster on the Alpha 
than smaller-sized quantities.

-- 
-Chuck