cvs commit: src/sys/sparc64/include in_cksum.h
marius at alchemy.franken.de
Sat Jun 28 11:44:22 UTC 2008
On Sat, Jun 28, 2008 at 12:57:28AM +0200, Christoph Mallon wrote:
> Marius Strobl wrote:
> >I wasn't aware that the clobber list allows to explicitly specify
> >the condition codes, thanks for the hint. Though it unfortunately
> >took me longer than two days to verify its effect on the generated
> >code; sparc64 could still have been one of the archs where "cc" has
> >no effect. Besides I don't think using "__volatile" for this is
> >that wrong, given that the sparc64 code generated by using "cc"
> >and "__volatile" is nearly identical and given that at least i386
> >relies on "__volatile" telling GCC that the inline assembler uses
> >the condition codes since quite some time. So the condition codes
> >are probably part of what GCC treats as "important side-effects".
> If this is true and GCC only handles the eflags on x86 correctly, when
> __volatile is used, but not if "cc" is marked as clobbered, then this is
> clearly a bug.
> >Regarding the MFC, they don't happen automatically and the change
> >was not wrong in general so there was no need to hurry :)
> I still think, using __volatile only works by accident. volatile for an
> assembler block mostly means "this asm statement has an effect, even
> though the register specification looks otherwise, so do not optimise
> this away (i.e. no CSE, do not remove if result is unused etc.)".
> On a related note: Is inline assembler really necessary here? For
> example couldn't in_addword() be written as
> static __inline u_short
> in_addword(u_short const sum, u_short const b)
> {
>   u_int const t = sum + b;
>   return t + (t >> 16);
> } ?
> This should at least produce equally good code and because the compiler
> has more knowledge about it than an assembler block, it potentially
> leads to better code. I have no SPARC compiler at hand, though.
With GCC 4.2.1 at -O2 the code generated for the above C version
takes one more instruction than the inline assembler, so if one
wants to go for micro-optimizing one should certainly prefer the
inline assembler version.
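
For reference, the C variant quoted above can be tried stand-alone.
The following is only a minimal sketch: the name in_addword_c and the
use of the <stdint.h> types in place of u_short/u_int are assumptions
made so that it compiles outside the kernel tree. It says nothing
about the instruction count on sparc64.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the proposed C in_addword().
 * Adds two 16-bit words with end-around carry: the carry out of
 * bit 15 is added back into the low 16 bits.
 */
static inline uint16_t
in_addword_c(uint16_t sum, uint16_t b)
{
	/* At most 0xffff + 0xffff = 0x1fffe, so t >> 16 is 0 or 1. */
	uint32_t const t = (uint32_t)sum + b;

	/* Fold the carry back in and truncate to 16 bits. */
	return ((uint16_t)(t + (t >> 16)));
}
```

For example, in_addword_c(0xffff, 0x0001) wraps around to 0x0001, and
in_addword_c(0x1234, 0x0101) gives 0x1335 as no carry occurs.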
> In fact the in/out specification for this asm block looks rather bad:
> "=&r" (__ret), "=&r" (__tmp) : "r" (sum), "r" (b) : "cc");
> The "&"-modifiers (do not use the same registers as for any input
> operand value) force the compiler to use 4 (!) register in total for
> this asm block. It could be done with 2 registers if a proper in/out
> specification was used. At the very least the in/out specification can
> be improved, but I suspect using plain C is the better choice.
The "&"-modifiers are necessary as the inline assembler in
question consumes output operands before all input operands
are consumed. Omitting them caused GCC to generate broken
code in the past.
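
To illustrate the point about "&", here is a small sketch. It is not
the sparc64 code from in_cksum.h: it uses x86 instructions purely as
an assumption so that it can be tried on common hardware, falls back
to plain C elsewhere, and the names add16_wide and fold16 are made up
for this example. The first instruction writes the output operand
while the second input is still unread, which is the situation that
requires the earlyclobber modifier.

```c
#include <stdint.h>

/*
 * Hypothetical helper: add two 16-bit values in a 32-bit register.
 * Illustrates "=&r" (earlyclobber) and the "cc" clobber.
 */
static inline uint32_t
add16_wide(uint16_t sum, uint16_t b)
{
	uint32_t a = sum, c = b, t;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/*
	 * %0 is written by the movl while %2 has not been read yet.
	 * Without "&" the compiler may place %0 and %2 in the same
	 * register, and the movl would then destroy the second input.
	 * The addl modifies the flags, hence the "cc" clobber.
	 */
	__asm__("movl %1, %0\n\t"
	        "addl %2, %0"
	        : "=&r" (t)
	        : "r" (a), "r" (c)
	        : "cc");
#else
	t = a + c;	/* portable fallback for other compilers and archs */
#endif
	return (t);
}

/* Fold the carry out of bit 15 back into the low 16 bits. */
static inline uint16_t
fold16(uint32_t t)
{
	return ((uint16_t)(t + (t >> 16)));
}
```

With these, fold16(add16_wide(0xffff, 0x0001)) yields 0x0001. A
tighter specification is possible when the output may share a register
with the first input (a matching constraint such as "0"), but that
only works if the assembler really reads that input before writing
the output.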