[CFR] mge driver / elf reloc

Mon Jul 21 22:18:30 UTC 2014

On Mon, 21 Jul 2014, Poul-Henning Kamp wrote:

> In message <20140722041517.M3229 at besplex.bde.org>, Bruce Evans writes:
>
>> This seems to make most of byteorder(9) a mistake.  Just about everything
>> can be done better using memcpy() and standard functions in byteorder(3),
>
> Covered by Bruce' "just about" is that code analysers like Coverity
> and FlexeLint cannot see through his proposed version of the macros
> on little-endian architectures because of the inline assembler used
> to access the exchange instructions which the C-compiler do not expose.

Not a problem:
- byteorder(9) already uses the bswap macros in many cases (including for
   all conversions to and from big-endian on little-endian hosts)
- we removed the pure exchange instructions from the x86 bswap16() macro
   some time ago
- we still use inline asm and bswap instructions in it for the x86
   bswap32() and bswap64(), but this is just to avoid pessimizing for
   gcc or ifdefs to do it only for gcc.  clang doesn't need any asms.
- maybe code analyzers don't understand the __packed that I used.  That
   would be either a bug in the analysers or just another unportability
   in __packed.

> Analysers have no such problems with the pure-C versions there today.

Further checking on pluto shows that gcc pessimizes all cases in
le32dec() and le32enc().  It forgets the alignment of variables in
the caller when pointers to them are passed as void *, even though
the function are inline.  On little-endian arches like ia64, these
functions do no byte swapping so they are equivalent to memcpy()
(after loading or storing the non-memory parameter to memory)
Here they are using _builtin_memcpy(), with namespace bugs fixed:

static __inline uint32_t
le32dec(const void *_p)
{
 	uint32_t _t;

 	__builtin_memcpy(&_t, _p, sizeof(_t));
 	return (_t);
}

static __inline void
le32enc(void *_p, uint32_t _u)
{

 	__builtin_memcpy(_p, &_u, sizeof(_u));
}

gcc optimizes these as well as possible.  It now sees alignment in the
caller.

The above assumes a little-endian host.  I think adding htole32()'s
makes it work for all hosts.  This involves ifdefs, but the ifdefs
are already there to select htole32().  Hopefully this remains optimal
when htole32() is nontrivial.

It is probably just a missing optimization that old gcc doesn't see the
alignment in the caller for non-builtins too, but the above is also
simpler.  It moves most expressions to the nontrivial htole32(), etc.

For other sizes, s/32/64/, etc.  No need for shifts and masks.  These are
now in htole32(), etc., or implicit in memcpy().

For other endiannesses, s/le/be/.  No need to reverse the expressions.
The reversal is in htobe32(), etc.

Bruce