cvs commit: src/include ar.h

Tue Nov 14 11:27:58 UTC 2006

On Tue, 14 Nov 2006, Joseph Koshy wrote:

> bde> Non-broken code knows that byte-aligned data needs to be copied
> bde> into structs using memcpy().
>
> Do keep in mind that this is `struct ar_hdr' we are talking about, not
> something else.
>
> Having to use memcpy() to copy from a collection of ASCII strings (in file)
> to a collection of ASCII strings (in struct ar_hdr) to cope with alignment
> issues is odd.

I almost didn't reply because this seemed to make the change have no effect
except to break ar.h for unsupported compilers.

It turned into an argument about __packed in general and struct ip in
particular.

> To be truly portable, code needs to memcpy() in each ASCII string member
> of `struct ar_hdr'  separately since we cannot make assumptions about
> the file and memory layout being identical.

More than that :-).  The code needs to do something magic to fill the holes,
it any.  Kernel code is supposed to prezero using memcpy or
malloc(..., M_ZERO) to avoid copying out insecurities in the holes.

> Needless to say no code in our tree does this today.
>
> So we need to look at the original intent of why a `struct ar_hdr' was
> declared as a collection of fixed size ASCII strings.   My guess is that it
> was written that way to be able to directly describe its file layout: ASCII
> for portability and fixed size char[] arrays to avoid issues with structure
> padding.   Declaring it as __packed informs today's compilers of this
> intent. It also brings down the alignment requirements for `struct ar_hdr'
> on the ARM to match that of other platforms as a side-effect.

The alignment requirement is actually the crtical one.  This should be
explicit (if the packing requirement is).

> bde> I doubt that it is needed for more than breaking the warning.  You
> bde> would have to be unlucky or shooting your feet for
> bde> "(struct ar *)&byte_array[offset]" to give a pointer that is actually
> bde> misaligned.  Large arrays of chars are normally aligned to the word
> bde> size even if they are on the stack, and the archive header is at offset
> bde> 0 in files.
>
> In an ar(1) archive, there is a `struct ar_hdr' before each archive member.
> The only 'alignment' expected for this structure is that of falling on
> even offsets in the file.  This is only enforced programmatically though.

Oops, I grepped for ARMAGIC instead of ARFMAGIC, so I didn't find any file
headers.

Further testing showed:

(1) When __packed breaks the alignment as in struct ip, the compiler
doesn't report lost of alignment on taking addresses, but on an ia64
machine in the FreeBSD cluster, the broken alignment doesn't cause
problems and in fact is faster:

 	struct foo {
 		char	c;
 		int	i;
 	} __packed;

 	struct foo foo;

 	int
 	access_int(int *ip)
 	{
 		return (*ip);
 	}

 	int
 	main(void)
 	{
 	#ifdef BROKEN
 		return (access_int(&foo.i));
 	#else
 		return (foo.i);
 	#endif
 	}

In the !BROKEN case, gcc on ia64 generates large code to load foo.i a byte
at a time.  In the BROKEN case the code just passes a misaligned pointer
to access_int().  This should be an error, since &foo.i has the alignment
of a char and the cast that changes its alignment to that of an int is
only implicit, but I can't find even a warning flag for this.  However,
on the FreeBSD cluster machine, the misaligned access doesn't trap, and
doing the access in a loop shows that the misaligned access is much faster
than the correct code that does the access a byte at a time, much the
same as on an i386.

(2) gcc still has alignment bugs even on i386's.  Even i386's things
should be aligned for efficiency, and the compiler shouldn't generate
any misaligned access for portable code, since if it does then it is
generating inefficient code which breaks detection of user misalignment
via the i486 alignment check feature.  I tried using this feature when
i486's first came out, but it didn't work because gcc generates misaligned
code for struct copies:

 	struct {
 		uint8_t	x;
 		struct {
 			uint8_t	y[15];
 		} ssc;
 		uint32_t z;
 	} x, y;

 	int
 	main(void)
 	{
 		x.ssc = y.ssc;
 		return (0);
 	}

Here the uint32_t and no __packed forces the struct to be 4-byte aligned,
so ssc is known to be perfectly misaligned to an address equal to 1 mod 4.
The struct copy of ssc is done inline and should consist of copying the
first byte, then a 2-byte word, then 3 4-byte words.  Instead, it is done
by copying 3 perfectly misaligned 4-byte words, then 1 misaligned 2-byte
word, then the last byte.  Thus the struct copy is inefficient and always
causes alignment check traps if the alignment check flag is set.

The code generated by gcc for this hasn't changed much since i486's first
came out.  There is one helpful change: gcc now calls memcpy() if the
struct size is larger than about 64, but when i486's came out it generated
the misaligned inline copy for much larger structs (IIRC, for any size).

gcc on i386's also generates inline copies 4 bytes at a time starting
at the first byte, for structs that might only be 1-byte aligned:

 	struct {
 		struct {
 			uint8_t	y[60];
 		} nnssc;
 	} x, y;

Now the alignment cannot be known for sure, but 4-byte alignment can
usually be arranged at little cost for non-small structs, so without
alignment checking the best copying method for medium-sized structs
is to assume that they are 4-byte aligned and use 4-byte accesses.
With alignment checking, this is only best if there is a trap handler
to fix up the alignement and the alignment traps rarely occur,

Bruce