svn commit: r317809 - head/share/man/man7

Fri May 5 13:39:29 UTC 2017

On Fri, 5 May 2017, Konstantin Belousov wrote:

> On Fri, May 05, 2017 at 07:13:04PM +1000, Bruce Evans wrote:
>> On Thu, 4 May 2017, Konstantin Belousov wrote:
>> uintptr_t and size_t are not synonyms for unsigned long on all arches.
>> They only have the same representation on 32-bit arches.  On 32-bit arches,
>> they are synonyms for unsigned int, and thus have a lower rank than
>> unsigned long.  This mainly causes problems printing them, but might cause
>> sign extension/overflow problems.  For example, (size_t)0 + (long)-1 is
>> unsigned and large positive on 64-bit arches, but signed and small negative
>> on 32-bit arches.
> Ok.
>
> For now, for the commit of articles fixes, I changed this to note that
> uintptr_t has same size as ulong.  I might revisit this part of text after
> the trivial fixes are done.

Yes, it is worth saying that the sizes (and representation) are the same now,
but don't say that they will always be the same.
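
The rank example quoted above can be checked with a small C11 sketch.
TYPE_NAME, sum_type, sum_is_positive and check_sum_consistency are
made-up names for illustration only.  The result type follows from the
usual arithmetic conversions, so it depends on whether long can
represent every size_t value on the target; read the answer off the
arch you care about instead of assuming it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name the type of an expression (hypothetical helper; extend as needed). */
#define TYPE_NAME(x) _Generic((x),      \
    int:           "int",               \
    unsigned int:  "unsigned int",      \
    long:          "long",              \
    unsigned long: "unsigned long",     \
    default:       "other")

/* Type chosen by the usual arithmetic conversions for the sum. */
static const char *sum_type(void)
{
    return TYPE_NAME((size_t)0 + (long)-1);
}

/* 1 if the sum is large positive (unsigned), 0 if it is negative. */
static int sum_is_positive(void)
{
    return ((size_t)0 + (long)-1) > 0;
}

/* A sum of type long must be negative; any unsigned type must wrap. */
static void check_sum_consistency(void)
{
    int is_long = strcmp(sum_type(), "long") == 0;
    assert(is_long == !sum_is_positive());
}
```

Printing sum_type() and sum_is_positive() on each arch of interest
shows which case applies there.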

Longs have the wrong size on all arches except the unsupported i386-with-
64-bit longs.  They should be twice as long as a machine register.

There are related problems with __int128_t existing.  This breaks intptr_t
being able to represent all integers.  Expanding intptr_t to fix this gives
much the same problems as expanding long on 32-bit systems to avoid the long
long abomination, but more.

>>> +.Pp
>>> +In order to maximize compatibility with future pointer integrity mechanisms,
>>
>> "pointer integrity mechanisms" sounds like management/marketingspeak.
>> "integrity" isn't a relevant property of integer types.  "mechanism" might
>> mean the details of the representation (more than the size), but I think
>> you just mean the size.   Most manipulations of pointers as integers
>> assume the same representation.  You stated that the representation is
>> the same [in future] above, and didn't use the usual caveat "on all
>> supported arches".  I don't like this, but lots of code depends on it.
> AFAIU, cheri is somewhat like Intel MPX, but more.  I do not know fine
> details.
>
> For MPX, pointers are no longer plain pointers, there is data behind
> the raw value, providing e.g. the range of bytes which are valid to
> dereference through the authentic value of the pointer and arithmetic
> manipulations of it.

So the address space isn't exactly flat.  Better de-emphasize that.

I believe ia64 has fat function pointers, and uses handles so that some
things appear to work normally.  But comparison of function pointer
(handle) addresses is almost meaningless.  It isn't clear how gprof
can work.  It would need something like mapping function addresses
to handled addresses linearly.  Similarly for conversion to uintptr_t.
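
For object pointers, the only thing ISO C promises about uintptr_t is
the round trip through void *.  A minimal sketch, with hypothetical
helper names round_trips and check_round_trip.  It deliberately says
nothing about function pointers: the standard gives no such guarantee
for them, which is where fat pointers and handles would show up.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p survives void * -> uintptr_t -> void * unchanged. */
static int round_trips(const void *p)
{
    uintptr_t bits = (uintptr_t)p;
    const void *back = (const void *)bits;

    return back == p;
}

/* Exercise the round trip on an object pointer and on a null pointer. */
static void check_round_trip(void)
{
    int x = 0;

    assert(round_trips(&x));
    assert(round_trips((const void *)0));
}
```

Note that this only checks equality after converting back; it makes no
claim about the integer value itself being a linear address.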

> So the phrase is correct, and the below reformulation really removes the
> content.

"integrity" isn't correct since it has nothing to do with integers.
"mechanisms" doesn't make much sense either.  Use the standard term
"representation".

>> Translation of the above: "... compatibility with changes in the size of
>> pointers in future implementations".

Change "size" to "representation" if you want to allow for more than the
size changing.

>>> +Compilers define
>>> +.Dv _LP64
>>> +symbol when compiling for an
>>> +.Dv LP64
>>> +ABI.
>>
>> Further minor grammar problems here and elsewhere:
>> - missing "the" before _LP64
>> - "an" is confusing.  First, "a" might be correct depending on how you
>>    pronounce LP64.  I pronounce it as "el ...", so "an" is better than
>>    "a".  But there is only 1 LP64, so "the" is more correct.  "the LP64
>>    ABI" is confusing too.  LP64 isn't an ABI or a collection of ABIs.
>>    The collection is of arches, many using a single LP64 sub-ABI with
>>    variations in other parts of their ABI.
> Of course there are architectures which have more than one LP64 ABI,
> eg. PowerPC ELF v1 and v2.

No, there is only 1 LP64 by definition.  It is not really an ABI, and only
gives the sizes.

> Even i386, de facto, has two incompatible ABIs now: one older SysVR4 ABI
> where stack is 4-byte aligned, and modern Linux ABI which claims that
> stack must be 16-byte aligned, as enforced by modern gcc.  The variation
> is not minor, it causes reliable user pain when mixed in.

That is just a bug in Linux.

More differently, at least mips has little-endian and big-endian variants.
The details seem to be undocumented.  __MIPSEB__ gives big endian in
<machine/endian.h>, but the arch names have different spellings with e's.
LP64 only tells you the sizes, so is very far from giving an ABI.
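
Since _LP64 only promises sizes, sizes are all that code should
conclude from it.  A sketch assuming a C11 compiler that predefines
_LP64 on LP64 targets and 8-bit bytes; is_lp64, is_little_endian and
check_lp64_sizes are hypothetical helpers.  Byte order has to be probed
separately.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _LP64
/* The only facts _LP64 implies: long and pointers are 8 bytes. */
_Static_assert(sizeof(long) == 8, "LP64: long is 8 bytes");
_Static_assert(sizeof(void *) == 8, "LP64: pointers are 8 bytes");
#endif

/* 1 if the compiler defined _LP64 for this target, else 0. */
static int is_lp64(void)
{
#ifdef _LP64
    return 1;
#else
    return 0;
#endif
}

/* Byte order is not part of LP64; inspect the first byte of a known value. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;

    return *(const unsigned char *)&probe == 1;
}

/* Runtime restatement of the size facts, for targets without _LP64 too. */
static void check_lp64_sizes(void)
{
    assert(!is_lp64() || sizeof(long) == sizeof(void *));
}
```

Everything else (calling convention, stack alignment, struct layout) is
outside what this macro can tell you.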

>>> +Integers are represented as two-complement.
>>> +Alignment of integer and pointer types is natural, that is,
>>> +the address of the variable must be congruent to zero modulo type size.
>>
>> Missing "the" after "modulo".
>>
>> Is it natural for arm?  arm has unnatural struct padding, at least at
>> the end of structs.
>
> I am not sure what exactly you mean about unnatural struct padding,
> AFAIR ARM has the same rule of structure having the alignment requirement
> of the most aligned member.

arm unnaturally pads structs to 4-byte boundaries IIRC.  This is useless
except for alignment.  This affects the ABI just as much as alignment,
since it affects the sizes of nested structs and arrays of structs.
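
Whether alignment is natural on a given arch, and how much tail padding
a struct gets, can be measured directly.  A sketch with hypothetical
names (struct tail, is_naturally_aligned, tail_padding,
check_struct_rules).  It asserts only what holds for any unpacked
struct; the interesting numbers are the ones read out on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A wide member followed by a narrow one, so tail padding is likely. */
struct tail {
    int64_t big;
    char    small;
};

/* "Natural" in the man page's sense: alignment equals the type size. */
#define is_naturally_aligned(type) (_Alignof(type) == sizeof(type))

/* Bytes of padding after the last member. */
static size_t tail_padding(void)
{
    return sizeof(struct tail) -
        (offsetof(struct tail, small) + sizeof(char));
}

static void check_struct_rules(void)
{
    /* The size is a multiple of the alignment, so arrays stay aligned. */
    assert(sizeof(struct tail) % _Alignof(struct tail) == 0);
    /* The struct is at least as aligned as its most aligned member. */
    assert(_Alignof(struct tail) >= _Alignof(int64_t));
}
```

Comparing is_naturally_aligned(int64_t) and tail_padding() across
arches shows where the sentence in the man page holds and where it does
not.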

> ARMv7 has a requirement of uint64_t having
> 8-bytes alignment (unlike other 32bit platforms).  Might be, indeed, this
> should be more accurate, but in fact ARM is correct for this sentence.

arm has _ALIGNBYTES = (sizeof(int) - 1) for __ARM_ARCH >= 6, else
_ALIGNBYTES = (sizeof(long long) - 1).  This is backwards if ARMv7
really needs 8.  There are bugs in the comment too.

mips has _ALIGNBYTES = 7.  This is spelled without obfuscations or
abominations.  The comment has less detail but no bugs.

x86 has the necessary obfuscations of using __register_t and __uintptr_t
to parametrize the difference between i386 and amd64.  However, the
comment has worse bugs than arm -- its claim that the type is unsigned
int is just wrong on amd64.

_ALIGNBYTES only gives the smallest alignment that works for all types
(not even that for SSE accesses on x86).  It is pessimal for doubles
and long doubles on i386, and not best for int64_t either.  Does the
Linux ABI "fix" this?  The natural ABI is not easy to see.
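
For readers who have not looked at the headers: _ALIGNBYTES is used as
a mask for rounding addresses up.  The sketch below uses hypothetical
names (MY_ALIGNBYTES, my_align, check_my_align) and takes the mips
value of 7 mentioned above purely as an example.  It shows the usual
round-up idiom and is not copied from any particular machine header.

```c
#include <assert.h>
#include <stdint.h>

/* Example mask: 7 means "round up to a multiple of 8". */
#define MY_ALIGNBYTES ((uintptr_t)7)

/* Round addr up to the next multiple of MY_ALIGNBYTES + 1. */
static uintptr_t my_align(uintptr_t addr)
{
    return (addr + MY_ALIGNBYTES) & ~MY_ALIGNBYTES;
}

static void check_my_align(void)
{
    assert(my_align(0) == 0);   /* already aligned */
    assert(my_align(1) == 8);   /* rounded up */
    assert(my_align(8) == 8);   /* already aligned */
    assert(my_align(9) == 16);  /* rounded up */
}
```

The idiom only works when the mask is one less than a power of 2, and
it yields one alignment for all types, which is why it can be pessimal
for some of them.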

> Below are the small corrections.
>
> diff --git a/share/man/man7/arch.7 b/share/man/man7/arch.7
> index 2add6ea3d3e..1c056849861 100644
> --- a/share/man/man7/arch.7
> +++ b/share/man/man7/arch.7
> @@ -39,13 +39,13 @@ Differences between CPU architectures and platforms supported by
> If not explicitly mentioned, sizes are in bytes.
> .Pp
> FreeBSD uses flat address space for program execution, where
> -pointers have the same binary representation as
> +pointers have the same representation as
> .Vt unsigned long
> variables, and
> .Vt uintptr_t
> and
> .Vt size_t
> -types are synonyms for
> +types have same size as
> .Vt unsigned long .

Say that they have the same representation.

We also assume/implement that everything is 2's complement with full
range (the minimum negative value -2**(N-1) is not left out), with the
same natural 2's complement representation except for the size, and
that sizes are powers of 2, and that all natural intN_t exist and are
smallest and fast, so that it is just unportable to hard-code uses of
intN_t and never use int_leastN_t or int_fastN_t.
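
These assumptions can be written down as checks.  A sketch assuming
C11 and <stdint.h>; is_power_of_2 and check_sizes are hypothetical
helpers.  The checks on long and on the sizes reflect what the text
assumes, not something ISO C requires of every implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Full range: the minimum is -2**(N-1), one below -INTN_MAX. */
_Static_assert(INT32_MIN == -INT32_MAX - 1, "int32_t has full range");
_Static_assert(INT64_MIN == -INT64_MAX - 1, "int64_t has full range");
/* Assumed for the basic types as well (not guaranteed by C11). */
_Static_assert(LONG_MIN == -LONG_MAX - 1, "long has full range");

/* 1 if n is a power of 2, else 0. */
static int is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

static void check_sizes(void)
{
    /* Sizes are assumed to be powers of 2. */
    assert(is_power_of_2(sizeof(short)));
    assert(is_power_of_2(sizeof(int)));
    assert(is_power_of_2(sizeof(long)));
    assert(is_power_of_2(sizeof(void *)));
    /* The least and fast variants are never narrower than intN_t. */
    assert(sizeof(int_least32_t) >= sizeof(int32_t));
    assert(sizeof(int_fast32_t) >= sizeof(int32_t));
}
```

Code that wants to stay portable beyond these assumptions would use
int_leastN_t or int_fastN_t where an exact width is not needed.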

OK except for "integrity mechanisms".  Change this to something
simpler although not quite correct.  If you want to allow for fat
pointers then the earlier parts about flat address spaces would
need adjustments too.

Bruce