svn commit: r300965 - head/lib/libc/stdlib

Tue May 31 03:42:33 UTC 2016

On Mon, 30 May 2016, Andrey Chernov wrote:

> On 30.05.2016 6:09, Bruce Evans wrote:
>> ...  The correct fix is s/u_long/uint_fast32_t
>> in most places and s/u_long/uint_least32_t/ in some places and then
>> fix any missing "&"'s.  The "fast" and "least" types always exist,
>> unlike the fixed-width types, and using them asks for time/space
>> efficiency instead of emulated fixed-width.
>> ...

[That was the correct fix for longs long ago, not your change here.]

>>>> ==============================================================================
>>>>
>>>> --- head/lib/libc/stdlib/random.c       Sun May 29 16:32:56 2016        (r300964)
>>>> +++ head/lib/libc/stdlib/random.c       Sun May 29 16:39:28 2016        (r300965)
>>>> @@ -430,7 +430,7 @@ random(void)
>>>>                  */
>>>>                 f = fptr; r = rptr;
>>>>                 *f += *r;
>>>> -               i = (*f >> 1) & 0x7fffffff;     /* chucking least random bit */
>>>> +               i = *f >> 1;    /* chucking least random bit */
>>
>> This gives an "&" to restore in the version with correct substitutions.
>>
>> It also breaks the indentation.  (This file mostly indents comments to the
>> right of code to column 40, but column 48 was used here and now column 32
>> is used.)
>>
>>>>                 if (++f >= end_ptr) {
>>>>                         f = state;
>>>>                         ++r;
>
> I don't introduce uint32_t and int32_t here and don't have a slightest
> idea of which types will be better to change them. F.e. *f += *r;
> suppose unsigned 32bit overflow which don't naturally happens for large
> types. Assigning uint32_t to some large type then clip it to smaller
> after calculation - all of that can produce more code than save for
> calculation itself.

Er, I already said which types are better -- [u]int_fast32_t here.

For *f += *r, it is then quite possible that clipping doesn't occur.
The calculations should be done as much as possible in the natural
register width and clipped only once at the end if possible.  Here
I think the addition gives only 1 extra bit and the right shift in the
next step immediately removes 1 bit, and that is all the calculation
does, so it is not possible to combine masking steps.
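
A minimal sketch of that step with the substituted types, to show where
the "&" has to come back.  The helper name step() and its signature are
hypothetical and are not the code in random.c; the state is assumed to
be kept in uint_least32_t and the arithmetic done in uint_fast32_t:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the libc code: one additive-feedback step.
 * State words are stored as uint_least32_t; arithmetic is done in
 * uint_fast32_t, which may be wider than 32 bits.
 */
static uint_fast32_t
step(uint_least32_t *f, const uint_least32_t *r)
{
	uint_fast32_t sum;

	/* The sum may carry into bit 32 when uint_fast32_t is wider. */
	sum = (uint_fast32_t)*f + *r;

	/* Clip once when storing back, so the state stays 32 bits. */
	*f = (uint_least32_t)(sum & 0xffffffff);

	/* Chucking least random bit; the mask discards any carry. */
	return ((sum >> 1) & 0x7fffffff);
}
```

With 32-bit types both masks are no-ops that the compiler can drop; with
wider types they do the clipping that uint32_t used to do implicitly.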

I have considerable experience using wide registers optimally in i386
(i387) FP code in libm.  Without SSE, FP calculations can only be done
in the i387.  Clipping the extra precision after every step was only
about 5 times slower on old CPUs with 0 or 1 pipelines, but it is
several times slower than that with more pipelines.  C has poor bindings
related to this.  It requires clipping after every cast and assignment.
This is too slow, so gcc and clang don't do it.  To get code that is both
fast and correct, it is best to use float_t and double_t a lot, so that
almost all calculations are done in the wide registers.  This corresponds
to using int_fastN_t instead of intN_t, int or long.  Clipping steps are
still unfortunately necessary to match APIs and ABIs, and very rarely to
discard extra bits because they are really not wanted.
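
A minimal sketch of the double_t style described above.  The function
sum_squares() is a hypothetical illustration and is not taken from libm;
it only assumes the C99 double_t type from <math.h>:

```c
#include <math.h>

/*
 * Hypothetical example, not libm code: keep every intermediate in
 * double_t, which is the evaluation type the implementation prefers
 * (possibly wider than double on i387).
 */
static double
sum_squares(const double *x, int n)
{
	double_t acc;
	int i;

	acc = 0;
	for (i = 0; i < n; i++)
		acc += (double_t)x[i] * x[i];	/* no per-step narrowing */

	/* The API returns double, so narrow only here, at the end. */
	return ((double)acc);
}
```

The accumulator plays the same role as a uint_fast32_t temporary in the
integer case: the only narrowing is the one the interface requires.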

With SSE, clipping after every step is only 2-3 times slower, but it is
not necessary to widen for any step.  However, not widening gives less
accuracy in most cases.

Bruce