svn commit: r300965 - head/lib/libc/stdlib

Tue May 31 07:20:42 UTC 2016

On 31.05.2016 8:53, Bruce Evans wrote:
> On Tue, 31 May 2016, Andrey Chernov wrote:
> 
>> On 31.05.2016 6:42, Bruce Evans wrote:
>>>
>>> Er, I already said which types are better -- [u]int_fast32_t here.
>>
>> [u]int_fast32_t have _at_least_ 32 bits. int32_t in the initial PRNG can
>> be changed since it does not overflow and involves several calculations,
>> but uint_fast32_t is needed just for two operations:
> 
> I think you mean a native uint32_t is needed for 2 operations.
> 
>> *f += *r;
>> i = (*f >> 1) & 0x7fffffff;
> 
> This takes 2 operations (add and shift) with native uint32_t.  It takes 4
> logical operations (maybe more physically, or less after optimization)
> with emulated uint32_t (add, mask to 32 bits (maybe move to another
> register to do this), shift, mask to 32 bits).  When you write the final
> mask explicitly, it is to 31 bits and optimizing this away is especially
> easy in both cases.
> 
>> We need to assign values from uint32_t to uint_fast32_t (since array
>> size can't be changed),
> 
> FP code using double_t is similar: data in tables should normally be
> in doubles since double_t might be too much larger; data in function
> parameters is almost always in doubles since APIs are deficient and
> don't even support double_t as an arg; then it is best to assign to
> a double_t variable since if you just use the double then expressions
> using it will promote it to double_t but it is too easy to lose this
> expansion too early.  It takes extra variables and a little more code
> for the assignments, but the extra variables are optimized away in
> cases where there is no expansion.
> 
>> do this single operation fast and store them
>> back into the array of uint32_t. I doubt that much gain can come from it,
>> and it may even be a pessimization in some cases. Better to let the
>> compiler do its job here.
> 
> It's never a pessimization if the compiler does its job.
> 
> It is good to practice this on a simple 2-step operation.  Think of a
> multi-step operation where each step requires clipping to 32 bits.
> Using uint32_t for the calculation is just a concise way of writing
> "& 0xffffffff" after every step (even ones that don't need it).  It
> is difficult and sometimes impossible for the compiler to optimize
> away these masks across a large number of steps.  Sometimes this is
> easy for the programmer.
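
To make the quoted point concrete, here is a minimal sketch in C of the
two-operation step written both ways. The function names step_exact,
step_fast and check_steps_agree are hypothetical and exist only for this
illustration; this is not the libc code. The assumption is that `f` and `r`
point into an array of uint32_t, as in the snippet quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Exact-width variant: uint32_t arithmetic wraps modulo 2^32 on its own. */
static uint32_t
step_exact(uint32_t *f, const uint32_t *r)
{
	*f += *r;
	return ((*f >> 1) & 0x7fffffff);
}

/*
 * Fast-type variant: load into uint_fast32_t, which may be wider than
 * 32 bits, so the wrap has to be written out as an explicit mask before
 * storing back into the uint32_t array.
 */
static uint32_t
step_fast(uint32_t *f, const uint32_t *r)
{
	uint_fast32_t t = *f;

	t = (t + *r) & 0xffffffff;	/* emulate the 32-bit wrap */
	*f = (uint32_t)t;
	return ((uint32_t)((t >> 1) & 0x7fffffff));
}

/* Both variants should agree, including when the addition wraps. */
static void
check_steps_agree(void)
{
	uint32_t r = 3, f1 = 0xfffffffeU, f2 = 0xfffffffeU;
	uint32_t i1 = step_exact(&f1, &r);
	uint32_t i2 = step_fast(&f2, &r);

	assert(f1 == f2 && i1 == i2);
}
```

Where uint_fast32_t is exactly 32 bits wide, the explicit mask in step_fast
changes nothing and a compiler can drop it, so the two functions should come
out the same.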

The biggest problem so far is that the fast types for [u]int32_t are exactly
_the_same_ as the non-fast ones on i386 and amd64; see
/usr/include/x86/_types.h. Without any gain on the major platforms I don't
think this change is needed.
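
One way to check this on a given platform, as a rough sketch: compare the
sizes of the two types directly. The helper names fast32_is_exact and
report_fast32 are made up for this example, and the result depends on the
system headers, so no particular output is assumed here.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if uint_fast32_t has the same size as uint32_t here. */
static int
fast32_is_exact(void)
{
	return (sizeof(uint_fast32_t) == sizeof(uint32_t));
}

/* Prints both sizes so the platform's choice is visible. */
static void
report_fast32(void)
{
	printf("sizeof(uint32_t) = %zu, sizeof(uint_fast32_t) = %zu\n",
	    sizeof(uint32_t), sizeof(uint_fast32_t));
	printf("fast type %s the exact type\n",
	    fast32_is_exact() ? "matches" : "is wider than");
}
```

Calling report_fast32() from main() shows whether switching to the fast type
could change the generated code at all on that platform.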