Undefined Behavior in lib/msun/src/e_pow.c (was Re: New math library from ARM)

Mon Jan 14 15:05:04 UTC 2019

On 1/14/19 7:52 AM, Bruce Evans wrote:
> On Tue, 1 Jan 2019, Pedro Giffuni wrote:
>
>> On 31/12/2018 23:54, Steve Kargl wrote:
>>> On Mon, Dec 31, 2018 at 10:32:06PM -0500, Pedro Giffuni wrote:
>>>> Hmm ...
>>>>
>>>> Looking at the changes in musl's libc I found this issue which seems
>>>> real (although somewhat theoretical):
>>>>
>>>> https://git.musl-libc.org/cgit/musl/commit/src/math?id=688d3da0f1730daddbc954bbc2d27cc96ceee04c 
>>>>
>>>>
>>>> Is the attached patch acceptable?
>>>>
>>>> Also, their code is bit different here:
>>>>
>>>> https://git.musl-libc.org/cgit/musl/commit/src/math?id=282b1cd26649d69de038111f5876853df6ddc345 
>>>>
>>>>
>>>> but we may also have to check fmaf(-0x1.26524ep-54, -0x1.cb7868p+11,
>>>> 0x1.d10f5ep-29).
>>>>
>>>> Cheers,
>>>>
>>>> Pedro.
>>>>
>>>> Index: lib/msun/src/e_pow.c
>>>> ===================================================================
>>>> --- lib/msun/src/e_pow.c    (revision 342665)
>>>> +++ lib/msun/src/e_pow.c    (working copy)
>>>> @@ -130,6 +130,7 @@
>>>>       if(hx<0) {
>>>>           if(iy>=0x43400000) yisint = 2; /* even integer y */
>>>>           else if(iy>=0x3ff00000) {
>>>> +        uint32_t j;    /* Avoid UB in bit operations below. */
>>>>           k = (iy>>20)-0x3ff;       /* exponent */
>>>>           if(k>20) {
>>>>               j = ly>>(52-k);
>
> I was away.
>
> I see you committed a better version (with a cast).
>
Yes, I did what seemed to be the least invasive change, but then the 
problem with duct tape is that it may work well enough to forget about 
the issue.

I haven't MFC'd it while you guys decided on a better solution.

> The correct fix is more like the changing the file scope i and j from 
> int32_t
> to u[_]int32_t.  j is mostly used as a temporary to hold pure bits.  
> Its top
> bit is usually not set, so plain int32_t works for it.  The only other 
> place
> where its top bit can obviously be set is in an EXTRACT_WORDS(j,i,z).  
> Then
> both j and i can have their top bit set.  EXTRACT_WORDS() into h[xy] and
> l[xy] is also sloppy.  There l[xy] is unsigned, but h[xy] is signed.  The
> top bit in the high word is soon discarded using ix = hx&0x7fffffff, etc.
> Later, the sign is tested using hx<0 and this is the only excuse for hx
> being signed.  fdlibm does this quite often.  It is a manual optimization
> for ~30-40 year old compilers that couldn't turn (hx&8000000) != 0 into a
> sign test iff the sign test is optimal.
>
The duct-tape cast still looks good enough ;).

> Other style bugs in the above:
>
>>> I'll defer to Bruce on this.  My only comments are
>>> 1) declarations belong at the top of the file where all declarations 
>>> occur
>
> 1a) missing blank line after declaration.
>
>> Modern standards let you declare variables within blocks. For style 
>> we do prefer to start the function with them but in this case we can't.
>
> K&R 1978 C allows declaring variables within blocks.
>
> You can't declare j again in function scope since it is already there 
> with
> a different type.
>
>>> 2) j is already declared as int32_t
>> And it has to be a signed integer: we later check for negative values.
>
> I think that is only when j is misused later for EXTRACT_WORDS().
>
> 2a,2b) Redeclaring j with a different type in an inner block is an even
> larger style bug/obfuscation than declaring a new variable or redeclaring
> j with the same type there.  Redeclaring shadows it and this bug is
> detected by Wshadow.  Redeclaring it with a different type (as needed to
> work) is especially obscure shadowing.
>
>>> 3) uint32_t should be written as u_int32_t.
>>
>> No, uint32_t is the standard type, u_int32_t is a BSDism. Also, the 
>> file consistently uses uint32_t.
>
> No, see previous replies.  3) is correct.
>
Yes, I acknowledged this. I was looking at the musl version which 
replaced all the u_int32_t's.
>>> Are you sure that UB occurs?  Or, is this an attempt to
>>> placate a static analysis tool that only see shifting of
>>> a signed type?  Do you need to make a similar change to
>>> e_powf.c?
>>
>> The point of the Undefined Behavior is precisely that it doesn't 
>> occur: we are lucky and the compiler behaves as it always has but the 
>> standard will let a different compiler behave differently and then we 
>> will have surprises.
>
> No, UB does occur, but is harmless.  ly is unsigned and can have its top
> bit set.  j=ly>>(52-k) shifts right a few bits so doesn't overflow on
> assignment to signed j.  But then j<<(52-k) gives UB attempting to 
> recover
> the top bit of ly gives UB in all cases where the top bit is nonzero.
>
> The else clause doesn't have this problem, since it shifts iy instead 
> of ly
> and iy never has its top bit set.
>
> e_powf.c doesn't have this problem, since there is only 1 32-bit word for
> float precision and it is handled like the top word for double precision
> (only its mantissa bits are shifted).
>
> However, it is easier to use unsigned shifts than to see that signed 
> shifts
> have defined behaviour.  musl did well to only detect and/or fix the 1 
> case
> where UB behaviour clearly occurs.
>
Thanks for the feedback!

For now I would prefer we focus more on whatever the ARM lib does better.

Pedro.