Undefined Behavior in lib/msun/src/e_pow.c (was Re: New math library from ARM)

Mon Jan 14 12:52:30 UTC 2019

On Tue, 1 Jan 2019, Pedro Giffuni wrote:

> On 31/12/2018 23:54, Steve Kargl wrote:
>> On Mon, Dec 31, 2018 at 10:32:06PM -0500, Pedro Giffuni wrote:
>>> Hmm ...
>>> 
>>> Looking at the changes in musl's libc I found this issue which seems
>>> real (although somewhat theoretical):
>>> 
>>> https://git.musl-libc.org/cgit/musl/commit/src/math?id=688d3da0f1730daddbc954bbc2d27cc96ceee04c
>>> 
>>> Is the attached patch acceptable?
>>> 
>>> Also, their code is bit different here:
>>> 
>>> https://git.musl-libc.org/cgit/musl/commit/src/math?id=282b1cd26649d69de038111f5876853df6ddc345
>>> 
>>> but we may also have to check fmaf(-0x1.26524ep-54, -0x1.cb7868p+11,
>>> 0x1.d10f5ep-29).
>>> 
>>> Cheers,
>>> 
>>> Pedro.
>>> 
>>> Index: lib/msun/src/e_pow.c
>>> ===================================================================
>>> --- lib/msun/src/e_pow.c	(revision 342665)
>>> +++ lib/msun/src/e_pow.c	(working copy)
>>> @@ -130,6 +130,7 @@
>>>   	if(hx<0) {
>>>   	    if(iy>=0x43400000) yisint = 2; /* even integer y */
>>>   	    else if(iy>=0x3ff00000) {
>>> +		uint32_t j;	/* Avoid UB in bit operations below. */
>>>   		k = (iy>>20)-0x3ff;	   /* exponent */
>>>   		if(k>20) {
>>>   		    j = ly>>(52-k);

I was away.

I see you committed a better version (with a cast).

The correct fix is more like the changing the file scope i and j from int32_t
to u[_]int32_t.  j is mostly used as a temporary to hold pure bits.  Its top
bit is usually not set, so plain int32_t works for it.  The only other place
where its top bit can obviously be set is in an EXTRACT_WORDS(j,i,z).  Then
both j and i can have their top bit set.  EXTRACT_WORDS() into h[xy] and
l[xy] is also sloppy.  There l[xy] is unsigned, but h[xy] is signed.  The
top bit in the high word is soon discarded using ix = hx&0x7fffffff, etc.
Later, the sign is tested using hx<0 and this is the only excuse for hx
being signed.  fdlibm does this quite often.  It is a manual optimization
for ~30-40 year old compilers that couldn't turn (hx&8000000) != 0 into a
sign test iff the sign test is optimal.

Other style bugs in the above:

>> I'll defer to Bruce on this.  My only comments are
>> 1) declarations belong at the top of the file where all declarations occur

1a) missing blank line after declaration.

> Modern standards let you declare variables within blocks. For style we do 
> prefer to start the function with them but in this case we can't.

K&R 1978 C allows declaring variables within blocks.

You can't declare j again in function scope since it is already there with
a different type.

>> 2) j is already declared as int32_t
> And it has to be a signed integer: we later check for negative values.

I think that is only when j is misused later for EXTRACT_WORDS().

2a,2b) Redeclaring j with a different type in an inner block is an even
larger style bug/obfuscation than declaring a new variable or redeclaring
j with the same type there.  Redeclaring shadows it and this bug is
detected by Wshadow.  Redeclaring it with a different type (as needed to
work) is especially obscure shadowing.

>> 3) uint32_t should be written as u_int32_t.
>
> No, uint32_t is the standard type, u_int32_t is a BSDism. Also, the file 
> consistently uses uint32_t.

No, see previous replies.  3) is correct.

>> Are you sure that UB occurs?  Or, is this an attempt to
>> placate a static analysis tool that only see shifting of
>> a signed type?  Do you need to make a similar change to
>> e_powf.c?
>
> The point of the Undefined Behavior is precisely that it doesn't occur: we 
> are lucky and the compiler behaves as it always has but the standard will let 
> a different compiler behave differently and then we will have surprises.

No, UB does occur, but is harmless.  ly is unsigned and can have its top
bit set.  j=ly>>(52-k) shifts right a few bits so doesn't overflow on
assignment to signed j.  But then j<<(52-k) gives UB attempting to recover
the top bit of ly gives UB in all cases where the top bit is nonzero.

The else clause doesn't have this problem, since it shifts iy instead of ly
and iy never has its top bit set.

e_powf.c doesn't have this problem, since there is only 1 32-bit word for
float precision and it is handled like the top word for double precision
(only its mantissa bits are shifted).

However, it is easier to use unsigned shifts than to see that signed shifts
have defined behaviour.  musl did well to only detect and/or fix the 1 case
where UB behaviour clearly occurs.

Bruce