catrig[fl].c and inexact

Bruce Evans brde at optusnet.com.au
Sat May 13 16:19:34 UTC 2017


On Sat, 13 May 2017, Dimitry Andric wrote:

> On 13 May 2017, at 08:08, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
>>
>> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>>> On Fri, 12 May 2017, Steve Kargl wrote:
> ...
>>> required for the standard magic.  I planned to fix all this magic using
>>> macros like raise_inexact().
>>
>> If you plan to fix the magic with raise_inexact, then please
>> test with a suite of compilers.  AFAICT, clang is optimizing
>> out the code.  I haven't written a testcase to demonstrate this
>> as I have other irons in the fire.
>
> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> and 5.0.0.  All versions of gcc produced something similar to the
> following for i386:

Yes, all compilers I tried (only gcc-3.3.3, gcc-4.2.1 and clang-3.9.0)
generate the intended code, but clang-3.9.0 also generates a -Wunused
warning about the variable that it has just used to generate the intended
code!

> # /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
>        flds    tiny    # tiny
>        fadds   .LC2    #
>        fstps   120(%esp)       # junk

I don't know how to ask for the best code, which is more like

 	flds	tiny
 	fadds	one
 	ffree	%st(0)		# or fstp %st(0) -- MD optimization

but the best code runs insignificantly faster in practice.

> and for amd64:
> [...]
> .L34:
> .LBB33:
> # /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
>        movss   tiny(%rip), %xmm0       # tiny, tiny.0_28
>        addss   .LC13(%rip), %xmm0      #, _29
>        movss   %xmm0, 188(%rsp)        # _29, junk

Discarding the result is easier for amd64 (just omit the store).  The
volatile hack forces the store.

> E.g., these all look good, at least with regard to not optimizing out
> the desired addition.
>
> The only compiler I could find that does optimize everything away (at
> least in the simplified test case), is the Intel compiler:
>
> https://godbolt.org/g/g1UT2m

Urk.

Bruce

