Gcc46 and 128 Bit Floating Point

Thu Mar 1 07:49:03 UTC 2012

On Wed, 29 Feb 2012, Peter Wemm wrote:

> On Wed, Feb 29, 2012 at 12:40 AM, Bruce Evans <brde at optusnet.com.au> wrote:
>> On Wed, 29 Feb 2012, Thomas D. Dean wrote:
>>> On 02/28/12 22:03, Bruce Evans wrote:
>>>> But why would you want it? It is essentially unusable on sparc64,
>>>> since it is several thousand times slower than 80-bit floating point
>>>> on i386. At equal CPU clock speeds, it is only about 1000 times slower.
>>>> ...
>>> I have an application that takes 10 days to run on a 4.16GHz Core-i7
>>> 3930K. No output until it finishes.
>>
>> Look elsewhere :-).  1000 times slower than that would be bad :-).
>
> See below:
>
>>> The application uses libgmp, but, about 1/2 to 2/3 of the work will fit in
>>> a 128-bit float.
>
> This is what he's getting at.  If he could get access to 128 bit fp,
> he could move between 1/2 and 2/3 of the work into hardware operations
> and bypass a large chunk of GMP work which would be many many times
> slower than 128 bit hardware FP.

Yes, and 256-bit hardware FP would be even faster.  But neither exists.
There is no magic, and libgmp is likely already faster than any software
FP can be, since it has access to the same wide SSE/AVX registers as
software FP, multi-precision integers are a little easier, and libgmp is
more developed.  I think software FP would only beat software mp if
the algorithm really wanted FP and this had to be emulated with multi-
precision integers.  I don't know if libgmp already does the latter.
It would be difficult for an application to fake it efficiently, but
an mp library could do much the same as an FP library for it and then
integrate it efficiently with the integers.
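
To make "128-bit FP in software" concrete, here is a minimal sketch,
assuming gcc-4.6+ on i386/amd64, where __float128 arithmetic is compiled
into calls to libgcc's soft-float routines rather than hardware
instructions.  The helper names (pow2_neg, quad_resolves,
long_double_resolves) are made up for illustration.  It only checks
where 1 + 2^-n stops being distinguishable from 1, which shows the
113-bit mantissa without needing libquadmath for printing.

```c
#include <assert.h>

/* 2^-n computed in __float128; exact for the modest n used here. */
static __float128 pow2_neg(int n)
{
    __float128 r = 1;

    assert(n >= 0 && n <= 16000);   /* stay within the normal range */
    while (n-- > 0)
        r /= 2;
    return r;
}

/* Returns 1 if 1 + 2^-n is distinguishable from 1 in __float128. */
int quad_resolves(int n)
{
    /* volatile keeps the compiler from folding the arithmetic away */
    volatile __float128 one = 1;
    volatile __float128 sum = one + pow2_neg(n);

    return sum != one;
}

/*
 * Same question for long double.  On i386/amd64 this is the 80-bit
 * x87 format with a 64-bit mantissa, so it is expected to return 0
 * for n > 63 there; other platforms may differ.
 */
int long_double_resolves(int n)
{
    volatile long double one = 1, tiny = 1, sum;

    while (n-- > 0)
        tiny /= 2;
    sum = one + tiny;
    return sum != one;
}
```

With the 113-bit mantissa, quad_resolves(112) should return 1 and
quad_resolves(113) should return 0.  Each of those additions is a
library call, which is where the slowness comes from.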

> ports gcc with -march/-mtune set correctly and quadmath is his only
> option.  If he's got gcc-4.6 generating code for generic amd64
> instructions it won't use that stuff and will soft-float it.  Those
> switch settings might be the difference between the earlier code
> reports that didn't show use of the instructions vs later ones that
> did.

On sparc64 you can have "hardware" 128-bit FP with nice 64-bit
instructions for it, but gcc intentionally doesn't use this by default,
because apparently no sparc64 CPU implements these instructions in
hardware; thus they have to be emulated in software using essentially
the same code as soft-float, with the extra overhead of trapping
on every 128-bit instruction.  In the FreeBSD implementation, many
of the traps are handled in userland.  So there is first the trap
overhead, then signal handler overhead before getting to soft-float.
Handling traps in userland makes the traps easier to debug, but
gdb support for sparc64 FP is primitive and gdb support for sparc64
FP signal handlers is worse.

> libm and libc can't grow support for __float128 with our existing
> compiler.  We could write some in assembler but that doesn't do
> anything for libc like printf.  He also said "no output for 10 days"
> so I'm guessing printf isn't an issue.
>
> Keeping it out of band with gcc-4.6+ / libquadmath and some impedance
> matching with libgmp is his only practical option.  Later snapshots of
> gcc may even be required if it's missing things he needs.

Indeed, the toolchain issues are hard to handle, even if you have gcc-4.6+
set up.
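
As a rough idea of what the "impedance matching" might look like, here
is a sketch (not anything libgmp or libquadmath provides) that splits a
__float128 into three doubles whose sum is exactly the original value.
GMP has no __float128 interface that I know of, but it does have
mpf_set_d() and mpf_add(), so the pieces could be fed to it one at a
time.  struct quad_parts, quad_split3 and quad_join3 are hypothetical
names, and the code assumes a finite value well inside double's
exponent range.

```c
#include <assert.h>

/* Three doubles whose exact sum is the original __float128. */
struct quad_parts {
    double hi, mid, lo;
};

/*
 * Peel off 53 bits at a time.  Each subtraction is exact because the
 * remainder always fits in the 113-bit mantissa.  Assumes x is finite
 * and neither overflows nor underflows when narrowed to double.
 */
struct quad_parts quad_split3(__float128 x)
{
    struct quad_parts p;
    __float128 r = x;

    p.hi = (double)r;
    r -= p.hi;
    p.mid = (double)r;
    r -= p.mid;
    p.lo = (double)r;
    r -= p.lo;
    assert(r == 0);     /* nothing should be left after three pieces */
    return p;
}

/* Inverse of quad_split3; adds the small parts first so no bits are lost. */
__float128 quad_join3(struct quad_parts p)
{
    return (__float128)p.hi + ((__float128)p.mid + p.lo);
}
```

On the GMP side one would mpf_init2() a variable with at least 113 bits
of precision, mpf_set_d() it from hi, and mpf_add() in mid and lo.
Whether this is cheaper than staying in libgmp throughout would have to
be measured.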

Bruce