Gcc46 and 128 Bit Floating Point
chat95 at mac.com
Thu Mar 15 11:55:47 UTC 2012
Hi Thomas D. Dean,
Why not use the double-double approach?
Double-double is a poor man's quad math.
On an NVIDIA C2050, we obtain 16 to 26 GFlops
for matrix-matrix multiplication.
I have been developing a linear algebra library.
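
For readers unfamiliar with the technique, here is a minimal sketch of
double-double arithmetic in C. It is not taken from the library mentioned
above; the type name dd and the helpers two_sum, two_prod, dd_add and dd_mul
are hypothetical names chosen for illustration. A value is stored as an
unevaluated sum hi + lo of two doubles, which gives roughly 106 bits of
significand but keeps the exponent range of a double, so it is not a true
IEEE 128-bit format. The sketch assumes strict IEEE double evaluation (SSE2
on amd64, no -ffast-math); x87 excess precision would break the error terms.

```c
/* A double-double value: the unevaluated sum hi + lo. */
typedef struct { double hi, lo; } dd;

/* Error-free addition (Knuth): s.hi + s.lo == a + b exactly. */
static dd two_sum(double a, double b)
{
    double s = a + b;
    double bb = s - a;                      /* part of b absorbed into s */
    double err = (a - (s - bb)) + (b - bb); /* rounding error of a + b */
    dd r = { s, err };
    return r;
}

/* Same as two_sum but cheaper; only valid when |a| >= |b|. */
static dd quick_two_sum(double a, double b)
{
    double s = a + b;
    dd r = { s, b - (s - a) };
    return r;
}

/* Veltkamp split: a == *hi + *lo, each half fits in 26 bits.
   Can overflow for |a| near the top of the double range. */
static void split(double a, double *hi, double *lo)
{
    double t = 134217729.0 * a;             /* 2^27 + 1 */
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Error-free multiplication (Dekker): p.hi + p.lo == a * b exactly,
   barring overflow and underflow. */
static dd two_prod(double a, double b)
{
    double ah, al, bh, bl;
    double p = a * b;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    double err = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    dd r = { p, err };
    return r;
}

/* Double-double addition (the simple, slightly less accurate variant). */
static dd dd_add(dd x, dd y)
{
    dd s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return quick_two_sum(s.hi, s.lo);
}

/* Double-double multiplication; the lo * lo term is dropped as negligible. */
static dd dd_mul(dd x, dd y)
{
    dd p = two_prod(x.hi, y.hi);
    p.lo += x.hi * y.lo + x.lo * y.hi;
    return quick_two_sum(p.hi, p.lo);
}
```

Each dd operation costs on the order of ten to twenty ordinary double
operations, all of which run in hardware, so it avoids the soft-float path
entirely. On hardware with a fused multiply-add, two_prod can be shortened
by computing the error term with fma() from <math.h>.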
From: "Thomas D. Dean" <tomdean at speakeasy.org>
Subject: Re: Gcc46 and 128 Bit Floating Point
Date: Wed, 29 Feb 2012 00:08:07 -0800
> On 02/28/12 22:03, Bruce Evans wrote:
>> But why would you want it? It is essentially unusable on sparc64,
>> since it is several thousand times slower than 80-bit floating point
>> on i386. At equal CPU clock speeds, it is only about 1000 times slower.
>> Most of the factors of 10 are due to fundamental slowness of multi-
>> word arithmetic in software and the soft-float implementations not
>> being very good (I only tested with the old NetBSD/4.4BSD-derived one.
>> This has been replaced by the Hauser one, which has good chances for
>> being worse due to its greater generality and correctness, but the old
>> one has a lot of slop to improve). A modern x86 is much faster than
>> an old sparc64, giving about another factor of 10. 64-bit operations
>> are only about this 10 times slower (or more like 3 times slower at
>> equal CPU clock speeds) on an old sparc64 as on a not-so-modern core2
>> x86. The gnu libraries might be better. So you could hope for only
>> a factor of 100 slowdown on scalar code. But modern x86's can also
>> do vector code, and thus be up to 8 times faster for 32-bit floating
>> point with AVX. Really good multi-word libraries might be able to
>> exploit some vector operations, but I think multi-word operations are
>> too serial in nature to get much parallelism with them.
> I have an application that takes 10 days to run on a 4.16GHz Core-i7
> 3930K. No output until it finishes.
> When I first started looking at this, I naively thought the 80-bit FPU
> floats were scaled to 128-bits. Would be nice...
> The application uses libgmp, but, about 1/2 to 2/3 of the work will
> fit in a 128-bit float.
> I wanted to get 128-bit floating point operations so I could do 2/3
> the work in an FPU. With 80-bits, I can only do 1/3 the work(+-).
> Mostly, this is just "can I do it faster...". Maybe some asm code to
> work the inner loops in FPU registers. At some point, hand off to
> libgmp. I now think the speed improvement would not be worth the
> Tom Dean