Representation of 128 bit floating point numbers in FreeBSD amd64 and Clang

Thu Oct 31 22:11:37 UTC 2013

On Thu, Oct 31, 2013 at 5:38 PM, Bruce Evans <brde at optusnet.com.au> wrote:

> On Thu, 31 Oct 2013, Steve Kargl wrote:
>
>  On Thu, Oct 31, 2013 at 09:27:34AM -0400, Mehmet Erol Sanliturk wrote:
>>
>>>
>>> In FreeBSD amd64 and Clang,
>>> how can I represent 128-bit (34-digit) variables?
>>>
>>
> With difficulty, since it is not supported.
>
>  Not sure it can be done with clang, but GCC supports
>> a __float128 type.  GCC refers to this as its TCmode.
>> gfortran, the Fortran compiler that supports REAL(16),
>> uses __float128 internally.  I've never directly used
>> __float128, so can't help beyond this.
>>
>> If you need 128-bits in C on ia32 or x86_64 hardware,
>> you should probably look into using mpfr and mpc.
>>
>
> Even gcc-4.2.1 in FreeBSD generates code to use __float128,
> but the support for it isn't compiled into libgcc for some
> reason.
>
> Why would anyone want to use 128-bit FP on x86?  It is emulated
> similarly to on sparc64.  On sparc64, emulated 128-bit FP is about
> 100 times slower than hardware 64-bit FP.  The emulation is not
> very good, but 128-bit FP is part of the ABI on sparc64 so I would
> expect the emulation to give an even larger slowdown factor in
> x86.
>
> With 80-bit FP, you can't quite exactly count the number of atoms in
> the universe, but you can count the world's GNP in cents for a thousand
> years or so.  Extra accuracy can reduce problems from numerical
> instability and rounding bugs, but a slowdown factor of 100 times is
> a large price to pay for that.
>
> Bruce
>

For ill-conditioned problems, and especially when the result is NOT known in
advance, the number of digits used (64 bits versus 80 bits versus 128 bits
versus arbitrary precision) is much more important than the time consumed by
the computations.

For example, take a polynomial of degree 12 whose largest root is
63.xxxxxx|7 (giving nearly zero for the polynomial), while 63.xxxxxx|9
gives 10 ** 25 (a 1 followed by twenty-five zeros).
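
A small sketch of this kind of sensitivity, under loud assumptions: the
polynomial below is NOT the one described above but a hypothetical degree-12
polynomial with roots 8, 13, ..., 63, chosen only so that its largest root is
also 63. Python is used because its standard fractions module gives exact
rational arithmetic to compare against ordinary double precision; the names
ROOTS, expand and horner are made up for this illustration. The slope of this
polynomial at its largest root is 5**11 * 11!, roughly 1.9e15, so moving x by
1e-8 already moves the exact value from 0 to about 1.9e7.

```python
from fractions import Fraction

# Hypothetical roots 8, 13, ..., 63 (an assumption, not the post's polynomial).
ROOTS = [5 * k + 3 for k in range(1, 13)]


def expand(roots):
    """Return the integer coefficients (highest degree first) of prod (x - r)."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r): x*P minus r*P.
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def horner(coeffs, x):
    """Evaluate the polynomial at x in whatever arithmetic x carries.

    A float x gives a double precision result (large integer coefficients
    are rounded on the way); a Fraction x gives the exact value.
    """
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc


if __name__ == "__main__":
    coeffs = expand(ROOTS)
    for x in (Fraction(63), Fraction(63) + Fraction(1, 10**8)):
        exact = horner(coeffs, x)          # exact rational arithmetic
        approx = horner(coeffs, float(x))  # double precision arithmetic
        print(float(x), float(exact), approx)
```

Running the file prints, for each x, the exact value next to the double
precision value. How far they differ depends on the floating point of the
machine, so no output is quoted here.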

On such problems, the difference between double precision and quadruple
precision is apparent.

Without arbitrary precision arithmetic, it is not possible to solve
problems with more than a small number of parameters.

As an example, it may be a very useful experience to invert the Hilbert matrix

http://en.wikipedia.org/wiki/Hilbert_matrix

with single, double and quadruple precision arithmetic to see up to what
order a correct inverse can be obtained.
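
A minimal sketch of that experiment, with the caveat that it only contrasts
double precision (Python floats) with exact rational arithmetic from the
standard fractions module; single and quadruple precision are left out
because the Python standard library has no such types. The function names
hilbert, invert and worst_relative_error are made up for this illustration,
and the inversion is a plain Gauss-Jordan elimination with partial pivoting,
not a recommendation of any particular method.

```python
from fractions import Fraction


def hilbert(n, one):
    """n x n Hilbert matrix, H[i][j] = 1/(i+j+1), in the number type of `one`."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (float or Fraction entries)."""
    n = len(a)
    zero = a[0][0] * 0
    one = zero + 1
    # Augment the matrix with an identity of the same number type.
    aug = [list(row) + [one if i == j else zero for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ZeroDivisionError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def worst_relative_error(n):
    """Largest entry error of the double precision inverse, relative to the
    largest entry of the exact inverse."""
    exact = invert(hilbert(n, Fraction(1)))
    approx = invert(hilbert(n, 1.0))
    scale = max(abs(v) for row in exact for v in row)
    err = max(abs(x - float(e))
              for row_a, row_e in zip(approx, exact)
              for x, e in zip(row_a, row_e))
    return err / float(scale)


if __name__ == "__main__":
    for n in (3, 6, 9, 12):
        print(n, worst_relative_error(n))
```

The printed error grows quickly with the order n, since the condition number
of the Hilbert matrix grows roughly exponentially in n. The exact inverse
always has integer entries, which makes it a convenient reference.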

My decision is to rewrite all of my numerical analysis programs from
scratch using arbitrary precision arithmetic, because current (double
precision) computations are practically useless when the answer is not
known in advance. Cases where the answer is known include a sum of squares
that should be zero, or a root that should give zero as the polynomial
value. Even in such cases, usable results are extremely difficult to obtain,
because as the number of parameters increases, errors dominate the results.

Some examples of problems with a large number of parameters in numerical
analysis books or papers are very misleading, because when a different set
of initial values is given, the algorithms collapse immediately.

Therefore, the number of digits in computations is much more important than
any other factor, such as time.

Thank you very much.

Mehmet Erol Sanliturk