long double broken on i386?

Tue Oct 2 18:13:44 PDT 2007

On Tue, 2 Oct 2007, Steve Kargl wrote:

> On Tue, Oct 02, 2007 at 01:23:18PM -0400, David Schultz wrote:
>> ...
>> Anyway, my point is that if you have something that works
>> reasonably well and doesn't have egregious errors, my suggestion
>> is to just commit it and not kill yourself over whether the
>> argument reduction is correct in the last ulp.

> There are a few problems: 1) I don't have commit access.  2) I
> need to do style(9) clean-up pass.  3) I've only tested these
> on i386 and amd64, and I know these fail for sparc64. 4) Most
> importantly, I don't know how the project wants to handle the
> 53 vs 64 bit default fpu setting on i386.

(3) is most important.  The C versions will never even be used for
i386 or amd64, since these arches have sin and cos in hardware and (I
believe) it is impossible to beat the hardware on these arches.  The
asm versions have always been trivial to implement (change 2 bytes in
s_sin.S (compare e_sqrt.S with e_sqrtf.S; the long double changes go
the other way).  amd64 actually uses the C versions for double precision
becase the C versions are competitive with the hardware for speed and
beat it for accuracy on large args and on small args aear a multiple
of pi/2.

(4): These functions are useless unless the default is changed.  Users
have always been able to get equivalent _in_accuracy on i386 by just
using the double precision versions.  Hacks like #define sinl sin
almost work for just getting things to compile.  In fact, on i386, due
to the bugfeatures outlined in my previous mail, users can even get
full long double accuracy using hacks like that, even if the rounding
precision is 53 bits:

%%%
#include <math.h>
long double x:
main()
{
 	x = (long double)sin(1);
}
%%%

On i386, this stores the full long double result for sinl(1) in x, due
to the bugfeatures -- sin(1) returns extra precision in a register, and
gcc doesn't understand the extra precision so it just stores all bits
of the register in x.  The behaviour would be similar if sin(1) were used
in an expression.

On amd64, the ABI prevents this from working (doubles must be returned
in a 64-bit XMM register).

> PS: There is also the minor possibility of giving bde either
> a heartache or death via laughter when he finally sees what I
> have done. :)

I would be unhappy with any versions that aren't either:
- as identical as possible in algorithm and style to the double precision
   versions, in the same way that the float versions are as identical as
   possible, or
- much better in algorithm and style and efficiency and accuracy than the
   double precision versions.  I don't want to maintain different versions.

   I tried to turn your original efforts for logl() into the latter for
   logf(), log() and logl(), but so far they are only a little better.
   In particular they are not good enough to commit due to point (3)
   -- the long double version is only implemented for machines with
   80-bit long doubles, and all versions only beat the hardware for
   efficiency on some machines (amd64, but not i386 with P[2-4]).  I
   think the long double version beats the hardware for accuracy but
   haven't finished testing this.  The hardware is much easier to beat
   for efficiency for the log family than for the sin family.  The
   hardware is much easier to beat for accuracy for logl() than for
   anything else.  The hardware is accurate enough for logf() and logl(),
   so is unbeatable for accuracy of these functions unless a slow version
   is used to get more accuracy.

Bruce