Complex arg-trig functions

Sat Aug 4 20:46:00 UTC 2012

On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/04/2012 03:49 AM, Bruce Evans wrote:
>> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote:
>
>> I made good progress with clog[fl]().  The accuracy problems near 0
>> are long solved and proved to be solved.
>
> I would be interested to see how you solved this.  Would you be willing to 
> post your code somewhere?

I just subtracted 1 in the right place:

         x*x - 1 + y*y              needs doubled precision (and |x| >= |y|)
                                    (also need x*x >= 0.5)
         (x*x - 0.5) + (y*y - 0.5)  adjustment for above when x*x and y*y
                                    are between 0.25 and 0.5
         x*x + y*y - 1              needs tripled precision (and |x| != 1)
         log(x*x + y*y)             log() would subtract 1 at the same point
                                    as the previous expression, so would need
                                    a tripled-precision log.

To see that doubled precision is enough:
- no problem if x*x + y*y is far from 1, so assume it is close to 1
- doubled precision makes x*x and y*y exact
     (It must be full doubled precision.  I first thought that there was a
     problem splitting types with odd MANT_DIG.  E.g., MANT_DIG = 53
     for doubles.  This must be split as 26 high bits and 27 low bits,
     but then squaring the 27 low bits in double precision at first
     seems to need 54 bits, and we only have 53.  This problem is avoided
     by rounding the high term to nearest.  Then the low term only needs
     26 bits.)
- sort so that |x| >= |y|, and take absolute values so that I don't need
   to write '|*|' again
- if x*x >= 0.5, then x*x - 1 is exact in doubled precision (since x*x
   must be <= 2 by the assumption that x*x + y*y is near 1.
- if x*x < 0.5, then it must be >= 0.25, and y*y must be in the same range,
   with both slightly less than 0.5, to get x*x + y*y.  So x*x - 0.5 and
   y*y - 0.5 are exact in doubled precision.
- we have created 2 exact terms that add up to x*x + y*y - 1 in infinite
   precision.  In doubled precision, the sum is usually not exact.  But
   in the problem case where there is a large cancelation, the sum is
   exact; other cases are easier:
   - when the exponents of the 2 terms differ by more than 1, there is no
     cancelation and we get a doubled precision result with all bits
     correct (the last one correctly rounded).  This is more than enough.
   - when the exponents of the 2 terms differ by at most 1, first line up
     the bits by shifting as necessary.  Clearly at most 2 more bits are
     required to hold an exact result.  But cancelation of just 2 leading
     bits opens space to hold the exact result in doubled precision.
     Otherwise, we are in the previous case, with 1-2 bits less than
     doubled precision.  This is still more than enough.
   So we have x*x + y*y - 1 in doubled precision except possibly for
   errors other than the rounding error in the lowest 2 bits.

To see that the difference is not very small:
- if x > 1, then x >= 1 + DBL_EPSILON, and x*x is already not very close
   to 1
- if x < sqrt(2)/2, similarly (I didn't check this properly)
- if sqrt(2)/2 < x < 1, write x = 1 - u where u is positive and not large.
   Then x*x = 1 - 2*u + u*u.  Any y such that y*y when added to this gets
   close to 1 must be near sqrt(2*u).  Since u can't be very small, y
   can't be very small.  The worst cast is when u = XXX_EPSILON/2.  In
   double precision, u = 2**-53.  Then 2*u = 2**-52 and y ~= 2**-26.  When
   we line up bits to add the doubled precision terms the don't extend
   very far to the right.  They fit in less than 3 53-bit bit strings, with
   the leading bit representing 1 and the rightmost bit representing
   2**-158.
     (x*x and -1 fit in 2 53-bit bit-strings.  Since y ~= 2**-26, y*y's
     exponent is 52 or 53 less than 1's exponent.  Shifting to line up
     the bits requires just 1 more 53-bit bit string to the right of
     the other 2.)
   If everything canceled except for the rightmost bit, then the value
   would be 2**-158.  This is the smallest possible value, since the
   above worst case has u*u's bits as far as possible to the right
   (after shifting), and the final result gets its lowest bits from u*u.
   The smallest value observed in double precision was 2**-156.  This
   happens when u = 2**-52.  (Due to numerical accidents, when u = 2**-53,
   y*y can't kill enough bits of -2*u + u*u to get close.)

Limiting the closeness is important for showing that underflow can't
happen either in the calculations or on conversion of the doubled
precision result, but underflow tends to be avoided naturally.  It
is important that y*y doesn't underflow (including when y is split
into hi+lo, lo*lo must not underflow).  So tiny y must not be handled
by this case.  The "lining up bits" argument shows that tiny y can't
get close to 1.

> For the inverse trig functions, my accuracy is about 4 ulps, and I am not 
> looking to improve it at all.  In particular, the algorithm for casin(h) and 
> cacos(h) is so complicated that I don't see how this can be improved without 
> a lot of work.
>
> However if you solve this problem for clog, I can see that it is very likely 
> that your solution will also work for catan(h).  If you get this problem 
> licked, I will be interested to see how you do it.

I seem to have finally got it below 2.0 (mostly below 1.5) for clog().
clog() is simpler, but it was still hard to find all the exceptional
behaviours, so that the worse cases could even be noticed.

>> Then there is the problem of x and y tiny or denormal but with their
>> ration not tiny enough to ignore one of them.  1/z is then large or
>> infinite, but it it should rarely be infinite, and underflow shouldn't
>> occur for just x*x+y*y.  This is easily handled as in hypot() (not sure
>> about the method here) and now clog() by scaling:
>> 
>> %     /* Avoid underflow. */
>> %     if (ax < 0x1p-511)
>> %         return (cpack(
>> %             log(hypot(x * 0x1p511, y * 0x1p511)) - 511 * ln2_lo -
>> %             511 * ln2_hi,
>> %             atan2(y, x)));
>
> Wow - this is really getting down and dirty!  Why do you first subtract 511 * 
> ln_2 lo and then 511 * ln2_hi?  I am guessing that you are aiming at saving a 
> few ulp by trying to represent M_LN_2 in some kind of 1.5 times double 
> precision?

Representing log(2) and hi+lo is standard technique, but is needed in
contexts like this mainly for multiplication by a variable k.  Here I
should probably use a double constant ln2_times_511.  This constant
shouldn't be calculated as 511*ln2 since this might lose accuracy (just
half an ulp?) unnecessarily.  The terms are supposed to be added from
low to high for maximal accuracy, but I didn't check that the
logh(hypot()) term is actually lowest.  I tried replacing this by a
sloppy 511*ln2 and it lost more than half an ulp, so maybe the addition
is doing something good too.

All the -511's are wrong, in different ways.  I'm too tired to get this
right now, but had changed them to -468 (off by 10 error adding 53 to
them).  This seemed to work, but has several more off-by-53 errors.  The
-511's for the boost are wrong magic unrelated to the -511 for the
threshold.

Full code attached.

% ...
% double complex
% clog(double complex z)
% {
% ...
% 	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
% 	if (ax == 1) {
% 		if (ay < 0x1p-511)
% 			return (cpack((ay * 0.5) * ay, atan2(y, x)));
% 		return (cpack(log1p(ay * ay) * 0.5, atan2(y, x)));
% 	}

As before, but moved.

% 
% 	/* Avoid underflow when ax is not small.  Also handle zero args. */
% 	if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax)
% 		return (cpack(log(ax), atan2(y, x)));

This is the case of large differences in exponents in hypot().

% 
% 	/* Avoid overflow.  Also handle infinite args. */
% 	if (ax > 0x1p1023)
% 		return (cpack(log(hypot(x * 0.5, y * 0.5)) + ln2_lo + ln2_hi,
% 		    atan2(y, x)));

Use ln2* a lot.  Avoid divisions and M_E.

% 	if (ax > 0x1p511)
% 		return (cpack(log(hypot(x, y)), atan2(y, x)));

Subcase for overflow.  The 0x1p1023 threshold is to avoid hypot()
overflowing in the general expression.  This one is to avoid squares
overflowing, now that we avoid hypot() in the usual case and do our
own squaring.

% 
% 	/* Avoid underflow when ax is small. */
% 	if (ax < 0x1p-468)
% 		return (cpack(log(hypot(x * 0x1p468, y * 0x1p468)) -
% 		    468 * ln2_lo - 468 * ln2_hi, atan2(y, x)));

Buggy -468.  See above.  I tested mainly in float precision where the
corresponding constants are less buggy.

% 
% 	/* Take extra care when the real part of the result is near 0. */
% 	EXTRACT_WORDS(hx, lx, ax);
% 	bits = ((uint64_t)hx << 32) | lx;
% 	bits += 0x04000000;
% 	hx = bits >> 32;
% 	lx = bits & 0xf8000000;
% 	INSERT_WORDS(xh, hx, lx);
% 	xl = ax - xh;
% 	EXTRACT_WORDS(hy, ly, ay);
% 	bits = ((uint64_t)hy << 32) | ly;
% 	bits += 0x04000000;
% 	hy = bits >> 32;
% 	ly = bits & 0xf8000000;
% 	INSERT_WORDS(yh, hy, ly);
% 	yl = ay - yh;

The hi+lo decomposition is painful in double precision.

We fall through to here and use this rather inefficient decomposition in
most cases, to reduce special cases above and make it easier to be more
accurate (just need log*() with more accuracy.  In testing the float case,
I replaced log*f() by log*l() and got perfect rounding from the extra-
precision code.  This helped verify the algorithm.)

% 
% 	xh2 = xh * xh;
% 	yh2 = yh * yh;
% 
% 	h = ax * ax + ay * ay;
% 	using_log1p = 0;
% 	if (h < 1 - 0x1p-5 || h > 1 + 0x1p-4) {

Not very close to 1.

% 		t2 = yh2;
% 		t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl);
% 		norm(t2, t4);
% 		spadd(t2, t4, xh2);

Mostly, full multi-precision code isn't useful, so this just adds up
terms from low to high, with extra precision for the final step only
(this extra precision is probably only useful if we subtract 1 later).
This could be optimized a bit by subtracting 1 or 0.5 and 0.5 as in the
more delicate case, but 1 would still need to be subtracted later in
some cases, so I just fall through to that.

% 	} else {

Very close to 1.

% 		if (xh2 >= 0.5 && xh2 <= 2) {
% 			using_log1p = 1;
% 			xh2 -= 1;
% 		} else if (xh2 >= 0.25 && xh2 < 0.5 && yh2 >= 0.25) {
% 			using_log1p = 1;
% 			xh2 -= 0.5;
% 			yh2 -= 0.5;
% 		}
% 		t2 = xl * xl;
% 		t1 = 2 * xh * xl;

Exact so far.

% 		norm(t1, t2);
% 		spadd(t1, t2, xh2);
% 		norm(t1, t2);
% 		t3 = t1;
% 		t4 = t2;

x*x [-1 or -0.5] is in (t3, t4), exactly I hope.

% 
% 		t1 = 2 * yh * yl;
% 		t2 = yl * yl;
% 		norm(t1, t2);
% 		spadd(t1, t2, yh2);
% 		norm(t1, t2);

y*y [-0.5] is in (t1, t2), exactly I hope.

% 
% 		norm(t2, t4);
% 		norm(t1, t3);
% 
% 		spadd(t2, t4, t3);
% 		spadd(t2, t4, t1);

Add up terms.  Not exact in all cases, and not a full Dekkers algorithm
AFAIK.

% 	}
% 	if (!using_log1p && h > 0.5 && h < 3) {
% 		using_log1p = 1;
% 		spadd(t2, t4, -1);

Sloppier addition of -1 for cases not near 1.  Only guaranteed not to
lose accuracy when the infinite precision h is between 0.5 and 2, but
it helps up to about 3.

% 	}
% 	if (using_log1p)
% 		return (cpack(log1p(t2 + t4) * 0.5, atan2(y, x)));
% 	return (cpack(log(t2 + t4) * 0.5, atan2(y, x)));
% }

Subtracting 1 is no good for h far outside of [0.5, 2], so we can't use
log1p() in all cases.  t2+t4 for the log() case is just x*x + y*y
evaluated as accurately as possible (saves an ulp or 2 in the final
result.

% float complex
% clogf(float complex z)
% {
% ...
% 	/* Avoid underflow when ax is not small.  Also handle zero args. */
% 	if (ay < FLT_MAX / 0x1p24F && ay * 0x1p24F <= ax)
% 		return (cpackf(logf(ax), atan2f(y, x)));

Everything is lexically identical with log() if possible.  Magic numbers
are always different.  I should have converted FOO_MAX to magic.

FP constants in float functions must have float type to avoid defeating
half the point of having float functions by evaluating some expressions
in non-float precision.

% 	/* Avoid underflow when ax is small. */
% 	if (ay < 0x1p-39F)
% 		return (cpackf(logf(hypotf(x * 0x1p39F, y * 0x1p39F)) -
% 		    39 * ln2f_lo - 39 * ln2f_hi, atan2f(y, x)));

-39 = -63 + 24 has the same bugs as -458 = -511 + 53 in double precision.

% 
% 	/* Take extra care when the real part of the result is near 0. */
% #if 1

I develop in float precision for easier testing, and duplicate the
algorithm under this part of the ifdef.

% 	GET_FLOAT_WORD(hx, ax);
% 	SET_FLOAT_WORD(xh, hx & 0xfffff000);
% 	xl = ax - xh;
% 	GET_FLOAT_WORD(hy, ay);
% 	SET_FLOAT_WORD(yh, hy & 0xfffff000);
% 	yl = ay - yh;

The hi+lo decomposition is less painful in float precision.

% ...
% #else
% 	dh = (double_t)ax * ax + (double_t)ay * ay;
% 	if (dh > 0.5 && dh < 3) {
% 		dh = (double_t)ax * ax - 1 + (double_t)ay * ay;
% 		return (cpackf(log1pf(dh) * 0.5F, atan2f(y, x)));
% 	}
% 	return (cpackf(logf(dh) * 0.5F, atan2f(y, x)));

More than doubled precision is avaiable in hardware on most supported
arches and is easy to use.  Your version had the -1 in the wrong place.

double_t is used for minor efficiency reasons.  Elswhere, clogf() uses
uses float_t and clog() uses double_t to avoid compiler bugfeatures
(norm() and spadd() don't work without them) and for efficiency.

I made norm() and spadd() type-generic, and removed the STRICT_ASSIGN()'
in them that were to work around the compiler bugs.  It is now the
responsibility of callers to pass them float_t or double_t.

% #endif
% }
% 
% long double complex
% clogl(long double complex z)
% {
% ...
% 	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
% 	if (ax == 1) {
% 		if (ay < 0x1p-8191L)
% 			return (cpackl((ay * 0.5) * ay, atan2l(y, x)));
% 		return (cpackl(log1pl(ay * ay) * 0.5, atan2l(y, x)));
% 	}

Magic numbers for ld80 only.  I can translate this to ld128 in about 10
minutes while I rememeber what the magic numbers are for.  Ifdefing them
would take longer and give a mess.

% 	/* Take extra care when the real part of the result is near 0. */
% 	xh = ax + 0x1p32 - 0x1p32;
% 	xl = ax - xh;
% 	yh = ay + 0x1p32 - 0x1p32;
% 	yl = ay - yh;

The hi+lo decomposition is painless in long double precision, because
ilong double expressions are not evaluated in extra precision.

% 	if (!using_log1p && h > 0.5F && h < 3) {
% 		using_log1p = 1;
% 		spadd(t2, t4, -1);
% 	}

Forgot to translate away the F suffix.  I avoid F and L suffixes wherever
possible, so that no translation or diffs for it are necessary wherever
possible, but there are squllions of 0.5F's.

% 	if (using_log1p)
% 		return (cpackl(log1pl(t2 + t4) * 0.5, atan2l(y, x)));
% 	return (cpackl(logl(t2 + t4) * 0.5, atan2l(y, x)));
% }

Bruce
-------------- next part --------------
#include <complex.h>
#include <float.h>

#include "fpmath.h"
#include "math.h"
#include "math_private.h"

#undef norm
#define	norm(a, b) do {		\
	__typeof(a) __w;	\
				\
	if (fabs(a) < fabs(b)) { \
		__w = (a);	\
		(a) = (b);	\
		(b) = __w;	\
	}			\
	__w = (a) + (b);	\
	(b) = ((a) - __w) + (b); \
	(a) = __w;		\
} while (0)

#undef spadd
#define	spadd(a, b, c) do {	\
	__typeof(a) __tmp;	\
				\
	__tmp = (c);		\
	norm(__tmp, (a));	\
	(b) += (a);		\
	(a) = __tmp;		\
} while (0)

static const double
ln2_hi = 6.9314718055829871e-1,		/*  0x162e42fefa0000.0p-53 */
ln2_lo = 1.6465949582897082e-12;	/*  0x1cf79abc9e3b3a.0p-92 */

double complex
clog(double complex z)
{
	double_t h, t, t1, t2, t3, t4, x, y;
	double_t ax, ay, xh, xh2, xl, yh, yh2, yl;
	uint64_t bits;
	uint32_t hx, hy, lx, ly;
	int using_log1p;

	x = creal(z);
	y = cimag(z);

	/* Handle NaNs using the general formula to mix them right. */
	if (x != x || y != y)
		return (cpack(log(hypot(x, y)), atan2(y, x)));

	ax = fabs(x);
	ay = fabs(y);
	if (ax < ay) {
		t = ax;
		ax = ay;
		ay = t;
	}

	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
	if (ax == 1) {
		if (ay < 0x1p-511)
			return (cpack((ay * 0.5) * ay, atan2(y, x)));
		return (cpack(log1p(ay * ay) * 0.5, atan2(y, x)));
	}

	/* Avoid underflow when ax is not small.  Also handle zero args. */
	if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax)
		return (cpack(log(ax), atan2(y, x)));

	/* Avoid overflow.  Also handle infinite args. */
	if (ax > 0x1p1023)
		return (cpack(log(hypot(x * 0.5, y * 0.5)) + ln2_lo + ln2_hi,
		    atan2(y, x)));
	if (ax > 0x1p511)
		return (cpack(log(hypot(x, y)), atan2(y, x)));

	/* Avoid underflow when ax is small. */
	if (ax < 0x1p-468)
		return (cpack(log(hypot(x * 0x1p468, y * 0x1p468)) -
		    468 * ln2_lo - 468 * ln2_hi, atan2(y, x)));

	/* Take extra care when the real part of the result is near 0. */
	EXTRACT_WORDS(hx, lx, ax);
	bits = ((uint64_t)hx << 32) | lx;
	bits += 0x04000000;
	hx = bits >> 32;
	lx = bits & 0xf8000000;
	INSERT_WORDS(xh, hx, lx);
	xl = ax - xh;
	EXTRACT_WORDS(hy, ly, ay);
	bits = ((uint64_t)hy << 32) | ly;
	bits += 0x04000000;
	hy = bits >> 32;
	ly = bits & 0xf8000000;
	INSERT_WORDS(yh, hy, ly);
	yl = ay - yh;

	xh2 = xh * xh;
	yh2 = yh * yh;

	h = ax * ax + ay * ay;
	using_log1p = 0;
	if (h < 1 - 0x1p-5 || h > 1 + 0x1p-4) {
		t2 = yh2;
		t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl);
		norm(t2, t4);
		spadd(t2, t4, xh2);
	} else {
		if (xh2 >= 0.5 && xh2 <= 2) {
			using_log1p = 1;
			xh2 -= 1;
		} else if (xh2 >= 0.25 && xh2 < 0.5 && yh2 >= 0.25) {
			using_log1p = 1;
			xh2 -= 0.5;
			yh2 -= 0.5;
		}
		t2 = xl * xl;
		t1 = 2 * xh * xl;
		norm(t1, t2);
		spadd(t1, t2, xh2);
		norm(t1, t2);
		t3 = t1;
		t4 = t2;

		t1 = 2 * yh * yl;
		t2 = yl * yl;
		norm(t1, t2);
		spadd(t1, t2, yh2);
		norm(t1, t2);

		norm(t2, t4);
		norm(t1, t3);

		spadd(t2, t4, t3);
		spadd(t2, t4, t1);
	}
	if (!using_log1p && h > 0.5 && h < 3) {
		using_log1p = 1;
		spadd(t2, t4, -1);
	}
	if (using_log1p)
		return (cpack(log1p(t2 + t4) * 0.5, atan2(y, x)));
	return (cpack(log(t2 + t4) * 0.5, atan2(y, x)));
}

static const float
ln2f_hi =  6.9314575195e-1,		/*  0xb17200.0p-24 */
ln2f_lo =  1.4286067653e-6;		/*  0xbfbe8e.0p-43 */

float complex
clogf(float complex z)
{
	double_t dh;
	float_t h, t, t1, t2, t3, t4, x, y;
	float_t ax, ay, xh, xh2, xl, yh, yh2, yl;
	uint32_t hx, hy;
	int using_log1p;

	x = crealf(z);
	y = cimagf(z);

	/* Handle NaNs using the general formula to mix them right. */
	if (x != x || y != y)
		return (cpackf(logf(hypotf(x, y)), atan2f(y, x)));

	ax = fabsf(x);
	ay = fabsf(y);
	if (ax < ay) {
		t = ax;
		ax = ay;
		ay = t;
	}

	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
	if (ax == 1) {
		if (ay < 0x1p-63F)
			return (cpackf((ay * 0.5F) * ay, atan2f(y, x)));
		return (cpackf(log1pf(ay * ay) * 0.5F, atan2f(y, x)));
	}

	/* Avoid underflow when ax is not small.  Also handle zero args. */
	if (ay < FLT_MAX / 0x1p24F && ay * 0x1p24F <= ax)
		return (cpackf(logf(ax), atan2f(y, x)));

	/* Avoid overflow.  Also handle infinite args. */
	if (ax > 0x1p127F)
		return (cpackf(
		    logf(hypotf(x * 0.5F, y * 0.5F)) + ln2f_lo + ln2f_hi,
		    atan2f(y, x)));
	if (ax > 0x1p63F)
		return (cpackf(logf(hypotf(x, y)), atan2f(y, x)));

	/* Avoid underflow when ax is small. */
	if (ay < 0x1p-39F)
		return (cpackf(logf(hypotf(x * 0x1p39F, y * 0x1p39F)) -
		    39 * ln2f_lo - 39 * ln2f_hi, atan2f(y, x)));

	/* Take extra care when the real part of the result is near 0. */
#if 1
	GET_FLOAT_WORD(hx, ax);
	SET_FLOAT_WORD(xh, hx & 0xfffff000);
	xl = ax - xh;
	GET_FLOAT_WORD(hy, ay);
	SET_FLOAT_WORD(yh, hy & 0xfffff000);
	yl = ay - yh;

	xh2 = xh * xh;
	yh2 = yh * yh;

	h = ax * ax + ay * ay;
	using_log1p = 0;
	if (h < (float)(1 - 0x1p-5) || h > (float)(1 + 0x1p-4)) {
		t2 = yh2;
		t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl);
		norm(t2, t4);
		spadd(t2, t4, xh2);
	} else {
		if (xh2 >= 0.5F && xh2 <= 2) {
			using_log1p = 1;
			xh2 -= 1;
		} else if (xh2 >= 0.25F && xh2 < 0.5F && yh2 >= 0.25F) {
			using_log1p = 1;
			xh2 -= 0.5F;
			yh2 -= 0.5F;
		}
		t2 = xl * xl;
		t1 = 2 * xh * xl;
		norm(t1, t2);
		spadd(t1, t2, xh2);
		norm(t1, t2);
		t3 = t1;
		t4 = t2;

		t1 = 2 * yh * yl;
		t2 = yl * yl;
		norm(t1, t2);
		spadd(t1, t2, yh2);
		norm(t1, t2);

		norm(t2, t4);
		norm(t1, t3);

		spadd(t2, t4, t3);
		spadd(t2, t4, t1);
	}
	if (!using_log1p && h > 0.5F && h < 3) {
		using_log1p = 1;
		spadd(t2, t4, -1);
	}
	if (using_log1p)
		return (cpackf(log1pf(t2 + t4) * 0.5F, atan2f(y, x)));
	return (cpackf(logf(t2 + t4) * 0.5F, atan2f(y, x)));
#else
	dh = (double_t)ax * ax + (double_t)ay * ay;
	if (dh > 0.5 && dh < 3) {
		dh = (double_t)ax * ax - 1 + (double_t)ay * ay;
		return (cpackf(log1pf(dh) * 0.5F, atan2f(y, x)));
	}
	return (cpackf(logf(dh) * 0.5F, atan2f(y, x)));
#endif
}

long double complex
clogl(long double complex z)
{
	long double h, t, t1, t2, t3, t4, x, y;
	long double ax, ay, xh, xh2, xl, yh, yh2, yl;
	int using_log1p;

	x = creall(z);
	y = cimagl(z);

	/* Handle NaNs using the general formula to mix them right. */
	if (x != x || y != y)
		return (cpackl(logl(hypotl(x, y)), atan2l(y, x)));

	ax = fabsl(x);
	ay = fabsl(y);
	if (ax < ay) {
		t = ax;
		ax = ay;
		ay = t;
	}

	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
	if (ax == 1) {
		if (ay < 0x1p-8191L)
			return (cpackl((ay * 0.5) * ay, atan2l(y, x)));
		return (cpackl(log1pl(ay * ay) * 0.5, atan2l(y, x)));
	}

	/* Avoid underflow when ax is not small.  Also handle zero args. */
	if (ay < LDBL_MAX / 0x1p64 && ay * 0x1p64 <= ax)
		return (cpackl(logl(ax), atan2l(y, x)));

	/* Avoid overflow.  Also handle infinite args. */
	if (ax > 0x1p16383L)
		return (cpackl(logl(hypotl(x * 0.5, y * 0.5)) + ln2_lo + ln2_hi,
		    atan2l(y, x)));
	if (ax > 0x1p8191)
		return (cpackl(logl(hypotl(x, y)), atan2l(y, x)));

	/* Avoid underflow when ax is small. */
	if (ax < 0x1p-8127L)
		return (cpackl(logl(hypotl(x * 0x1p8127L, y * 0x1p8127L)) -
		    8127 * ln2_lo - 8127 * ln2_hi, atan2l(y, x)));

	/* Take extra care when the real part of the result is near 0. */
	xh = ax + 0x1p32 - 0x1p32;
	xl = ax - xh;
	yh = ay + 0x1p32 - 0x1p32;
	yl = ay - yh;

	xh2 = xh * xh;
	yh2 = yh * yh;

	h = ax * ax + ay * ay;
	using_log1p = 0;
	if (h < 1 - 0x1p-5 || h > 1 + 0x1p-4) {
		t2 = yh2;
		t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl);
		norm(t2, t4);
		spadd(t2, t4, xh2);
	} else {
		if (xh2 >= 0.5 && xh2 <= 2) {
			using_log1p = 1;
			xh2 -= 1;
		} else if (xh2 >= 0.25 && xh2 < 0.5 && yh2 >= 0.25) {
			using_log1p = 1;
			xh2 -= 0.5;
			yh2 -= 0.5;
		}
		t2 = xl * xl;
		t1 = 2 * xh * xl;
		norm(t1, t2);
		spadd(t1, t2, xh2);
		norm(t1, t2);
		t3 = t1;
		t4 = t2;

		t1 = 2 * yh * yl;
		t2 = yl * yl;
		norm(t1, t2);
		spadd(t1, t2, yh2);
		norm(t1, t2);

		norm(t2, t4);
		norm(t1, t3);

		spadd(t2, t4, t3);
		spadd(t2, t4, t1);
	}
	if (!using_log1p && h > 0.5F && h < 3) {
		using_log1p = 1;
		spadd(t2, t4, -1);
	}
	if (using_log1p)
		return (cpackl(log1pl(t2 + t4) * 0.5, atan2l(y, x)));
	return (cpackl(logl(t2 + t4) * 0.5, atan2l(y, x)));
}