Implementation of half-cycle trignometric functions

Sat May 20 14:58:15 UTC 2017

On Sat, May 20, 2017 at 10:58:03AM +1000, Bruce Evans wrote:
> On Fri, 19 May 2017, Bruce Evans wrote:
> 
> XX -	__w = (a) + (b);	\
> XX +	STRICT_ASSIGN(__typeof(a), __w, (a) + (b)); \
> XX  	(b) = ((a) - __w) + (b); \
> XX  	(a) = __w;		\
> 
> Other fixes: my tests only passed because they were on i386 with extra
> precision for doubles.  On amd64 and on i386 with the default precision
> for doubles, the problem with the hi+lo decomposition for denormals
> showed up as errors of about 1.5 ulps for sinpif(), tanpif(), sinpi()
> and tanpi().  Previously-discussed fixes for this worked.  I rewrote
> them from scratch (not very cleanly -- they should be in a header file,
> as should be the kernels.  1 header file can keep kernels and functions
> to multiply by pi for all precisions):

After you mentioned denormals in another email, I tested at sinpif/cospif.
I don't remember the magnitude of the ULP observed, but it was much larger
thanm I expected.  I added scaling similar to what you have done but used
a different constant. 

> XX diff -u2 s_sinpi.c~ s_sinpi.c
> XX --- s_sinpi.c~	Fri May 19 09:05:16 2017
> XX +++ s_sinpi.c	Fri May 19 19:52:07 2017
> XX @@ -77,6 +77,5 @@
> XX  sinpi(double x)
> XX  {
> XX -	volatile static const double vzero = 0;
> XX -	double ax, s;
> XX +	double ax, hi, lo, s;

I also introduced hi and lo as you have done.  If you initially
missed my overloading use of ax and x, I assumed others would
as well.

> XX  	uint32_t hx, ix, j0, lx;
> XX 
> XX @@ -87,12 +86,15 @@
> XX  	if (ix < 0x3ff00000) {			/* |x| < 1 */
> XX  		if (ix < 0x3fd00000) {		/* |x| < 0.25 */
> XX -			if (ix < 0x3e200000) {	/* |x| < 0x1p-29 */
> XX +			if (ix < 0x3de00000) {	/* |x| < 2**-33 */
> XX  				if (x == 0)
> XX  					return (x);
> XX -				INSERT_WORDS(ax, hx, 0);
> XX -				x -= ax;
> XX -				s = pilo * x + pihi * x + pilo * ax
> XX -				    + pihi * ax;
> XX -				return (s);
> XX +				if ((int)x == 1) /* gen. inexact if x != 0 */
> XX +					return (x);	/* NOTREACHED */

Is this simply to guarantee that inexact is raised?  I had
assumed that inexact would be raised by one or more if the
* and + operation in s.

> Multiplication by pi is remarkably complicated.  I noticed that you
> already avoided the trap of using the usual method of truncating x
> by converting to float -- this fails for denormals and other small
> values since floats don't have enough range.  Converting long doubles
> fails similarly.

I avoided the trap in some cases, but use cast elsewhere.
I'll need to check if those are ok.

> So 2 different methods for multiplying by pi are needed:
> - a slower general method that works for small values (use bit-fiddling
>      to truncate), and scale up
> - the usual method of casting to float for medium-sized values
> - mercifully, a third method that works on large values is not needed.
> These should be in a header file and not duplicated.  But I used the
> quick fix of duplicating the above for tanpi().
> 
> XX diff -u2 s_sinpif.c~ s_sinpif.c
> XX --- s_sinpif.c~	Fri May 19 09:05:16 2017
> XX +++ s_sinpif.c	Fri May 19 19:49:33 2017
> XX @@ -46,5 +46,4 @@
> XX  sinpif(float x)
> XX  {
> XX -	volatile static const float vzero = 0;
> XX  	float ax, s;
> XX  	uint32_t hx, ix, j0;
> XX @@ -56,14 +55,7 @@
> XX  	if (ix < 0x3f800000) {			/* |x| < 1 */
> XX  		if (ix < 0x3e800000) {		/* |x| < 0.25 */
> XX -	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
> XX -				if (x == 0)
> XX -					return (x);
> XX -				SET_FLOAT_WORD(ax, hx & 0xffff0000);
> XX -				x -= ax;
> XX -				s = pilo * x + pihi * x + pilo * ax
> XX -				    + pihi * ax;
> XX -				return (s);
> XX -			}
> XX -
> XX +	 		if (ix < 0x34800000)	/* |x| < 2**-22 */
> XX +				if ((int)x == 0) /* gen. inexact if x != 0 */
> XX +					return (M_PI * x);
> XX  			s = __kernel_sinpif(ax);
> XX  			return ((hx & 0x80000000) ? -s : s);
> 
> This is simpler because it can just use double precision.

It defeats the use of the same algorithm in all precisions.
In particular, it is easier to develop an algorithm in float,
test it, and then generalize to other precisions.

> Also fix missing
> setting of inexact and 1 style bug.  This can use essentially the same code
> as sinf().  It doesn't need a special case for x == 0 since multiplication
> if +-0 by M_PI preserves the sign.  I didn't fix the style bug and pessimal
> sign handling for non-small x.  This should be the same as for sinf() too.
> The kernel preserves oddness in the default rounding mode except for x == 0
> which is not handled by the kernel.
> 
> XX static inline double
> XX __kernel_mpi(double x)
> XX {
> XX 	double hi, lo;
> XX 	int32_t hx, lx;
> XX 
> XX 	EXTRACT_WORDS(hx, lx, x);
> XX 	hi = x;
> XX 	SET_LOW_WORD(hi, 0);
> XX 	hi *= 0x1p1000;
> XX 	lo = 0x1p1000 * x  - hi;
> XX 	return ((0x1p1000 * pi_lo * x + pi_hi * lo + pi_hi * hi) * 0x1p-1000);
> XX }
> 
> This is cleaner, but not yet suitable for a kernel in a header file since
> it does a pessimal second extraction (which should be only of the high word
> again).

Why the 2nd extraction?  Neither hx nor lx are used.  Also,
int32_t should be uint32_t (or to be in line with math_private.h
perhaps u_int32_t).

> Since this is inline, perhaps the compile can optimize away the
> second extraction.  It can do this easily only if the extractions are the
> same.  If the compiler can't do this (perhaps because this isn't inlined),
> then hx (and lx if available) should be passed to the kernel.

Why?  Neither is used in the kernel.

> The above isn't a general multiplication kernel as originally intended,
> since it has to scale up to handle small values, so doesn't work for
> large values.  Actually, it would work up to 0x1p53, but be slower.
> 
> The amount of scaling up is tricky to determine.  0x1p1000 is about
> half-way in the range of ~2048 exponents, to scale up denormals to
> nearly 1 so that they obviously aren't denormals.
> 
> XX 
> XX double
> XX ysinpi(double x)

I'll need to spend part of today studying this function.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow