Update ENTERI() macro

Wed Feb 27 21:31:39 UTC 2019

On Thu, Feb 28, 2019 at 07:15:14AM +1100, Bruce Evans wrote:
> On Wed, 27 Feb 2019, Steve Kargl wrote:
> 
> > On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
> >>
> >> ENTERI() hard-codes the long double for simplicity.  Remember, it is only
> >> needed for long double precision on i386.  But I forgot about long double
> >> complex types, and didn't dream about indirect long double types in sincosl().
> >
> > That simplicity does not work for long double complex.  We will
> > 
> > need either ENTERIC as in
> >
> > #define ENTERIC() ENTERIT(long double complex)
> >
> > or a direct use of ENTERIT as you have done s_clogl.c
> 
> I wrote ENTERIT() to work around this problem.
> 
> >>> I'm fine with making ENTERI() only toggle precision, and adding
> >>> a LEAVEI() to reset precision.  RETURNI(r) would then be
> >>>
> >>> #define RETURNI(r)	\
> >>> do {		\
> >>>   LEAVEI();		\
> >>>   return (r);	\
> >>> } while (0)
> >>
> >> No, may be an expression, so it must be evaluated before LEAVEI().  This
> >> is the reason for existence of the variable to hold the result.
> >
> > So, we'll need RETURNI for long double and one for long double complex.
> > Or, we give RETURNI a second parameter, which is the input parameter of
> > the function
> 
> I said to use your method of __typeof().  I tested this:

Oh, I misunderstood.  I thought you meant that __typeof()
may have portability issues, so you want to avoid it.

> XX --- /tmp/math_private.h	Sun Nov 27 17:58:57 2005
> XX +++ ./math_private.h	Thu Feb 28 06:17:26 2019

(patched deleted to keep this short)

> This compiles, but has minor problems.  Note that the apparent style
> bug of initializing __retval in its declaration is needed in cases
> where __typeof() gives a const type.  This happens in my code that
> uses RETURNI(1 + tiny) to set inexact.  I think it would also happen
> for RETURNI(1).  The type is then int instead of floating point, and
> I need to check that this is harmless.

I've not used __typeof() until now.  Coming with my modern Fortran
background, doing __typeof(expr) __retval = (expr) feel weirdly
circular.  In Fortran, the RHS of an expression is evaluate independently
of the type of the LHS.  It is only at the actual assignment that
type conversion (if needed) occurs.

> clogl() is the only user of ENTERIT().  Its size expands from 2302
> bytes text to 2399 when compiled by gcc-3.3.3.  I hope that this is
> just gcc not doing a very good job optimizing the returns (there are
> many RETURNI()s fpr clogl()).  Repeating the return code instead of
> jumping to it might even be optimal.
> 
> > #define RETURNI(x, r)	\
> > do {			\
> >   x = (r)		\
> >   LEAVEI();		\
> >   return (r);		\
> > } while (0)
> >
> > This will cause a lot of churn.
> 
> Indeed.
> 
> My version causes 1 line of churn:
> 
> XX --- /tmp/s_clogl.c	Fri Jul 20 16:00:11 2018
> XX +++ ./s_clogl.c	Thu Feb 28 05:58:05 2019
> XX @@ -66,5 +66,5 @@
> XX  	int kx, ky;
> XX 
> XX -	ENTERIT(long double complex);
> XX +	ENTERI();
> XX 
> XX  	x = creall(z);
> 
> >> Combined sin and cos probably does work better outside of benchmarks for
> >> sin and cos alone, since it does less work so leaves more resources for
> >> the, more useful things.
> >
> > Exactly!  I have a significant amount of Fortran code that does
> >
> >   z = cmplx(cos(x), sin(x))
> >
> > in modern C this is 'z = CMPLX(cos(x), sin(x))'.  GCC with optimization
> > enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp
> > to optimize this to sincos().
> 
> This is an pessimization unless everything is inlined.  An optimization
> would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x).
> 
> > GCC on FreeBSD will not do this optimization
> > because FreeBSD's libm is not C99 compliant.
> 
> It is more conformant than most for cexp().  I think old gcc just doesn't
> attempt such optimizations.
> 
> > When I worked on sincos() I tried a few variations.  This included
> > the simpliest implementation:
> >
> > void
> > sincos(double x, double *s, double *c)
> > {
> >  *c = cos(x);
> >  *s=  sin(x);
> > }
> >
> > I tried argument reduction with kernels.
> >
> > void
> > sincos(double x, double *s, double *c)
> > {
> >  a = inline argument reduction done to set a.
> >  *c = k_cos(x);
> >  *s=  k_sin(x);
> > }
> 
> You mean *c = s_cos(x), etc.  That was good enough.
> 
> > And finally the version that was committed where k_cos and k_sin
> > were manually inlined and re-arranged to reduce redundant computations.
> 
> That has excessive manual inlining.  It should have only inlined s_cos()
> and s_sin(), and changed k_cos() and k_sin() from extern to static inline.
> Someday the data for these inline functions should be deduplicated, but
> the data is small compared with that for the expl kernel.
> 

I recall you preferred the "good enough" implementation,
because it would reduce maintenance if either sin or cos
kernel needed to change.  Now, one needs to remember that
there is a sincos kernel.  I also remember that you felt
that a good compiler could optimize the inlined sin and
cos kernels.  I suppose FreeBSD didn't have good compilers
when I manually inlined the sincos kernel, because it was
a significant measurable improvement.

-- 
Steve