cvs commit: src/sys/i386/include _types.h

Bruce Evans brde at optusnet.com.au
Fri Mar 7 04:26:40 UTC 2008


On Wed, 5 Mar 2008, Colin Percival wrote:

> Bruce Evans wrote:
>> On Wed, 5 Mar 2008, Colin Percival wrote:
>>> Bruce Evans wrote:
>>>>   Change float_t and double_t to long double on i386.
>>>
>>> Doesn't this have a rather severe performance impact on any code which
>>> uses double_t?
>>
>> No.  As mentioned in the commit message, this has no performance effect
>> except in cases where it avoids compiler bugs.  [...] if you use long double
>> for memory variables then you get a severe performance impact and some
>> space loss for the load instruction, since loading long doubles is
>> much slower than loading doubles (about 4 times slower on Athlons).
>
> Either I'm misunderstanding something, or you seem to be disagreeing with
> yourself here... if I have the following code
>
> double_t foo, bar, foobar;
> foobar = foo + bar;
>
> then prior to this change the processor loads and stores doubles, while
> after this change the processor loads and stores long doubles, with the
> associated performance penalty.

Low quality code might do that :-).

Hmm, I thought that these types were intended to be used to avoid loss
of precision in intermediate values, and thus should only be used in
code that wants the extra precision at a possible cost in efficiency,
but C99 (n869.txt draft) doesn't say exactly this, and a footnote
(which is not part of the standard) says that they are the most
efficient types at least as wide as the basic types.  C99 specifies
them fully if FLT_EVAL_METHOD is 0, 1 or 2, but weird machines like
i386, with its 53-bit default precision plus compiler bugs, must define
FLT_EVAL_METHOD as -1.  Then these types are implementation-defined
and are not specifically required to be wider than the evaluation
method types.  I think they should be specifically required to be the
most efficient types at least as wide as the evaluation method types.
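
For reference, here is C99's table for the defined cases written out as
typedefs, plus a sketch of the -1 branch.  This is an illustration of
the choice being made, not the literal contents of the i386 _types.h:

    #include <float.h>

    #if FLT_EVAL_METHOD == 0
    typedef float       float_t;   /* expressions evaluate in their own type */
    typedef double      double_t;
    #elif FLT_EVAL_METHOD == 1
    typedef double      float_t;   /* float expressions evaluate as double */
    typedef double      double_t;
    #elif FLT_EVAL_METHOD == 2
    typedef long double float_t;   /* everything evaluates as long double */
    typedef long double double_t;
    #else
    /*
     * FLT_EVAL_METHOD == -1: implementation-defined.  i386, with its
     * 53-bit default precision but 80-bit register format, now chooses
     * long double for both (the subject of this commit).
     */
    typedef long double float_t;
    typedef long double double_t;
    #endif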

Back to performance penalties...  It is extremely unclear which types
are the most efficient.  Your example is too simple for there to be any
stores if double_t is long double:

> double_t foo, bar, foobar;
> foobar = foo + bar;
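
To make the cases below concrete, here is the example fleshed out as
two functions.  This is only a sketch (the function names are invented,
and the exact code generated depends on the compiler and optimization
level), but the instruction names are the usual i386/x87 ones:

    #include <math.h>

    /*
     * Case 1: the operands arrive in memory.  If double_t is long double,
     * the arguments are passed as 80-bit values and must be loaded with
     * the slow fldt; plain doubles would be loaded with the cheap fldl
     * (or added directly from memory with faddl).
     */
    double_t
    add_from_memory(double_t foo, double_t bar)
    {
            return (foo + bar);
    }

    /*
     * Case 2: the operands are already in registers, e.g., as results of
     * a previous computation.  The add happens in 80-bit registers
     * regardless of the declared type, and with double_t == long double
     * the sum can be returned in %st(0) with no conversion and no store.
     */
    double_t
    add_in_registers(double a, double b)
    {
            double_t foo = a * a;   /* stays in a register */
            double_t bar = b * b;
            return (foo + bar);
    }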

o The efficiency of double_t for foo and bar depends on where they came
   from.  If they start out as long doubles in memory, then loading them
   is slow.  If they start out in registers, then there is no load to
   worry about and their type doesn't matter from now on, but their type
   may have affected previous loads and stores, just as the type of
   foobar affects its subsequent use (see below).
o foo + bar is evaluated in registers (except that i386 also allows
   adding a value in memory to a value in a register if the memory type
   is not long double).  The result essentially has type long double,
   perhaps rounded to 53-bit or 24-bit precision.
o Assignment to foobar requires a conversion from long double to double_t.
   If double_t is long double, then this conversion is null and foobar
   normally stays in the register.  Otherwise, the conversion is not null,
   but it won't be done here, since all the variables have type double_t
   and the compiler doesn't really know the type of the result of the
   addition (it thinks that double + double gives double).  A correct C
   compiler would know that the result is long double and would always do
   the conversion if double_t were only double.
o Here is an example of why a conversion is necessary on i386 unless all
   types are long double:

     double x = DBL_MAX, y = DBL_MAX, z;
     z = x + y;  // should be +Inf/FE_OVERFLOW but is 2.0L*DBL_MAX
     z = z - y;  // should be +Inf/FE_OVERFLOW but is 1.0L*DBL_MAX

   The extra range of a long double gives this.  53-bit precision doesn't
   affect it, and a conversion on assignment is still strictly needed to
   give the correct results for the intermediate and final z.

   z = DBL_MAX + DBL_MAX - DBL_MAX;

   Now both DBL_MAX and +Inf/FE_OVERFLOW are correct results, depending on
   whether double expressions are evaluated with extra range.

o Now we have a double_t foobar, reloaded into a register if that is useful,
   and we can probably use it as either a double or a long double at no
   extra cost, but if double_t is double and the compiler is a correct C
   compiler, then converting foobar to a double always costs a lot and
   usually wastes its extra range if not its extra precision.  We should
   probably have used double_t throughout to avoid this...
o ... however, if we use double_t too much, then it will be used for memory
   variables, and the loads and stores for these will cost more than
   conversions to double and reloads (the sketch after this list shows the
   usual compromise).
o However (again), someday when 64-bit precision is supported, using
   double_t will be especially important for memory variables, to prevent
   unintended loss of precision.
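
Putting the last three points together, the usual compromise is double
for memory variables and double_t for intermediates.  A sketch (dot() is
an invented example, not code from the tree):

    #include <math.h>
    #include <stddef.h>

    /*
     * Bulk data stays double: fldl loads are cheap and the arrays are
     * half the size.  The accumulator is double_t: with optimization it
     * lives in a register across the loop, so its extra width costs
     * nothing now, and once 64-bit precision is supported it will also
     * keep that precision if it is ever spilled to memory.
     */
    double
    dot(const double *x, const double *y, size_t n)
    {
            double_t sum = 0;
            size_t i;

            for (i = 0; i < n; i++)
                    sum += x[i] * y[i];
            /* One conversion at the end instead of one per element. */
            return ((double)sum);
    }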

Bruce

