Complex arg-trig functions
Stephen Montgomery-Smith
stephen at missouri.edu
Sun Sep 16 20:53:45 UTC 2012
On 09/16/2012 03:29 PM, Bruce Evans wrote:
> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> On 09/16/2012 11:51 AM, Bruce Evans wrote:
>>>
>>> I don't like that. It will be much slower on almost 1/4 of arg space.
>>> The only reason to consider not doing it is that the args that it
>>> applies to are not very likely, and optimizing for them may pessimize
>>> the usual case.
>>
>> The pessimization when |z| is not small is tiny. It takes no time at
>> all to check that |z| is small.
>
> Not necessarily on out-of-order machines (most x86). The CPU executes
> multiple paths speculatively and concurrently. If it does more on an
> unused path, then it might do less on the used path. It may mispredict
> the branch on the size of |z| and thus misguess which path to do more
> on. (I don't know many details of this. For example, does it do
> anything at all on paths predicted to be not taken?) Losses from this
> are usually described as branch mispredictions. They might cost 20
> (50? 100?) cycles after taking 2 about cycles to actually check |z|
> (2 cycles pipelined but more like <length of pipe> + 8 in real time,
> and it is the latter time that you lose by backing out).
>
> The only sure way to avoid branch mispredictions is to not have any,
> and catrig is too complicated for that.
Yes, but I did a timing test. And in my case the test was almost always
failing, so the branch was well predicted.
>
>> On the other hand let me go through the code and see what happens when
>> |x| is small or |y| is small. There are actually specific formulas
>> that work well in these two cases, and they are probably not that much
>> slower than the formulas I decided to remove. And when you chase
>> through all the logic and "if" statements, you may find that you
>> don't use up much time on these very special cases of small |z| -
>> most of the extra time merely being the decisions invoked by the
>> "if" statements.
>
> But all general cases end up going through an extern function like
> acos() or atan2(), and just calling another function is a significant
> overhead. When |z| is small, the arg(s) to the other function will
> probably be a special case for it (e.g., acos(small)). The other
> function should optimize this and not take as long as an average call.
> However, since it is special, it may cause branch mispredictions for
> other uses of the function.
I understand what you are saying. I guess it just seems to me that the
"proper" way to do it is to make the C compiler really awesome, so that
it does this for you. (Doesn't the Intel compiler try to inline
functions when it knows that will speed things up?)
>> Furthermore, casinh etc. are not commonly used functions. Putting huge
>> amounts of effort into looking at special cases to speed it up a
>> little somehow feels wrong to me. In fact, if the programmer knows he
>> will be wanting casinh, evaluated very fast, then he should be
>> motivated enough to try using z in the case when |z| is small, and
>> see if that really speeds things up.
Well, if casinh goes 20% slower, you're not going to be testing too many
fewer cases.
> True. Now I mainly want it to be fast so that I can test more cases.
I understand. But putting those special cases into casinh offends my
sense of taste.
More information about the freebsd-numerics
mailing list