SSE in libthr

Sat Mar 28 16:24:29 UTC 2015

On 28 Mar 2015, at 13:54, Julian Elischer <julian at freebsd.org> wrote:
> 
> the point is that clang will do this anywhere it can, because it isn't taking into account the
> side effects, just the speed of the commands themselves.

This is also not going to decrease.  Clang now enables the SLP vectoriser by default, and that code is constantly being improved.  Current-generation vector units are explicitly designed as targets for compiler autovectorisation, not for hand-tuned DSP code (which, increasingly, runs on the GPU anyway).  This means that we're going to see more and more SSE/AVX/NEON usage in CPU-bound code, even without an explicit programmer decision to use it.  Optimising for the case where the vector unit is not used is about as sensible as optimising for the single-core case: it will affect some people, but generally not those who care about performance, and fewer of them over time.
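
To make that concrete, here is a minimal sketch of the kind of ordinary scalar code I mean.  The structure and function names are made up for illustration and are not taken from libthr.  Whether a given clang actually emits vector instructions for it depends on the version, the target and the optimisation level, so treat the comments as what may happen rather than what will:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct counters {
	uint32_t a, b, c, d;
};

/*
 * Four independent scalar adds on adjacent fields.  Nothing here asks
 * for SSE, but this is the straight-line pattern that an SLP vectoriser
 * may fuse into a single 128-bit load, add and store.
 */
void
counters_add(struct counters *dst, const struct counters *src)
{
	dst->a += src->a;
	dst->b += src->b;
	dst->c += src->c;
	dst->d += src->d;
}

/*
 * Zeroing a small fixed-size structure.  Compilers commonly expand this
 * inline, and may use vector stores to do it.
 */
void
counters_reset(struct counters *c)
{
	memset(c, 0, sizeof(*c));
}
```

Comparing the output of clang -O2 -S with and without -fno-slp-vectorize on a file like this is one way to see whether the vector unit is being touched.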

David