SSE in libthr

Jilles Tjoelker jilles at
Fri Mar 27 21:41:00 UTC 2015

On Fri, Mar 27, 2015 at 03:26:17PM -0400, Eric van Gyzen wrote:
> In a nutshell:

> Clang emits SSE instructions on amd64 in the common path of
> pthread_mutex_unlock.  This reduces performance by a non-trivial
> amount.  I'd like to disable SSE in libthr.

How about saving and restoring the FPU/SSE state eagerly instead of the
current CR0.TS-based lazy method? There is overhead associated with #NM
exception handling (fpudna) which is not worth it if FPU/SSE are used
often. This would apply to userland threads only; kernel threads
normally do not use FPU/SSE and handle the FPU/SSE state manually if
they do.
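
To make the trade-off concrete, here is a toy model, not kernel code.
It only counts events under two policies. The names fpu_counts and
simulate are made up for illustration, and the model assumes a switch
to a different thread before every time slice, so the FPU/SSE state is
never already loaded when a slice begins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Event counters for one run of the toy model (hypothetical type). */
struct fpu_counts {
	unsigned traps;    /* #NM exceptions taken */
	unsigned restores; /* FPU/SSE state restores performed */
};

/*
 * uses_fpu[i] tells whether the thread running in slice i executes any
 * FPU/SSE instruction.  Assumption: a different thread ran in the
 * previous slice, so the state must be reloaded before first use.
 */
static struct fpu_counts
simulate(const bool *uses_fpu, size_t nslices, bool eager)
{
	struct fpu_counts c = { 0, 0 };

	for (size_t i = 0; i < nslices; i++) {
		if (eager) {
			/* Restore at switch-in, whether needed or not. */
			c.restores++;
		} else if (uses_fpu[i]) {
			/* Lazy: first FPU/SSE instruction faults (TS set)... */
			c.traps++;
			/* ...and the handler restores state and clears TS. */
			c.restores++;
		}
	}
	return c;
}
```

In this model the lazy policy takes one trap per slice that touches
FPU/SSE, while the eager policy never traps but restores on every
slice. If nearly every slice uses SSE, the lazy policy saves almost no
restores and pays for a trap each time.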

There is performance improvement potential in using SSE for optimizing
string functions, for example. Even a simple SSE2 strlen easily
outperforms the already optimized lib/libc/string/strlen.c in a
microbenchmark, and many other string functions are slow byte-at-a-time
implementations.
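
For illustration, a simple SSE2 strlen could look roughly like the
sketch below. This is not the code that was benchmarked; the name
sse2_strlen is hypothetical, and the sketch assumes an x86 target with
SSE2 and a GCC or Clang compiler (for __builtin_ctz). It reads whole
aligned 16-byte blocks, so it may touch bytes before the start and
after the terminator of the string. Those reads stay within one
aligned block and therefore within one page, but strictly speaking
they are outside what ISO C permits.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SSE2 strlen sketch; checks 16 bytes per iteration. */
static size_t
sse2_strlen(const char *s)
{
	const __m128i zero = _mm_setzero_si128();
	const char *p;
	unsigned int mask, skip;

	assert(s != NULL);

	/* Round down to a 16-byte boundary so aligned loads can be used. */
	p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
	skip = (unsigned int)(s - p);

	/* Bit i of mask is set if byte i of the block is NUL. */
	mask = (unsigned int)_mm_movemask_epi8(
	    _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
	/* Ignore bytes that lie before the start of the string. */
	mask &= ~0u << skip;

	while (mask == 0) {
		p += 16;
		mask = (unsigned int)_mm_movemask_epi8(
		    _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
	}

	/* The lowest set bit marks the first NUL in this block. */
	return (size_t)((p - s) + (ptrdiff_t)__builtin_ctz(mask));
}
```

Code like this is also why SSE use in userland would become more
common, which ties back to the cost of the #NM handling above.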

Jilles Tjoelker
