svn commit: r305382 - in head/lib/msun: amd64 i387

Konstantin Belousov kostikbel at gmail.com
Sun Sep 4 14:49:07 UTC 2016


On Sun, Sep 04, 2016 at 12:22:14PM +0000, Bruce Evans wrote:
> Author: bde
> Date: Sun Sep  4 12:22:14 2016
> New Revision: 305382
> URL: https://svnweb.freebsd.org/changeset/base/305382
> 

> Log:
>   Add asm versions of fmod(), fmodf() and fmodl() on amd64.  Add asm
>   versions of fmodf() amd fmodl() on i387.
>   
>   fmod is similar to remainder, and the C versions are 3 to 9 times
>   slower than the asm versions on x86 for both, but we had the strange
>   mixture of all 6 variants of remainder in asm and only 1 of 6
>   variants of fmod in asm.
> 
> Added:
>   head/lib/msun/amd64/e_fmod.S   (contents, props changed)
>   head/lib/msun/amd64/e_fmodf.S   (contents, props changed)
>   head/lib/msun/amd64/e_fmodl.S   (contents, props changed)
>   head/lib/msun/i387/e_fmodf.S   (contents, props changed)
It seems that the wrong version of i387/e_fmodf.S was committed; it is
identical to the amd64 version.

> Added: head/lib/msun/amd64/e_fmod.S
> ==============================================================================
> --- /dev/null	00:00:00 1970	(empty, because file is newly added)
> +++ head/lib/msun/amd64/e_fmod.S	Sun Sep  4 12:22:14 2016	(r305382)
> +ENTRY(fmod)
> +	movsd	%xmm0,-8(%rsp)
> +	movsd	%xmm1,-16(%rsp)
> +	fldl	-16(%rsp)
> +	fldl	-8(%rsp)
> +1:	fprem
> +	fstsw	%ax
> +	testw	$0x400,%ax
> +	jne	1b
> +	fstpl	-8(%rsp)
> +	movsd	-8(%rsp),%xmm0
> +	fstp	%st
> +	ret
> +END(fmod)

I see that this is not a new approach in the amd64 subdirectory, using
the x87 FPU on amd64.  Please note that it might have non-obvious effects
on performance, in particular on the speed of context switches and the
handling of the #NM exception.

Newer Intel and possibly AMD CPUs have an optimization which allows the
FPU state save and restore code to skip components of the state that
were not changed.  In other words, for a typical amd64 binary which uses
the %xmm register file but touches neither %st nor %ymm, only the %xmm
bits are spilled and then loaded.  Touching %st defeats the optimization,
possibly for the whole lifetime of the thread.

This feature (XSAVEOPT) is available at least starting from the Haswell
microarchitecture; I am not sure about Ivy Bridge.

