libc assembly optimizations?

James Van Artsdalen james-freebsd-amd64 at
Tue Dec 30 02:16:32 PST 2003

Here's an alternative for fabs(3):

	psllq	$1,%xmm0	/* 64-bit shift left drops the sign bit */
	psrlq	$1,%xmm0	/* logical shift right clears sign */

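/usr/src/lib/libc/amd64/gen/fabs.S uses the x87 sequence quoted at the end of
this message, and gcc generates essentially the same code.  The two shifts
above seem to work, and they look better to me: the value stays in %xmm0
instead of making a round trip through memory and the x87 stack.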
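For illustration, the same two shifts can be written in C with the SSE2 intrinsics from <emmintrin.h>. This is a sketch that assumes an amd64 compiler (SSE2 is always present there). The name fabs_shift_sketch is hypothetical and is not the libc entry point.

```c
#include <emmintrin.h>

/* Hypothetical helper mirroring the psllq/psrlq pair. */
double fabs_shift_sketch(double x)
{
    /* Put x in the low 64-bit lane of an xmm register, viewed as integers. */
    __m128i v = _mm_castpd_si128(_mm_set_sd(x));

    v = _mm_slli_epi64(v, 1);   /* psllq $1: the sign bit falls off the top */
    v = _mm_srli_epi64(v, 1);   /* psrlq $1: a zero is shifted into bit 63 */

    return _mm_cvtsd_f64(_mm_castsi128_pd(v));
}
```

The exponent and mantissa bits come back unchanged, so the result differs from x only in the sign bit. This also holds for -0.0, infinities and NaNs.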

The string ops could be made significantly faster if they were allowed to
read extra bytes around the string, provided those bytes lie within the same
16-byte paragraph as the start or end of the string.  This seems safe in
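Here is a minimal sketch of the idea, assuming GCC or Clang on amd64. The hypothetical strlen_sse2_sketch rounds the start pointer down to a 16-byte boundary and performs only aligned 16-byte loads. An aligned 16-byte block can never straddle a page boundary, so the extra bytes read before the start or past the terminating NUL should not fault. ISO C still treats those out-of-object reads as undefined, and tools such as AddressSanitizer will flag them.

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strlen that reads whole aligned 16-byte paragraphs. */
size_t strlen_sse2_sketch(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    const char *p = (const char *)(addr & ~(uintptr_t)15); /* round down */
    unsigned shift = (unsigned)(addr & 15);
    const __m128i zero = _mm_setzero_si128();

    /* First paragraph: may contain bytes before s, which we discard. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask >>= shift;                 /* bit j now describes byte s[j] */
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Remaining paragraphs: aligned loads until a NUL byte shows up. */
    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

The first paragraph is handled separately because its mask has to be shifted right to drop the bytes that precede s. After that, every load is aligned and needs no masking.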

Finally, can the SSE2 registers be used safely in kernel mode?  Page-fill
and aligned bulk bcopy calls could be sped up this way.
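Suppose SSE2 were usable in the kernel, which would mean saving and restoring the FPU/SSE state around such code (the open question above). Page zeroing and aligned bulk copies could then move 16 bytes per instruction. The sketch below is userland C with hypothetical names. It requires 16-byte alignment and a length that is a multiple of 64, and it handles only the non-overlapping case, unlike bcopy. It uses non-temporal stores (_mm_stream_si128) so that a large fill or copy does not evict the cache, followed by _mm_sfence.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Hypothetical helper: zero-fill len bytes at dst.
 * Assumes dst is 16-byte aligned and len is a multiple of 64. */
void sse2_zero_aligned(void *dst, size_t len)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i *d = (__m128i *)dst;

    for (size_t i = 0; i < len / 16; i += 4) {
        /* Four 16-byte non-temporal stores per iteration (64 bytes). */
        _mm_stream_si128(d + i,     zero);
        _mm_stream_si128(d + i + 1, zero);
        _mm_stream_si128(d + i + 2, zero);
        _mm_stream_si128(d + i + 3, zero);
    }
    _mm_sfence();   /* order the streaming stores before later stores */
}

/* Hypothetical helper: copy len bytes from src to dst.
 * Assumes both are 16-byte aligned, they do not overlap,
 * and len is a multiple of 64. */
void sse2_copy_aligned(void *dst, const void *src, size_t len)
{
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;

    for (size_t i = 0; i < len / 16; i += 4) {
        __m128i a = _mm_load_si128(s + i);
        __m128i b = _mm_load_si128(s + i + 1);
        __m128i c = _mm_load_si128(s + i + 2);
        __m128i e = _mm_load_si128(s + i + 3);
        _mm_stream_si128(d + i,     a);
        _mm_stream_si128(d + i + 1, b);
        _mm_stream_si128(d + i + 2, c);
        _mm_stream_si128(d + i + 3, e);
    }
    _mm_sfence();
}
```

Whether non-temporal stores pay off depends on whether the destination is read again soon. Plain _mm_store_si128 is the alternative when it is.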

/*
 * Ok, this sucks. Is there really no way to push an xmm register onto
 * the FP stack directly?
 */

	movsd	%xmm0, -8(%rsp)
	fldl	-8(%rsp)
	fabs
	fstpl	-8(%rsp)
	movsd	-8(%rsp),%xmm0
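For comparison, here is a sketch of the x87 round trip in C with GCC-style inline assembly. The name fabs_x87_sketch is hypothetical, and the code assumes GCC or Clang on amd64. Under the amd64 calling convention a double argument arrives in %xmm0. No instruction moves an xmm register directly onto the x87 stack, so the "m" operands force the value through memory, just as the hand-written sequence does.

```c
/* Hypothetical helper: fabs via the x87 unit, operands passed in memory. */
double fabs_x87_sketch(double x)
{
    double r;

    __asm__("fldl  %1\n\t"   /* push x from memory onto the x87 stack */
            "fabs\n\t"       /* clear the sign of st(0) */
            "fstpl %0"       /* store st(0) to memory and pop */
            : "=m"(r)
            : "m"(x));
    return r;                /* compiler reloads r into %xmm0 */
}
```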
