svn commit: r323329 - head/sys/sys
David Chisnall
theraven at FreeBSD.org
Sat Sep 9 07:14:44 UTC 2017
On 8 Sep 2017, at 21:09, Mateusz Guzik <mjg at FreeBSD.org> wrote:
>
> Author: mjg
> Date: Fri Sep 8 20:09:14 2017
> New Revision: 323329
> URL: https://svnweb.freebsd.org/changeset/base/323329
>
> Log:
> Allow __builtin_memset instead of bzero for small buffers of known size
This change seems redundant, because modern compilers already do this optimisation. For example:
#include <strings.h>
char buf[42];
void bz(void)
{
	bzero(buf, 42);
}
With clang 4.0 on x86-64, this compiles to:
pushq %rbp
movq %rsp, %rbp
xorps %xmm0, %xmm0
movups %xmm0, buf+26(%rip)
movaps %xmm0, buf+16(%rip)
movaps %xmm0, buf(%rip)
popq %rbp
retq
On AArch64, it compiles to:
adrp x8, buf
add x8, x8, :lo12:buf
strh wzr, [x8, #40]
stp xzr, xzr, [x8, #24]
stp xzr, xzr, [x8, #8]
str xzr, [x8]
ret
Neither contains a call; both have inlined the zeroing. This change is strictly worse, because the compiler has carefully tuned, per-target heuristics for when to inline memset/bzero and when to call the function. These are based on both the size and the alignment, including whether the target supports misaligned accesses and whether misaligned accesses are cheap. None of this is captured by this change.
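For illustration, here is a hypothetical sketch of the kind of size-gated wrapper the commit log describes ("__builtin_memset instead of bzero for small buffers of known size"). The macro names `size_gated_bzero` and `SMALL_BZERO_MAX`, and the 64-byte cutoff, are assumptions chosen for the example and are not taken from r323329. The point it shows is that the decision depends only on whether the length is a compile-time constant and below a fixed limit; alignment and the target's cost of misaligned stores never enter into it.

```c
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical cutoff: an illustrative value, not necessarily the
 * constant picked in the actual patch.
 */
#define SMALL_BZERO_MAX 64

/*
 * Size-gated wrapper (hypothetical): use __builtin_memset only when
 * len is a compile-time constant no larger than SMALL_BZERO_MAX,
 * otherwise call bzero(). The choice ignores alignment and per-target
 * costs. Note that len may be evaluated more than once.
 */
#define size_gated_bzero(buf, len)                                  \
	do {                                                        \
		if (__builtin_constant_p(len) &&                    \
		    (len) <= SMALL_BZERO_MAX)                       \
			__builtin_memset((buf), 0, (len));          \
		else                                                \
			bzero((buf), (len));                        \
	} while (0)

char small[42];
char large[4096];

/* Constant and under the cutoff: takes the __builtin_memset branch. */
void clear_small(void)
{
	size_gated_bzero(small, sizeof(small));
}

/* Constant but over the cutoff: falls back to bzero(). */
void clear_large(void)
{
	size_gated_bzero(large, sizeof(large));
}

/* Non-constant length: falls back to bzero(). */
void clear_var(char *p, size_t n)
{
	size_gated_bzero(p, n);
}
```

Whatever value is picked for the cutoff, it is the same on every target, which is exactly the information the compiler's own per-target heuristics already take into account.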
In the kernel, this optimisation is disabled by -ffreestanding (which implies -fno-builtin); however, __builtin_memset will be turned into a memset call if the size is not constant or if the memset call would be more efficient (as determined by the aforementioned heuristics). Simply using __builtin_memset in all cases should give better code, and is more likely to be forward compatible with future ISAs where the arbitrary constant picked in this patch may or may not be optimal.
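A minimal sketch of the alternative suggested above, assuming GCC or Clang: the wrapper always expands to __builtin_memset, so the inline-or-call decision is left to the compiler even when plain bzero/memset calls are not treated as builtins under -ffreestanding. The macro and function names are hypothetical. When the builtin is lowered to a call, it calls memset(), so the environment (here, the kernel) must provide that symbol.

```c
#include <stddef.h>

/*
 * Hypothetical wrapper: always use the builtin and let the compiler's
 * per-target heuristics pick inline stores or a memset() call.
 */
#define builtin_bzero(buf, len) __builtin_memset((buf), 0, (len))

char fixed[42];

/*
 * Constant size: the compiler may emit inline stores, as in the
 * listings above, or a call if its heuristics prefer that.
 */
void clear_fixed(void)
{
	builtin_bzero(fixed, sizeof(fixed));
}

/* Non-constant size: typically lowered to an ordinary memset() call. */
void clear_n(void *p, size_t n)
{
	builtin_bzero(p, n);
}
```

Compiling this with and without -ffreestanding and comparing the generated assembly is one way to check the behaviour described above on a given target.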
David