svn commit: r323329 - head/sys/sys

Sat Sep 9 07:14:44 UTC 2017

On 8 Sep 2017, at 21:09, Mateusz Guzik <mjg at FreeBSD.org> wrote:
> 
> Author: mjg
> Date: Fri Sep  8 20:09:14 2017
> New Revision: 323329
> URL: https://svnweb.freebsd.org/changeset/base/323329
> 
> Log:
>  Allow __builtin_memset instead of bzero for small buffers of known size

This change seems redundant, because modern compilers already do this optimisation.  For example:

	#include <strings.h>

	char buf[42];

	void bz(void)
	{
		bzero(buf, 42);
	}

With clang 4.0 on x86-64, this compiles to:

        pushq   %rbp
        movq    %rsp, %rbp
        xorps   %xmm0, %xmm0
        movups  %xmm0, buf+26(%rip)
        movaps  %xmm0, buf+16(%rip)
        movaps  %xmm0, buf(%rip)
        popq    %rbp
        retq

On AArch64, it compiles to:

        adrp    x8, buf
        add     x8, x8, :lo12:buf
        strh    wzr, [x8, #40]
        stp     xzr, xzr, [x8, #24]
        stp     xzr, xzr, [x8, #8]
        str     xzr, [x8]
        ret

Neither contains a call; both have inlined the zeroing.  This change is strictly worse, because the compiler has carefully tuned, per-target heuristics for when to inline the memset / bzero and when to call the function.  These are based on both the size and the alignment, including whether the target supports misaligned accesses and whether misaligned accesses are cheap.  None of this is captured by this change.

In the kernel, this optimisation is disabled by -ffreestanding.  However, __builtin_memset will be turned into a memset call if the size is not constant or if the memset call would be more efficient (as determined by the aforementioned heuristics).  Simply using __builtin_memset in all cases should give better code, and is more likely to be forward compatible with future ISAs, where the arbitrary constant picked in this patch may or may not be optimal.
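
As a rough sketch of the difference (userland C for illustration, not the code from the patch: the macro and function names are made up, and the 64-byte limit is an assumed value, not one quoted from the commit), the two approaches look something like this:

```c
#include <stddef.h>
#include <strings.h>

/* Assumed cut-off, for illustration only; not taken from r323329. */
#define SMALL_ZERO_MAX 64

/*
 * Threshold style (hypothetical): use the builtin only when the size is
 * a compile-time constant no larger than a fixed, target-independent limit.
 */
#define zero_threshold(buf, len) do {                             \
	if (__builtin_constant_p(len) && (len) <= SMALL_ZERO_MAX) \
		__builtin_memset((buf), 0, (len));                \
	else                                                      \
		bzero((buf), (len));                              \
} while (0)

/*
 * Unconditional style (hypothetical): always use the builtin and leave
 * the inline-or-call decision to the compiler's per-target heuristics.
 */
#define zero_builtin(buf, len) ((void)__builtin_memset((buf), 0, (len)))

struct small {
	char bytes[42];
};

/* Constant size below the assumed limit: takes the builtin branch. */
void clear_small_threshold(struct small *s)
{
	zero_threshold(s, sizeof(*s));
}

/* Constant size: the compiler may inline the stores or emit a call. */
void clear_small(struct small *s)
{
	zero_builtin(s, sizeof(*s));
}

/* Size only known at run time: this normally ends up as a memset call. */
void clear_var(void *p, size_t n)
{
	zero_builtin(p, n);
}
```

Both forms zero the buffer; the difference is only in who decides whether the stores are inlined.  Comparing the output of cc -S -O2 -ffreestanding for clear_small and clear_var on the targets you care about is a quick way to check what a given compiler actually does.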

David