svn commit: r253636 - head/sys/vm

Thu Jul 25 16:37:29 UTC 2013

On Thu, 25 Jul 2013, David Chisnall wrote:

> On 25 Jul 2013, at 09:11, Hans Petter Selasky <hps at bitfrost.no> wrote:
>
>> The structure looks like some size, so bzero() might run faster than memset() depending on the compiler settings. Should be profiled before changed!
>
> They will generate identical code for small structures with known sizes.  Both clang and gcc have a simplify libcalls pass that recognises both functions and will elide the call in preference to a small set of inline stores.

In the kernel, compilers are prevented from inlining memset() and many
other things by -ffreestanding in CFLAGS.  This rarely matters, and no
one except me cares.  In my version, memset() doesn't exist, but
bzero() has the micro-optimization of turning itself into __builtin_memset()
if the size is small (<= 32).  My memcpy() has the micro-optimization
of turning itself into __builtin_memcpy() unconditionally.  __builtin_memcpy()
then turns itself back into extern memcpy() according to the inverse of
a similar size check.  I think extern memcmp() still doesn't exist in
the FreeBSD kernel, so simply using __builtin_memcmp() would give linkage
errors when __builtin_memcmp() turns itself back into memcmp().
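
A rough sketch of the kind of bzero() described above, not the actual
code: fallback_bzero() and SMALL_ZERO_MAX are made-up names, and the
limit of 32 is only the value mentioned in the text.

```c
#include <stddef.h>

/* Assumed threshold; the right value is CPU-dependent. */
#define	SMALL_ZERO_MAX	32

/* Hypothetical stand-in for the kernel's out-of-line bzero(). */
static void
fallback_bzero(void *p, size_t len)
{
	unsigned char *cp = p;

	while (len-- > 0)
		*cp++ = 0;
}

/*
 * Use the builtin only when the length is a compile-time constant and
 * small, so that the compiler can expand it to a few inline stores.
 * Everything else goes to the out-of-line function.
 */
#define	my_bzero(p, len) do {						\
	if (__builtin_constant_p(len) && (len) <= SMALL_ZERO_MAX)	\
		__builtin_memset((p), 0, (len));			\
	else								\
		fallback_bzero((p), (len));				\
} while (0)
```

The macro names its arguments more than once, so they should not have
side effects.  Whether the builtin really expands inline still depends
on the compiler and its flags; it may fall back to calling an extern
memset(), which is the linkage problem mentioned above.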

Determining whether the size is "small" is difficult.  It is very
CPU-dependent, and also depends on how efficient the extern function
is.  Compilers once used very large limits for inlining, but changed
to fairly small limits when they realized that they didn't understand
memory.  Extern functions are hard to optimize, since the correct
optimization depends on the CPU including its cache organization.
FreeBSD's x86 bcopy() and bzero() are still optimized for generic
CPUs.  Generic means approximately the original i386, but since these
are important operations, all CPUs run the old i386 code for them not
too badly (perhaps only twice as slow as optimal), with newer Intel
systems doing it better than most.

Use of memcpy() in the kernel is the result of previous
micro-optimizations.  It was supposed to be used only for small
fixed-size copies.  This could have been done better by making bcopy()
inline and calling __builtin_memcpy() in this case.  The extern
memcpy() should never have been used, but was needed for cases where
__builtin_memcpy() turns itself into memcpy(), which happened mainly
when compiling with -O0.  Other uses of memcpy() were style bugs.  No
one cared when this optimization was turned into a style bug in all
cases by -ffreestanding.

> However, memset() is to be preferred in this idiom because the compiler provides better diagnostics in the case of error:
>
> bzero.c:9:22: warning: 'memset' call operates on objects of type 'struct foo'
>      while the size is based on a different type 'struct foo *'
>      [-Wsizeof-pointer-memaccess]
>        memset(f, 0, sizeof(f));
>               ~            ^
> bzero.c:9:22: note: did you mean to dereference the argument to 'sizeof' (and
>      multiply it by the number of elements)?
>        memset(f, 0, sizeof(f));
>                            ^
>
> The same line with bzero(f, sizeof(f)) generates no error.

This is a compiler bug with -ffreestanding, since memset() is then not special.
clang allows me to declare my own memset but still prints this warning
if the API is not too different: no warning for "void memset(int *, int)",
but a warning for "int memset(int *, int, int)".  The warning seems to
be based on the function name, since it is not claimed that the function
is standard (even without -ffreestanding).
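
For reference, the idiom under discussion looks roughly like this.  The
quoted bzero.c is not shown in the thread, so struct foo's members and
foo_clear() are invented for illustration.

```c
#include <string.h>

/* Hypothetical structure; only the name "struct foo" is from the thread. */
struct foo {
	int	id;
	char	name[32];
};

static void
foo_clear(struct foo *f)
{
	/*
	 * The mistake in the quoted diagnostic is
	 *	memset(f, 0, sizeof(f));
	 * which clears only sizeof(struct foo *) bytes.  The size must
	 * come from the object, not from the pointer to it.
	 */
	memset(f, 0, sizeof(*f));
}
```

The bzero() spelling of the same mistake, bzero(f, sizeof(f)), is just
as wrong; per the quoted text it simply goes undiagnosed.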

While testing this, I mistyped an &f as &foo, where foo is a function
name.  clang doesn't warn about this.  clang warned when memset was
my home-made "int memset(int *, int, int)", because the function pointer
isn't compatible with int *.  But it also isn't compatible with void *.
I think casting NULL to a function pointer must work even if NULL is
spelled with a void *, but that is the only case where a void * object
pointer can safely be converted to a function pointer.
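
A small sketch of the one conversion that is expected to work, with
made-up names (handler_t, default_handler(), handler_is_unset(),
noop()).  It assumes only what ISO C says about null pointer constants.

```c
#include <stddef.h>

/* Hypothetical function-pointer type for illustration. */
typedef void (*handler_t)(void);

static void
noop(void)
{
}

/*
 * NULL is a null pointer constant, so it converts to a null function
 * pointer even when it is spelled ((void *)0).
 */
static handler_t
default_handler(void)
{
	return (NULL);
}

/* Comparing a function pointer against NULL is likewise well defined. */
static int
handler_is_unset(handler_t h)
{
	return (h == NULL);
}

/*
 * By contrast, passing a function's address where a void * is expected,
 * e.g. memset(&noop, 0, sizeof(noop)), is outside what ISO C defines,
 * although compilers commonly accept it as an extension.
 */
```

Here handler_is_unset(default_handler()) is true and
handler_is_unset(noop) is false.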

Bruce