svn commit: r286715 - head/lib/libc/string

Thu Aug 13 09:45:05 UTC 2015

On Thu, 13 Aug 2015, David Chisnall wrote:

> On 13 Aug 2015, at 08:11, Marcelo Araujo <araujobsdport at gmail.com> wrote:
>>
>> bcopy() was removed in IEEE Std 1003.1-2008 and is marked as LEGACY in IEEE Std 1003.1-2004. However, BSD had its implementation before IEEE Std 1003.1-2001.
>>
>> As I understand it, it is obsolete in POSIX, but not truly obsolete for FreeBSD.
>> So I believe this patch now addresses it in the correct way.
>
> Its use should be strongly discouraged in FreeBSD (or, ideally, replaced with the macro from the POSIX man page).  LLVM does a load of optimisations for memmove and memcpy - using bcopy is a really good way of bypassing all of these.

That is a good reason to use bcopy.  Compilers are clueless about caches,
so their optimizations tend to be negative.  clang has a nice one now, not
related to caches.  It uses SSE if possible for small fixed-size memcpy's,
so it takes a few hundred cycles longer for the first memcpy per context
switch if SSE would not otherwise have been used (the FPU/SSE state then
has to be loaded for that thread).  Compilers cannot know if this is a
good optimization.
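For illustration, a sketch of the two spellings discussed above. The names compat_bcopy, struct rec16 and copy_rec16 are hypothetical, not from FreeBSD. compat_bcopy follows the idea of the replacement macro on the POSIX bcopy() page: it expands to memmove() with the arguments swapped, so the compiler sees an ordinary memmove() call. copy_rec16 is a small fixed-size memcpy() of the kind clang may expand inline, possibly using SSE moves; whether it does depends on the target and compiler flags, so check the generated assembly rather than assuming.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical bcopy() replacement.  Note the argument order:
 * bcopy(src, dst, len) versus memmove(dst, src, len).  memmove() is
 * used because bcopy() must handle overlapping buffers.
 */
#define compat_bcopy(src, dst, len) (memmove((dst), (src), (len)), (void)0)

/* Hypothetical 16-byte record used to show a small fixed-size copy. */
struct rec16 {
    unsigned char bytes[16];
};

/*
 * memcpy() with a compile-time constant size.  With optimization
 * enabled the compiler may replace the call with inline loads and
 * stores (on x86 possibly SSE moves); compare `cc -O2 -S` output.
 */
void copy_rec16(struct rec16 *dst, const struct rec16 *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Because the macro hands the compiler a plain memmove(), any memmove() optimizations apply to it, which is the point of replacing bcopy() rather than calling it directly.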

If this were important, then someone other than me would have noticed that
switching to -ffreestanding in the kernel turned off all builtins
including the one for memcpy().  But it really is unimportant.  Most
copying in the kernel is done by pagecopy(), and the possible gains
from optimizing that are on the order of 1% except in micro-benchmarks
that arrange to do excessive pagecopy()s.  The possible gains from
optimizations in the memcpy() builtin are much smaller.  The possible
gains from restoring all builtins may be as much as 1%.
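A minimal sketch, assuming GCC or clang: -ffreestanding implies -fno-builtin, so a plain memcpy() call is compiled as an ordinary external call, but functions spelled with the __builtin_ prefix are still recognized. Routing selected call sites through a wrapper such as the hypothetical kmemcpy below would restore the builtin expansion there. For large or variable sizes the compiler may still emit a call to memcpy(), so the kernel must still provide one.

```c
/*
 * Hypothetical kernel-style wrapper.  __builtin_memcpy is a GCC/clang
 * extension that stays available under -ffreestanding/-fno-builtin.
 */
#define kmemcpy(dst, src, len) __builtin_memcpy((dst), (src), (len))

/* Hypothetical small struct: a constant-size copy the compiler can inline. */
struct pair {
    long a, b;
};

void copy_pair(struct pair *dst, const struct pair *src)
{
    /* Constant size, so this is usually expanded inline. */
    kmemcpy(dst, src, sizeof *dst);
}
```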

I recently noticed bcmp() showing up in benchmarks.  Only 1% for
makeworld, but that is a lot for a single function.  clang doesn't
bother optimizing memcmp() and calls the library.  gcc-4.2.1 pessimizes
memcmp() by using "rep cmpsb" on x86.  (gcc used to use a similar
pessimization for memcpy(), but learned better.  Strangely, "rep movsb"
is now the best method on Haswell, except for small counts, which is
exactly where gcc used to use it.)  The library and kernel memcmp()s
are not as bad as gcc's, but they still use "rep cmps[lq]", and this is
still slow on Haswell.  Simple C code is more than 6 times faster than
them in micro-benchmarks on Haswell.  The FreeBSD library MI C code is a
little too simple.  It uses bytewise compares, so it beats "rep cmpsb"
but not "rep cmpsl".
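A sketch of the kind of simple C code meant here, under stated assumptions: memcmp_words is a hypothetical name, not the FreeBSD libc implementation. It compares size_t-sized words and falls back to bytewise comparison only for the tail or for the word that differs, since memcmp() needs the first differing byte to compute its sign. Words are loaded with memcpy() so unaligned buffers are handled without undefined behaviour. Any speedup over the "rep cmps" versions would have to be measured on the target CPU.

```c
#include <stddef.h>
#include <string.h>

int memcmp_words(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;
    size_t w1, w2;

    /* Skip over equal words quickly. */
    while (n >= sizeof(size_t)) {
        memcpy(&w1, p1, sizeof w1);
        memcpy(&w2, p2, sizeof w2);
        if (w1 != w2)
            break;      /* the difference is in this word */
        p1 += sizeof(size_t);
        p2 += sizeof(size_t);
        n -= sizeof(size_t);
    }
    /* Bytewise pass over the differing word or the tail. */
    for (; n > 0; n--, p1++, p2++) {
        if (*p1 != *p2)
            return *p1 < *p2 ? -1 : 1;
    }
    return 0;
}
```

bcmp() only reports equal or unequal, so a bcmp()-style variant could return nonzero as soon as two words differ and skip the bytewise search entirely.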

Bruce