Re: CFT: snmalloc as libc malloc

From: David Chisnall <theraven_at_FreeBSD.org>
Date: Thu, 09 Feb 2023 19:36:48 UTC
On 9 Feb 2023, at 19:15, Mateusz Guzik <mjguzik@gmail.com> wrote:
> 
> it fails to build for me:
> 
> /usr/src/lib/libc/stdlib/snmalloc/malloc.cc:35:10: fatal error:
> 'override/jemalloc_compat.cc' file not found
> #include "override/jemalloc_compat.cc"
>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> --- malloc.o ---
> *** [malloc.o] Error code 1
> 
> make[4]: stopped in /usr/src/lib/libc
> /usr/src/lib/libc/stdlib/snmalloc/memcpy.cc:25:10: fatal error:
> 'global/memcpy.h' file not found
> #include <global/memcpy.h>
>         ^~~~~~~~~~~~~~~~~
> 1 error generated.
> --- memcpy.o ---
> *** [memcpy.o] Error code 1

It looks as if you haven’t checked out the submodule.  Is there anything in contrib/snmalloc?

> this is a fresh world, top of snmalloc2 branch:
> commit a5c83c69817d03943b8be982dd815c7e263d1a83
> Author: David Chisnall <theraven@FreeBSD.org>
> Date:   Fri Jan 21 15:13:09 2022 +0000
> 
>    Initial commit of snmalloc2 in libc.
> 
> anyway, I wanted to say I find the memcpy thing incredibly suspicious.
> I found one article in
> https://github.com/microsoft/snmalloc/blob/main/docs/security/GuardedMemcpy.md
> which benches it and that made it even more suspicious. What did the
> benchmarked memcpy look like inside?

Perhaps you could share what you are suspicious about?  I don’t really know how to respond to something so vague.  The document you linked to has the benchmark that we used (though the graphs in it appear to be based on an older version of the memcpy).  The PR that added PowerPC tuning has some additional graphs of measurements.

If you compile the memcpy file, you can see the generated assembly.  The C++ code provides a set of building blocks for producing efficient memcpy implementations.  The fastest combination on x86 is roughly:

 - A jump table for small sizes, where each entry performs the copy as a sequence of power-of-two-sized copies (double-word, word, half-word, and byte).
 - A vectorised loop of SSE copies for medium sizes.
 - rep movsb for large copies.

The compiler does some quite complex layout for the jump table.
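To make the three tiers concrete, here is a minimal C++17 sketch of that dispatch structure. It is not snmalloc's code: the thresholds (16 and 4096 bytes), every name in the hypothetical `sketch` namespace, and the use of an explicit function-pointer table in place of a `switch` that the compiler would lower into a jump table are all assumptions for illustration. The SSE and `rep movsb` paths assume x86-64 with GCC- or Clang-style inline assembly; other targets fall back to `std::memcpy`.

```cpp
// Hedged sketch of a three-tier memcpy; NOT snmalloc's actual implementation.
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

namespace sketch {

// Hypothetical thresholds; real implementations tune these per platform.
constexpr std::size_t kSmallLimit = 16;  // sizes below this use the table
constexpr std::size_t kLargeMin = 4096;  // sizes from here use rep movsb

// Copy exactly N bytes as a descending sequence of 8/4/2/1-byte copies.
// std::memcpy with a constant size compiles to a single load/store pair.
template <std::size_t N>
inline void copy_decomposed([[maybe_unused]] unsigned char* d,
                            [[maybe_unused]] const unsigned char* s) {
  if constexpr (N >= 8) {
    std::memcpy(d, s, 8);
    copy_decomposed<N - 8>(d + 8, s + 8);
  } else if constexpr (N >= 4) {
    std::memcpy(d, s, 4);
    copy_decomposed<N - 4>(d + 4, s + 4);
  } else if constexpr (N >= 2) {
    std::memcpy(d, s, 2);
    copy_decomposed<N - 2>(d + 2, s + 2);
  } else if constexpr (N == 1) {
    *d = *s;
  }
}

// Explicit jump table: entry n copies exactly n bytes.
using SmallFn = void (*)(unsigned char*, const unsigned char*);
template <std::size_t... I>
constexpr std::array<SmallFn, sizeof...(I)> make_table(std::index_sequence<I...>) {
  return {&copy_decomposed<I>...};
}
inline constexpr auto kSmallTable =
    make_table(std::make_index_sequence<kSmallLimit>{});

// Medium sizes: a loop of unaligned 16-byte SSE copies. The tail is handled
// by re-copying the last 16 bytes, overlapping the previous chunk.
inline void copy_medium(unsigned char* d, const unsigned char* s, std::size_t n) {
  assert(n >= 16 && "medium path needs at least one full vector");
#if SKETCH_X86
  std::size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(d + i), v);
  }
  if (i < n) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + n - 16));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(d + n - 16), v);
  }
#else
  std::memcpy(d, s, n);  // portable stand-in on non-x86 targets
#endif
}

// Large sizes: hand the copy to the CPU's string-copy instruction
// (RDI = destination, RSI = source, RCX = byte count).
inline void copy_large(unsigned char* d, const unsigned char* s, std::size_t n) {
#if SKETCH_X86
  __asm__ __volatile__("rep movsb" : "+D"(d), "+S"(s), "+c"(n) : : "memory");
#else
  std::memcpy(d, s, n);  // portable stand-in on non-x86 targets
#endif
}

// Dispatch on size to one of the three tiers.
inline void* copy(void* dst, const void* src, std::size_t n) {
  auto* d = static_cast<unsigned char*>(dst);
  auto* s = static_cast<const unsigned char*>(src);
  if (n < kSmallLimit) {
    kSmallTable[n](d, s);
  } else if (n < kLargeMin) {
    copy_medium(d, s, n);
  } else {
    copy_large(d, s, n);
  }
  return dst;
}

}  // namespace sketch
```

Using `std::memcpy` with a compile-time-constant size is the usual portable way to express an unaligned fixed-width load and store; inside libc's own memcpy you would need compiler builtins or intrinsics instead, so that the function does not end up calling itself.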

David