Re: CFT: snmalloc as libc malloc

From: David Chisnall <theraven_at_FreeBSD.org>
Date: Mon, 13 Feb 2023 12:51:30 UTC

On 09/02/2023 23:09, Konstantin Belousov wrote:
> On Thu, Feb 09, 2023 at 09:53:34PM +0100, Mateusz Guzik wrote:
>> So, as someone who worked on memcpy previously, I note the variant
>> currently implemented in libc is pessimal for sizes > 16 bytes because
>> it does not use SIMD. I do have plans to rectify that, but ENOTIME.
> 
> Note that you need two kinds of micro-benchmarks for this:
> - normal microbenchmark which does the SIMD-enabled memcpy() in a loop
> - a microbenchmark which ensures that the SIMD register file ownership
>    is re-taken on each iteration (or close to it).
> 
> I am sure that the results from #2 would be astonishing and give quite
> different perspective on the use of SIMD for basic libc services.

Does FreeBSD still do lazy context switching of SIMD state?  I was under 
the impression that this was disabled by all operating systems now 
because it exposes speculative side channels across a process boundary.

Given that the x86-64 and AArch64 ABIs both pass floating-point
arguments in SIMD registers by default, I'd be surprised if lazy
switching gave a performance win: unless a workload manages to avoid
passing any floating-point arguments within a scheduling quantum, it
will hit the trap every time.
In addition, unless you explicitly disable it, recent versions of clang 
will use SIMD registers for inlined memcpy (irrespective of what libc 
does) and will also now spill GPRs to SIMD registers in preference to 
the stack in some situations.
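
A minimal sketch of the two points above, in C. The names scale,
struct blob and copy_blob are hypothetical and exist only for
illustration. Under the x86-64 System V and AArch64 calling
conventions the double arguments to scale() arrive in SIMD/FP
registers, and an optimising compiler is free to expand the fixed-size
memcpy() in copy_blob() inline, possibly with vector loads and stores,
without calling libc at all. What a given compiler actually emits
depends on its version and flags, so treat this as something to check
with "cc -O2 -S" rather than a guarantee.

```c
#include <string.h>

/* Hypothetical example type: a small fixed-size object that
 * compilers commonly copy inline instead of calling memcpy(). */
struct blob {
    unsigned char bytes[64];
};

/* Both arguments are doubles, so under the x86-64 System V and
 * AArch64 calling conventions they are passed in SIMD/FP registers
 * even though the code never uses SIMD explicitly. */
double scale(double x, double factor)
{
    return x * factor;
}

/* The size is a compile-time constant, so the compiler may expand
 * this copy inline (possibly via SIMD registers) regardless of how
 * libc implements memcpy(). */
void copy_blob(struct blob *dst, const struct blob *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

Either function is enough to touch the SIMD register file in ordinary
code that contains no explicit vector operations.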

David