Re: CFT: snmalloc as libc malloc

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 13 Feb 2023 15:24:39 UTC
On Mon, Feb 13, 2023 at 12:51:30PM +0000, David Chisnall wrote:
> On 09/02/2023 23:09, Konstantin Belousov wrote:
> > On Thu, Feb 09, 2023 at 09:53:34PM +0100, Mateusz Guzik wrote:
> > > So, as someone who worked on memcpy previously, I note the variant
> > > currently implemented in libc is pessimal for sizes > 16 bytes because
> > > it does not use SIMD. I do have plans to rectify that, but ENOTIME.
> > 
> > Note that you need two kinds of micro-benchmarks for this:
> > - normal microbenchmark which does the SIMD-enabled memcpy() in a loop
> > - a microbenchmark which ensures that the SIMD register file ownership
> >    is re-taken on each iteration (or close to it).
> > 
> > I am sure that the results from #2 would be astonishing and give quite
> > different perspective on the use of SIMD for basic libc services.
> 
> Does FreeBSD still do lazy context switching of SIMD state?  I was under the
> impression that this was disabled by all operating systems now because it
> exposes speculative side channels across a process boundary.
FreeBSD uses lazy context instantiations.  Even with it, there were several
reports of problems caused by the use of XMM registers in libthr due to
significantly increased latency of the context switches.

> 
> Given that the x86-64 and AArch64 ABIs both pass floating point arguments in
> SIMD registers by default, I'd be surprised if it gave a performance win -
> unless a workload manages to avoid passing any floating-point arguments in a
> quantum, it will hit the trap every time. In addition, unless you explicitly
> disable it, recent versions of clang will use SIMD registers for inlined
> memcpy (irrespective of what libc does) and will also now spill GPRs to SIMD
> registers in preference to the stack in some situations.
It is not unusual to have programs that do not need floating-point
arithmetic.
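
The two kinds of micro-benchmarks mentioned earlier in the thread could be
sketched roughly as below. This is only an illustration under stated
assumptions, not a benchmark anyone in this thread ran: bench_memcpy() and
now_ns() are hypothetical names, and sched_yield() is used as a crude stand-in
for "the SIMD register file ownership is re-taken on each iteration". A yield
only causes a context switch if another runnable thread shares the CPU, so a
realistic run would pin a competing thread to the same core. The do_copy flag
allows measuring a yield-only baseline to subtract from the combined figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/*
 * Hypothetical benchmark: returns the average wall-clock nanoseconds per
 * loop iteration.
 *
 *   do_copy  != 0: each iteration calls memcpy() on len bytes (capped at 4096)
 *   do_yield != 0: each iteration first calls sched_yield(), as an
 *                  approximation of losing the CPU (and with it the SIMD
 *                  state) between copies
 *
 * Kind 1 from the thread: do_copy=1, do_yield=0.
 * Kind 2 from the thread: do_copy=1, do_yield=1, compared against the
 * baseline do_copy=0, do_yield=1.
 */
static double bench_memcpy(size_t len, unsigned long iters,
    int do_copy, int do_yield)
{
    static unsigned char src[4096], dst[4096];
    volatile unsigned char sink = 0;
    double start, elapsed;
    unsigned long i;

    if (iters == 0)
        return 0.0;
    if (len > sizeof(src))
        len = sizeof(src);
    memset(src, 0xa5, sizeof(src));

    start = now_ns();
    for (i = 0; i < iters; i++) {
        if (do_yield)
            sched_yield();
        if (do_copy) {
            /* Vary the source so the copy cannot be hoisted out. */
            src[0] = (unsigned char)i;
            memcpy(dst, src, len);
            sink ^= dst[0];
        }
    }
    elapsed = now_ns() - start;
    (void)sink;
    return elapsed / (double)iters;
}
```

Calling bench_memcpy(256, 1000000, 1, 0), bench_memcpy(256, 1000000, 1, 1) and
bench_memcpy(256, 1000000, 0, 1) and comparing the three averages gives a
first approximation of how much the copy costs when the thread keeps the CPU
versus when it gives it up before every copy. The compiler may still inline
or specialize memcpy() here, so checking the generated code is advisable.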