Re: Stressing malloc(9)

From: Mark Johnston <markj_at_freebsd.org>
Date: Sat, 20 Apr 2024 15:07:03 UTC
On Fri, Apr 19, 2024 at 04:23:51PM -0600, Alan Somers wrote:
> TL;DR:
> How can I create a workload that causes malloc(9)'s performance to plummet?
> 
> Background:
> I recently witnessed a performance problem on a production server.
> Overall throughput dropped by over 30x.  dtrace showed that 60% of the
> CPU time was spent in lock_delay, called from three functions:
> printf (via ctl_worker_thread), g_eli_alloc_data, and
> g_eli_write_done.  One thing those three have in common is that they
> all use malloc(9).  Fixing the problem was as simple as telling CTL to
> stop printing so many warnings, by tuning
> kern.cam.ctl.time_io_secs=100000.
> 
> But even with CTL quieted, dtrace still reports ~6% of the CPU cycles
> in lock_delay via g_eli_alloc_data.  So I believe that malloc is
> limiting geli's performance.  I would like to try replacing it with
> uma(9).

What is the size of the allocations that g_eli_alloc_data() is doing?
malloc() is a pretty thin layer over UMA for allocations <= 64KB.
Larger allocations are handled by a different path (malloc_large())
which goes directly to the kmem_* allocator functions.  Those functions
are very expensive: they're serialized by global locks and need to
update the pmap (and perform TLB shootdowns when memory is freed).
They're not meant to be used at a high rate.
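
Roughly, the dispatch looks like the sketch below.  This is not the actual
sys/kern/kern_malloc.c code: malloc_zone_for_size() is a hypothetical
stand-in for the per-size UMA zone lookup, and the 64KB cutoff is
hard-coded for illustration.

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/vm.h>
    #include <vm/vm_extern.h>
    #include <vm/uma.h>

    /* Hypothetical helper standing in for the per-size UMA zone lookup. */
    uma_zone_t malloc_zone_for_size(size_t size);

    static void *
    malloc_dispatch_sketch(size_t size, int flags)
    {
            if (size <= 64 * 1024) {
                    /* Small allocations: served from per-CPU UMA caches. */
                    return (uma_zalloc(malloc_zone_for_size(size), flags));
            }
            /*
             * Large allocations: kmem_malloc() maps fresh KVA and updates
             * the pmap under global locks, with TLB shootdowns when the
             * memory is freed later.
             */
            return (kmem_malloc(size, flags));
    }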

My first guess would be that your production workload was hitting this
path, and your benchmarks are not.  If you have stack traces or lock
names from DTrace, that would help validate this theory, in which case
using UMA to cache buffers would be a reasonable solution.
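
As a rough illustration of that approach, here is a minimal sketch of a
private UMA zone caching fixed-size data buffers, assuming a single buffer
size.  The zone name, buffer size, and function names are made up for the
example; this is not the actual g_eli code.

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    #define GELI_BUF_SIZE   (64 * 1024)     /* illustrative fixed buffer size */

    static uma_zone_t geli_buf_zone;

    static void
    geli_buf_zone_create(void)
    {
            /* Plain buffers: no ctor/dtor/init/fini needed. */
            geli_buf_zone = uma_zcreate("geli_buf", GELI_BUF_SIZE,
                NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
    }

    static void *
    geli_buf_alloc(void)
    {
            /* Common case is a per-CPU cache hit; no kmem_* or pmap work. */
            return (uma_zalloc(geli_buf_zone, M_NOWAIT));
    }

    static void
    geli_buf_free(void *buf)
    {
            /* Freed buffers go back to the per-CPU cache for reuse. */
            uma_zfree(geli_buf_zone, buf);
    }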

> But on a non-production server, none of my benchmark workloads causes
> g_eli_alloc_data to break a sweat.  I can't get its CPU consumption to
> rise higher than 0.5%.  And that's using the smallest sector size and
> block size that I can use.
> 
> So my question is: does anybody have a program that can really stress
> malloc(9)?  I'd like to run it in parallel with my geli benchmarks to
> see how much it interferes.
> 
> -Alan
>