Date: Sun, 21 Apr 2024 12:09:05 -0400
From: Mark Johnston
To: Alan Somers
Cc: FreeBSD Hackers
Subject: Re: Stressing malloc(9)

On Sat, Apr 20, 2024 at 11:23:41AM -0600, Alan Somers wrote:
> On Sat, Apr 20, 2024 at 9:07 AM Mark Johnston wrote:
> >
> > On Fri, Apr 19, 2024 at 04:23:51PM -0600, Alan Somers wrote:
> > > TLDR;
> > > How can I create a workload that causes malloc(9)'s performance to plummet?
> > >
> > > Background:
> > > I recently witnessed a performance problem on a production server.
> > > Overall throughput dropped by over 30x. dtrace showed that 60% of the
> > > CPU time was dominated by lock_delay as called by three functions:
> > > printf (via ctl_worker_thread), g_eli_alloc_data, and
> > > g_eli_write_done. One thing those three have in common is that they
> > > all use malloc(9). Fixing the problem was as simple as telling CTL to
> > > stop printing so many warnings, by tuning
> > > kern.cam.ctl.time_io_secs=100000.
> > >
> > > But even with CTL quieted, dtrace still reports ~6% of the CPU cycles
> > > in lock_delay via g_eli_alloc_data. So I believe that malloc is
> > > limiting geli's performance. I would like to try replacing it with
> > > uma(9).
> >
> > What is the size of the allocations that g_eli_alloc_data() is doing?
> > malloc() is a pretty thin layer over UMA for allocations <= 64KB.
> > Larger allocations are handled by a different path (malloc_large())
> > which goes directly to the kmem_* allocator functions. Those functions
> > are very expensive: they're serialized by global locks and need to
> > update the pmap (and perform TLB shootdowns when memory is freed).
> > They're not meant to be used at a high rate.
>
> In my benchmarks so far, 512B. In the real application the size is
> mostly between 4k and 16k, and it's always a multiple of 4k. But it's
> sometimes large enough to use malloc_large, and it's those
> malloc_large calls that account for the majority of the time spent in
> g_eli_alloc_data. lockstat shows that malloc_large, as called by
> g_eli_alloc_data, sometimes blocks for multiple ms.
>
> But oddly, if I change the parameters so that g_eli_alloc_data
> allocates 128kB, I still don't see malloc_large getting called. And
> both dtrace and vmstat show that malloc is mostly operating on 512B
> allocations. But dtrace does confirm that g_eli_alloc_data is being
> called with 128kB arguments. Maybe something is getting inlined?

malloc_large() is annotated __noinline, for what it's worth.

> I don't understand how this is happening. I could probably figure it
> out if I recompile with some extra SDT probes, though.

What is g_eli_alloc_sz on your system?
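For what it's worth, the small/large split can be demonstrated with
nothing but the public malloc(9)/free(9) KPI. This is only an
illustrative sketch (M_SKETCH and sketch_two_paths() are made-up names,
and the 64KB boundary is the one described above):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>

MALLOC_DEFINE(M_SKETCH, "sketch", "malloc path demo");

/*
 * A 512B request is served from a small per-size UMA zone, so the fast
 * path is a per-CPU cache with no global locks.  A 128kB request is
 * above the 64KB cutoff, so it takes the malloc_large() path into the
 * kmem_* functions, and freeing it updates the pmap and triggers TLB
 * shootdowns.
 */
static void
sketch_two_paths(void)
{
        void *small, *large;

        small = malloc(512, M_SKETCH, M_WAITOK);        /* UMA path */
        large = malloc(128 * 1024, M_SKETCH, M_WAITOK); /* malloc_large() path */

        free(large, M_SKETCH);  /* expensive: kmem_free() + TLB shootdown */
        free(small, M_SKETCH);  /* cheap: back to the per-CPU cache */
}

Run the 128kB half of that in a tight loop from a few kthreads and it
should also serve as a crude stressor for the malloc_large() path.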
> > My first guess would be that your production workload was hitting this
> > path, and your benchmarks are not. If you have stack traces or lock
> > names from DTrace, that would help validate this theory, in which case
> > using UMA to cache buffers would be a reasonable solution.
>
> Would that require creating an extra UMA zone for every possible geli
> allocation size above 64kB?

Something like that. Or have a zone of maxphys-sized buffers (actually
I think it needs to be slightly larger than that?) and accept the
corresponding waste, given that these allocations are short-lived.
This is basically what g_eli_alloc_data() already does.

> > > But on a non-production server, none of my benchmark workloads causes
> > > g_eli_alloc_data to break a sweat. I can't get its CPU consumption to
> > > rise higher than 0.5%. And that's using the smallest sector size and
> > > block size that I can.
> > >
> > > So my question is: does anybody have a program that can really stress
> > > malloc(9)? I'd like to run it in parallel with my geli benchmarks to
> > > see how much it interferes.
> > >
> > > -Alan
> > >
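The zone-of-maxphys-buffers idea above would look roughly like the
sketch below. It is untested and the names are made up; the item size
would really need to be whatever upper bound geli computes (e.g.
g_eli_alloc_sz) rather than bare maxphys:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <vm/uma.h>

static uma_zone_t g_eli_buf_zone;       /* hypothetical zone */

static void
g_eli_buf_zone_init(void)
{
        /* One item size for all data buffers; accept the internal waste. */
        g_eli_buf_zone = uma_zcreate("g_eli_buf", maxphys,
            NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
}

static void *
g_eli_buf_alloc(int how)
{
        /* Fast path hits the zone's per-CPU cache; no kmem_* or pmap work. */
        return (uma_zalloc(g_eli_buf_zone, how));
}

static void
g_eli_buf_free(void *p)
{
        uma_zfree(g_eli_buf_zone, p);
}

Since the buffers are short-lived, most requests should be satisfied
from the per-CPU caches, which is the same behaviour malloc(9) already
gives you below the 64KB cutoff.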