From: Alan Somers
Date: Tue, 23 Apr 2024 06:47:03 -0600
Subject: Re: Stressing malloc(9)
To: Alexander Leidinger
Cc: Karl Denninger, freebsd-hackers@freebsd.org
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Tue, Apr 23, 2024 at 2:37 AM Alexander Leidinger wrote:
>
> On 2024-04-23 00:05, Alan Somers wrote:
> > On Mon, Apr 22, 2024 at 2:07 PM Karl Denninger
> > wrote:
> >>
> >> On 4/22/2024 12:46, Alan Somers wrote:
> >>
> >> When I said "33kiB" I meant "33 pages", or 132 kB.  And the solution
> >> turns out to be very easy.  Since I'm using ZFS on top of geli, with
> >> the default recsize of 128kB, I'll just set
> >> vfs.zfs.vdev.aggregation_limit to 128 kB.  That way geli will never
> >> need to allocate more than 128kB contiguously.  ZFS doesn't even need
> >> those big allocations to be contiguous; it's just aggregating smaller
> >> operations to reduce disk IOPs.  But aggregating up to 1MB (the
> >> default) is overkill; any rotating HDD should easily be able to max
> >> out its consecutive write IOPs with 128kB operation size.  I'll add a
> >> read-only sysctl for g_eli_alloc_sz too.  Thanks Mark.
> >>
> >> -Alan
> >>
> >> Setting this on one of my production machines that uses zfs behind
> >> geli drops the load average quite materially with zero impact on
> >> throughput that I can see (thus far.)  I will run this for a while but
> >> it certainly doesn't appear to have any negatives associated with it
> >> and does appear to improve efficiency quite a bit.
> >
> > Great news!  Also, FTR I should add that this advice only applies to
> > people who use HDDs.  For SSDs zfs uses a different aggregation limit,
> > and the default value is already low enough.
>
> You basically say, that it is not uncommon to have such large
> allocations with kernels we ship (even in releases).
> Wouldn't it make sense to optimize the kernel to handle larger uma
> allocations?
>
> Or do you expect it to be specific to ZFS and it may be more sane to
> discuss with the OpenZFS developers to reduce this default setting?

Yes, both of those things are true.  It might make sense to reduce the
setting's default value.  OTOH, the current value is probably fine for
people who don't use geli (and possibly other transforms that require
allocating data).  And it would also be good to optimize the kernel to
perform these allocations more efficiently.  My best idea is to teach
g_eli_alloc_data how to allocate scatter/gather lists of 64k buffers
instead of contiguous memory.  The memory doesn't need to be
contiguous, after all.  But that's a bigger change, and I don't know
that I have the time for it right now.

-Alan
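
To make the scatter/gather idea above concrete, here is a rough userland
sketch of splitting one large request into 64 KiB segments.  It is only
an illustration: struct sg_list, sg_alloc() and sg_free() are
hypothetical names, not anything that exists in geli or GEOM, and it
uses plain malloc(3) rather than the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Segment size taken from the "64k buffers" suggestion in the message. */
#define SG_CHUNK_SIZE ((size_t)64 * 1024)

/* Hypothetical scatter/gather list; not a real geli or GEOM structure. */
struct sg_list {
	size_t   nsegs;    /* number of segments */
	size_t   total;    /* total bytes covered by all segments */
	void   **segs;     /* segs[i] points to seg_len[i] bytes */
	size_t  *seg_len;  /* length of each segment in bytes */
};

/* Release a list built by sg_alloc(); a NULL list is accepted. */
void
sg_free(struct sg_list *sg)
{
	if (sg == NULL)
		return;
	if (sg->segs != NULL) {
		for (size_t i = 0; i < sg->nsegs; i++)
			free(sg->segs[i]);	/* free(NULL) is a no-op */
	}
	free(sg->segs);
	free(sg->seg_len);
	free(sg);
}

/*
 * Allocate `total` bytes as a list of buffers no larger than
 * SG_CHUNK_SIZE each, so no single allocation has to be large and
 * contiguous.  Returns NULL if total is 0 or any allocation fails.
 */
struct sg_list *
sg_alloc(size_t total)
{
	if (total == 0)
		return (NULL);

	struct sg_list *sg = calloc(1, sizeof(*sg));
	if (sg == NULL)
		return (NULL);

	/* Round up so a partial trailing segment gets its own slot. */
	sg->nsegs = (total + SG_CHUNK_SIZE - 1) / SG_CHUNK_SIZE;
	sg->total = total;
	sg->segs = calloc(sg->nsegs, sizeof(*sg->segs));
	sg->seg_len = calloc(sg->nsegs, sizeof(*sg->seg_len));
	if (sg->segs == NULL || sg->seg_len == NULL) {
		sg_free(sg);
		return (NULL);
	}

	size_t remaining = total;
	for (size_t i = 0; i < sg->nsegs; i++) {
		size_t len = remaining < SG_CHUNK_SIZE ?
		    remaining : SG_CHUNK_SIZE;

		sg->segs[i] = malloc(len);
		if (sg->segs[i] == NULL) {
			sg_free(sg);
			return (NULL);
		}
		sg->seg_len[i] = len;
		remaining -= len;
	}
	assert(remaining == 0);
	return (sg);
}
```

For the 33-page request discussed earlier in the thread (33 * 4096 =
135168 bytes, assuming 4 KiB pages), sg_alloc() would build three
segments of 64 KiB, 64 KiB and 4 KiB.  A real change would also need
the encryption path to accept segmented buffers, which this sketch does
not address.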