Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 05 Sep 2023 01:39:32 UTC

On Sep 4, 2023, at 10:05, Alexander Motin <mav@FreeBSD.org> wrote:

> On 04.09.2023 11:45, Mark Millard wrote:
>> On Sep 4, 2023, at 06:09, Alexander Motin <mav@FreeBSD.org> wrote:
>>> per_txg_dirty_frees_percent is directly related to the delete delays we see here.  You are forcing ZFS to commit transactions every 5% of the dirty ARC limit, which is 5% of 10% of memory size.  I haven't looked at that code recently, but I guess setting it too low can make ZFS commit transactions too often, increasing write inflation for the underlying storage.  I would propose you restore the default and try again.
>> While this machine is different, the original problem was worse than
>> the issue here: the load average was less than 1 for most of the
>> parallel bulk build when 30 was used. The fraction of time spent waiting
>> was much longer than with 5. If I understand right, both too high and
>> too low a value for a given type of context can lead to increased
>> elapsed time, and getting it set near optimal is a non-obvious exploration.
> 
> IIRC this limit was modified several times since originally implemented.  Maybe it could benefit from another look, if the default 30% is not good.  It would be good if generic ZFS issues like this were reported to OpenZFS upstream to be visible to a wider public.  Unfortunately I have several other projects I must work on, so if it is not a regression I can't promise I'll take it right now; anybody else is welcome.

As I understand it, there are contexts where 5 is inappropriate
and 30 works fairly well: there is no single good answer as to
what value range will avoid problems.
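
For reference, a minimal sketch of how the tunable can be checked
and adjusted on FreeBSD, assuming the stock
vfs.zfs.per_txg_dirty_frees_percent sysctl name (30 here just
illustrates restoring the default):

# sysctl vfs.zfs.per_txg_dirty_frees_percent
# sysctl vfs.zfs.per_txg_dirty_frees_percent=30

A matching line in /etc/sysctl.conf (or loader.conf, depending on
when the zfs module loads) would make the setting persist across
reboots.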

>> An overall point for the goal of my activity is: what makes a
>> good test context for checking if ZFS is again safe to use?
>> Maybe other tradeoffs make, say, 4 hardware threads more
>> reasonable than 32.
> 
> Thank you for your testing.  The best test is one that nobody else runs.  It also correlates with the topic of "safe to use", which also depends on what it is used for. :)

Looks like use of an M.2 Samsung SSD 960 PRO 1TB with a
non-debug FreeBSD build is suitable for the bulk -a -J128
test (no ALLOW_MAKE_JOBS variants enabled, USE_TMPFS=no in
use) on the 32 hardware thread system. (The swap partition
in use is from the normal environment's PCIe Optane media.)
The %idle, the load averages, and %user stayed reasonable
in a preliminary test. One thing the SSD does introduce is trim
management (both available and potentially useful); Optane
media does not support or need it. No
per_txg_dirty_frees_percent adjustment is involved (still 5).
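
To make the setup easier to reproduce, the run is roughly of the
form below (the jail and pool names are placeholders, not my
actual ones):

# poudriere bulk -j main-amd64 -J128 -a

with USE_TMPFS=no set in /usr/local/etc/poudriere.conf and no
ALLOW_MAKE_JOBS settings in the make.conf that poudriere uses.
For the trim side, something like:

# zpool get autotrim zssd
# zpool trim zssd

shows whether automatic trim is enabled on the SSD-backed pool and
requests a manual trim pass, respectively.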

I've learned not to use ^T for fear of /bin/sh aborting
and messing up poudriere's context. So I now monitor with:

# poudriere status -b

in a separate ssh session.
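
When I want that updated periodically rather than on demand, a
simple loop in that session works, for example:

# while :; do clear; poudriere status -b; sleep 60; done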

I'll note that I doubt I'd try for a complete bulk -a .
I'd probably stop it if I noticed that the number of
active builders dropped off for a notable time (with normal
waiting for prerequisites appearing to be the reason).


===
Mark Millard
marklmi at yahoo.com