Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Mon, 04 Sep 2023 17:05:00 UTC
On 04.09.2023 11:45, Mark Millard wrote:
> On Sep 4, 2023, at 06:09, Alexander Motin <mav@FreeBSD.org> wrote:
>> per_txg_dirty_frees_percent is directly related to the delete delays we see here.  You are forcing ZFS to commit a transaction for every 5% of the dirty ARC limit, which is 5% of 10% of memory size.  I haven't looked at that code recently, but I guess setting it too low can make ZFS commit transactions too often, increasing write inflation for the underlying storage.  I would suggest restoring the default and trying again.
> 
> While this machine is different, the original problem was worse than
> the issue here: the load average was less than 1 for most of
> the parallel bulk build when 30 was used. The fraction of time waiting
> was much longer than with 5. If I understand right, both too high and
> too low for a type of context can lead to increased elapsed time and
> getting it set near optimal is a non-obvious exploration.

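To put rough numbers on the "5% of 10% of memory size" arithmetic quoted above, here is a minimal sketch. It assumes the dirty data limit is simply 10% of RAM (the real limit is also bounded by other tunables, which this ignores), and the 32 GiB machine size and the function name are purely hypothetical illustrations, not values from this thread.

```python
# Back-of-the-envelope only: assumes dirty limit = 10% of RAM, no other caps.
GiB = 1024 ** 3
MiB = 1024 ** 2


def frees_threshold_bytes(mem_bytes, dirty_max_percent=10, frees_percent=30):
    """Bytes of frees per txg implied by per_txg_dirty_frees_percent."""
    dirty_max = mem_bytes * dirty_max_percent // 100
    return dirty_max * frees_percent // 100


if __name__ == "__main__":
    mem = 32 * GiB  # hypothetical RAM size
    for pct in (5, 30):
        threshold = frees_threshold_bytes(mem, frees_percent=pct)
        print(f"frees_percent={pct:2d}: about {threshold / MiB:.0f} MiB per txg")
```

Under these assumptions the 5% setting gives a threshold one sixth the size of the one from the 30% default, which is the sense in which a low value can force more frequent commits.
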
IIRC this limit has been modified several times since it was originally
implemented.  Maybe it could benefit from another look, if the default of
30% is not good.  It would be good if generic ZFS issues like this were
reported to OpenZFS upstream, so they are visible to a wider audience.
Unfortunately I have several other projects I must work on, so if it is
not a regression I can't promise I'll take it right now; anybody else is
welcome to.

> An overall point for the goal of my activity is: what makes a
> good test context for checking if ZFS is again safe to use?
> Maybe other tradeoffs make, say, 4 hardware threads more
> reasonable than 32.

Thank you for your testing.  The best test is one that nobody else runs.
It also correlates with the topic of "safe to use", which also depends
on what it is used for. :)

-- 
Alexander Motin