Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 05 Sep 2023 05:06:45 UTC
On Sep 4, 2023, at 18:39, Mark Millard <marklmi@yahoo.com> wrote:

> On Sep 4, 2023, at 10:05, Alexander Motin <mav@FreeBSD.org> wrote:
> 
>> On 04.09.2023 11:45, Mark Millard wrote:
>>> On Sep 4, 2023, at 06:09, Alexander Motin <mav@FreeBSD.org> wrote:
>>>> per_txg_dirty_frees_percent is directly related to the delete delays we see here.  You are forcing ZFS to commit transactions at each 5% of the dirty ARC limit, which is 5% of 10% of memory size.  I haven't looked at that code recently, but I guess setting it too low can make ZFS commit transactions too often, increasing write inflation for the underlying storage.  I would propose you restore the default and try again.
>>> While this machine is different, the original problem was worse than
>>> the issue here: the load average was less than 1 for most of the
>>> parallel bulk build when 30 was used. The fraction of time waiting
>>> was much longer than with 5. If I understand right, both too high and
>>> too low for a type of context can lead to increased elapsed time, and
>>> getting it set to a near-optimal value is a non-obvious exploration.
>> 
>> IIRC this limit was modified several times since originally implemented.  Maybe it could benefit from another look, if the default 30% is not good.  It would be good if generic ZFS issues like this were reported to OpenZFS upstream to be visible to a wider public.  Unfortunately I have several other projects I must work on, so if it is not a regression I can't promise I'll take it right now; anybody else is welcome.
> 
> As I understand, there are contexts where 5 is inappropriate
> and 30 works fairly well: no good single answer as to what
> value range will avoid problems.
> 
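For reference, the tunable being discussed is exposed as the
vfs.zfs.per_txg_dirty_frees_percent sysctl on FreeBSD, as I understand it.
A minimal sketch of checking it, and of restoring the default of 30 that
Alexander mentions, would be:

# sysctl vfs.zfs.per_txg_dirty_frees_percent
# sysctl vfs.zfs.per_txg_dirty_frees_percent=30
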
>>> An overall point for the goal of my activity is: what makes a
>>> good test context for checking if ZFS is again safe to use?
>>> Maybe other tradeoffs make, say, 4 hardware threads more
>>> reasonable than 32.
>> 
>> Thank you for your testing.  The best test is one that nobody else runs. It also correlates with the topic of "safe to use", which also depends on what it is used for. :)
> 
> Looks like use of a M.2 Samsung SSD 960 PRO 1TB with a
> non-debug FreeBSD build is suitable for the bulk -a -J128
> test (no ALLOW_MAKE_JOBS variants enabled, USE_TMPFS=no in
> use) on the 32 hardware thread system. (The swap partition
> in use is from the normal environment's PCIe Optane media.)
> The %idle and the load averages and %user stayed reasonable
> in a preliminary test. One thing it does introduce is trim
> management (both available and potentially useful). (Optane
> media does not support or need it.) No
> per_txg_dirty_frees_percent adjustment involved (still 5).
> 
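For context, the poudriere.conf bits that correspond to the setup described
above amount to something like the following (just a sketch; only the
options mentioned are shown, everything else is left at its usual value):

# In poudriere.conf:
USE_TMPFS=no
# ALLOW_MAKE_JOBS and ALLOW_MAKE_JOBS_PACKAGES deliberately left unset,
# so each builder is supposed to run a single make job.
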
> I've learned to not use ^T for fear of /bin/sh aborting
> and messing up poudriere's context. So I now monitor with:
> 
> # poudriere status -b
> 
> in a separate ssh session.
> 
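A simple periodic variant for that separate session, if one wants the
status to refresh on its own (just a sketch; the 60 s interval is
arbitrary):

# while :; do clear; poudriere status -b; sleep 60; done
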
> I'll note that I doubt I'd try for a complete bulk -a .
> I'd probably stop it if I notice that the number of
> active builders drops off for a notable time (with normal
> waiting for prerequisites appearing to be why).
> 
> 

So much for that idea. It has reached a state of staying
under 3500 w/s and up to 4.5ms/w (normally above 2ms/w).
%busy wanders in the range 85% to 101%. top shows lots of
processes waiting in tx->tx. There is some read and other
activity as well. Of course the kBps figures are larger than
in the earlier USB3 context (larger per-operation kB figures).
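
The w/s, ms/w, and %busy figures are the sort of numbers gstat reports.
For anyone wanting to watch the same thing, something like the following
shows just the physical providers with a 1 s refresh (the interval choice
is arbitrary):

# gstat -p -I 1s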

It reached about 1350 port->package builds over the first
hour after the 2nd "Builders started".

autotrim is off. Doing a "zpool trim -w zamd64" leads to
larger w/s figures during the process. So there is more
exploring to do at some point. Possibly:

autotrim
per_txg_dirty_frees_percent

For now, I'm just running "zpool trim -w zamd64" once
in a while so the process continues better.
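
For completeness, the related commands are all standard zpool usage
(pool name as above):

# zpool status -t zamd64    # shows per-vdev TRIM state/progress
# zpool trim -w zamd64      # starts a manual TRIM and waits for completion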

Still no evidence of deadlocks. No evidence of builds
failing due to corruption.

. . . At around the end of the 2nd hour: 2920 or so built.

Still no evidence of deadlocks. No evidence of builds
failing due to corruption.

. . . I've turned on autotrim without stopping the bulk
build.
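
That is just the standard pool property, i.e. something like:

# zpool set autotrim=on zamd64
# zpool get autotrim zamd64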

. . . At around the end of the 3rd hour: 4080 or so built.

Still no evidence of deadlocks. No evidence of builds
failing due to corruption.

Looks like the % idle has been high for a significant
time. I think I'll stop this specific test and clean
things out.

Looks like the lang/guile* ports are examples of not respecting
the absence of ALLOW_MAKE_JOBS: they use parallel make jobs anyway.
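
One way to spot such ports, just as a sketch: watch top with the jail
column enabled and look for several compiler/gmake processes running
under a single builder jail, e.g.:

# top -j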

Hmm. The ^C ended up with:

^C[03:41:07] Error: Signal SIGINT caught, cleaning up and exiting
[main-amd64-bulk_a-default] [2023-09-04_17h49m28s] [sigint:] Queued: 34588 Built: 4331  Failed: 12    Skipped: 303   Ignored: 335   Fetched: 0     Tobuild: 29607  Time: 03:41:05
[03:41:07] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-04_17h49m28s
[03:41:19] [95] [00:06:59] Finished databases/php80-pdo_sqlite | php80-pdo_sqlite-8.0.30: Failed: build-depends
[03:41:20] [60] [00:10:53] Finished www/php82-session | php82-session-8.2.9: Success
[03:41:20] [19] [00:06:42] Finished databases/pecl-memcache@php80 | php80-pecl-memcache-8.2: Failed: build-depends
[03:41:23] [62] [00:06:40] Finished net-mgmt/icingaweb2-module-incubator@php81 | icingaweb2-module-incubator-php81-0.20.0: Success
[03:41:24] Cleaning up
exit: cannot open ./var/run/116_nohang.pid: No such file or directory
exit: cannot open ./var/run/93_nohang.pid: No such file or directory

and JID 253's processes are stuck in the STOP state, along with the
related JID 0 processes. These are for x11-toolkits/libXaw. I do
not know how long it might have been stuck.
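
For reference, listings like the ones below can be had from the usual
tools, roughly (the jail id and pid are the ones from this run):

# ps -o pid,jid,state,wchan,command -J 253
# procstat -kk 3896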

/usr/local/libexec/poudriere/sh -e /usr/local/share/poudriere/bulk.sh -J128 -jmain-amd64-bulk_a -a

is stuck in killpg state:

 3896 101784 sh                  -                   mi_switch+0xbb _sx_xlock_hard+0x3e1 killpg1+0x193 kern_kill+0x23c amd64_syscall+0x109 fast_syscall_common+0xf8 

The processes in STOP, listed with the most recent process birth time first:

86826 634820 sh                  -                   mi_switch+0xbb thread_suspend_switch+0xc3 cursig+0x7b9 ast_sig+0x1df ast_handler+0x88 ast+0x20 doreti_ast+0x1c 
86825 321240 sh                  -                   mi_switch+0xbb thread_suspend_check+0x260 sig_intr+0x4c fork1+0x186 sys_fork+0x54 amd64_syscall+0x109 fast_syscall_common+0xf8 
86550 308525 sh                  -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 pipe_read+0x420 dofileread+0x86 sys_read+0xbe amd64_syscall+0x109 fast_syscall_common+0xf8 
 3998 637224 gmake               -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
 3976 298566 sh                  -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
 3974 309336 gmake               -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
 3962 577230 gmake               -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
 3958 635598 sh                  -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
 3915 668148 make                -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
 3912 638108 sh                  -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_wait_sig+0x9 _sleep+0x1e3 kern_wait6+0x51c sys_wait4+0x7d amd64_syscall+0x109 fast_syscall_common+0xf8 
46142 212631 sh                  -                   mi_switch+0xbb sleepq_catch_signals+0x2ab sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x10c seltdwait+0x75 kern_select+0x95e sys_select+0x57 amd64_syscall+0x109 fast_syscall_common+0xf8 

===
Mark Millard
marklmi at yahoo.com