tmpfs use and "stress --hdd ? --hdd-bytes ??G" can lead to hung up system (16 Cortex-A72's system example)

From: Mark Millard via freebsd-stable <freebsd-stable_at_freebsd.org>
Date: Sun, 21 Nov 2021 22:13:10 UTC
Context:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #13 stable/13-n248062-109330155000-dirty: Sat Nov 13 23:55:14 PST 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1300520 1300520

(It is a non-debug kernel build, but with symbols.)

64 GiBytes of RAM
Swap: 251904Mi Total
16-core Cortex-A72 system (HoneyComb)
root-on-ZFS on Optane media in the PCIe slot

The following sort of sequence using tmpfs tends to hang
up the system long before the maximum swap usage would
be reached:

# mount -ttmpfs tmpfs /mnt
# cd /mnt
# stress --hdd 16 --hdd-bytes 16G

The details of which processes seem to hang up
do vary. Sometimes ^C, ^T, and the like
do not seem to work (or echo) at all.

Top can stop updating for the likes of:

# stress --hdd 12 --hdd-bytes 22G

until ^C leads to stress exiting. Based on what
top was reporting at the point its updates stopped,
there still was lots of swap-space and no apparent
need to have swapped out the kernel stacks for
the top process. For example: 22% used.

Another issue is that having systat -swap going
as well leads to partial hang ups even for the
likes of:

# stress --hdd 8 --hdd-bytes 32G

"Partial" in that some things stop but I've never
had to force a reboot to get control: ^C for the
stress run eventually works.

I do not seem to have problems like any of the
above for the likes of:

# stress --hdd 1 --hdd-bytes 270G

For this, "systat -swap" does progress to the likes of:

Device/Path       Size  Used |0%  /10  /20  /30  /40  / 60\  70\  80\  90\ 100|
gpt/CA72opt0SWP   246G  154G XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

But not with the larger --hdd figures above.
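
As a rough cross-check of the figures above, the following sketch
tabulates the aggregate size each stress invocation would reach if
every worker's file were fully written at the same time. It assumes
that the G suffix of --hdd-bytes means GiBytes and that nothing else
competes for RAM or swap; the aggregate_gib helper name is
hypothetical and only for illustration. Because each stress hdd
worker repeatedly writes and then removes its file, the products are
upper bounds, not measured usage.

```shell
#!/bin/sh
# Sketch under stated assumptions: sizes are in GiBytes and every
# worker's file is assumed to exist at full size simultaneously.
ram_gib=64      # RAM, from the context above
swap_gib=246    # 251904 MiBytes of swap / 1024

# aggregate_gib is a hypothetical helper, not part of stress.
aggregate_gib() {
    # $1 = --hdd worker count, $2 = --hdd-bytes value in GiBytes
    echo $(( $1 * $2 ))
}

# The four worker-count/size pairs used in the experiments above.
for run in "16 16" "12 22" "8 32" "1 270"; do
    set -- $run
    printf '%2d workers x %3dG = %3dG (RAM+swap: %dG)\n' \
        "$1" "$2" "$(aggregate_gib "$1" "$2")" \
        "$(( ram_gib + swap_gib ))"
done
```

All four products fall in a similar range, so under these
assumptions the worker count distinguishes the runs that hang from
the single-worker run more than the aggregate size does.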

I've not yet tried main [so: 14].


Note:

These experiments are from attempting to reproduce
hangups seen during poudriere-devel builds that have
USE_TMPFS=all and builders that are generating
indefinitely growing log files. (This was actually
under main targeting main.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)