Re: ZFS deadlock in 14
Date: Sat, 19 Aug 2023 20:41:56 UTC
[I forgot to adjust USE_TMPFS for the purposes of this test, so I'll
be starting over later.]

On Aug 19, 2023, at 12:18, Mark Millard <marklmi@yahoo.com> wrote:

> On Aug 19, 2023, at 11:40, Mark Millard <marklmi@yahoo.com> wrote:
>
>> We will see how long the following high load average bulk -a
>> configuration survives a build attempt, using a non-debug kernel
>> for this test.
>>
>> I've applied:
>>
>> # fetch -o- https://github.com/openzfs/zfs/pull/15107.patch | git -C /usr/main-src/ am --dir=sys/contrib/openzfs
>> -                                       13 kB  900 kBps    00s
>> Applying: Remove fastwrite mechanism.
>>
>> # fetch -o- https://github.com/openzfs/zfs/pull/15122.patch | git -C /usr/main-src/ am --dir=sys/contrib/openzfs
>> -                                       45 kB 1488 kBps    00s
>> Applying: ZIL: Second attempt to reduce scope of zl_issuer_lock.
>>
>> on a ThreadRipper 1950X (32 hardware threads) that is at
>> main 6b405053c997:
>>
>> Thu, 10 Aug 2023
>> . . .
>> • git: cd25b0f740f8 - main - zfs: cherry-pick fix from openzfs  Martin Matuska
>> • git: 28d2e3b5dedf - main - zfs: cherry-pick fix from openzfs  Martin Matuska
>> . . .
>> • git: 6b405053c997 - main - OpenSSL: clean up botched merges in OpenSSL 3.0.9 import  Jung-uk Kim
>>
>> So it is based on starting with the 2 cherry-picks as
>> well.
>>
>> The ThreadRipper 1950X boots from a bectl BE and
>> that zfs media is all that is in use here.
>>
>> I've set up to test starting a bulk -a using
>> ALLOW_MAKE_JOBS=yes along with allowing 32 builders.
>> So the load average(s) could potentially be around
>> 32*32 at times. There is 128 GiBytes of RAM and:
>>
>> # swapinfo
>> Device              1K-blocks     Used     Avail Capacity
>> /dev/gpt/OptBswp480 503316480        0 503316480     0%
>>
>> I'm not so sure that such a high load average bulk -a
>> is reasonable for a debug kernel build: I'm unsure of
>> the resource usage for such and whether everything could
>> be tracked as needed. So I'm testing a non-debug build
>> for now.
>>
>> I have built the kernels (nodbg and dbg), installed
>> the nodbg kernel, rebooted, and started:
>>
>> # poudriere bulk -jmain-amd64-bulk_a -a
>> . . .
>> [00:01:22] Building 34042 packages using up to 32 builders
>> . . .
>>
>> The ports tree is from back in mid-July.
>>
>> I have a patched-up top that records and reports
>> various MaxObs???? figures (Maximum Observed). It
>> was recently reporting:
>>
>> . . .; load averages: 119.56, 106.79, 71.54 MaxObs: 184.08, 112.10, 71.54
>> 1459 threads: . . ., 273 MaxObsRunning
>> . . .
>> Mem: . . ., 61066Mi MaxObsActive, 10277Mi MaxObsWired, 71371Mi MaxObs(Act+Wir+Lndry)
>> . . .
>> Swap: . . ., 61094Mi MaxObs(Act+Lndry+SwapUsed), 71371Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> Status report at about 1 hr in:
>
> [main-amd64-bulk_a-default] [2023-08-19_11h04m26s] [parallel_build:] Queued: 34435 Built: 1929 Failed: 9 Skipped: 2569 Ignored: 358 Fetched: 0 Tobuild: 29570  Time: 00:59:59
>
> Not hung up yet.
>
> From about 10 minutes after that:
>
> . . . load averages: 205.56, 181.58, 153.68 MaxObs: 213.78, 182.26, 153.68
> 1704 threads: . . ., 311 MaxObsRunning
> . . .
> Mem: . . ., 100250Mi MaxObsActive, 16857Mi MaxObsWired, 124879Mi MaxObs(Act+Wir+Lndry)
> . . .
> Swap: . . . 5994Mi MaxObsUsed, 116589Mi MaxObs(Act+Lndry+SwapUsed), 127354Mi MaxObs(Act+Wir+Lndry+SwapUsed)

Just realized that I'd forgotten to reconfigure USE_TMPFS=all to be
USE_TMPFS=no, so what I've done so far is not a great test. I'll still
probably let it reach 3 hr and get the summary information before I
stop it, adjust USE_TMPFS, and start over from scratch.

===
Mark Millard
marklmi at yahoo.com
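
P.S. For anyone wanting to reproduce the retest configuration, a minimal
sketch of the relevant poudriere.conf lines follows (assuming the default
/usr/local/etc/poudriere.conf location; PARALLEL_JOBS is shown commented
out only for illustration, since it defaults to the CPU count anyway):

# Relevant lines for the retest (a sketch, not the full file):
USE_TMPFS=no          # was "all"; tmpfs work areas compete with RAM during bulk -a
ALLOW_MAKE_JOBS=yes   # let each builder run parallel make jobs
#PARALLEL_JOBS=32     # default is the number of CPUs (32 here), so up to
                      # 32 builders * 32 make jobs, hence the high load averages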