[Bug 284743] System reproducibly livelocks after a couple of hours in poudriere bulk -a

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 11 Feb 2025 19:04:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=284743

            Bug ID: 284743
           Summary: System reproducibly livelocks after a couple of hours
                    in poudriere bulk -a
           Product: Base System
           Version: 15.0-CURRENT
          Hardware: riscv
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: riscv
          Assignee: riscv@FreeBSD.org
          Reporter: fuz@FreeBSD.org

While trying to do an exp-run on the FreeBSD ports tree on riscv64, I found
that the machine would hang after a few hours of crunching on the ports.  I
initially thought this was hardware-related, but was able to reproduce the
hang on two emulators (QEMU and RVVM).  Conditions under which this happens:

 - poudriere bulk -a
 - on a system with root on ZFS
 - 4 or 8 cores
 - 16 GB of RAM
 - on SiFive Unmatched, QEMU, or RVVM (see port emulators/rvvm)
 - USE_TMPFS is set to "data localbase"
 - sufficient disk space is available
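
For reference, the tmpfs condition above corresponds to a poudriere.conf
fragment along these lines.  This is only a sketch: the pool name "zroot" is
an assumption (it is not stated in this report), and all other settings are
left at whatever the local configuration uses.

```sh
# /usr/local/etc/poudriere.conf (fragment, sh syntax)

# Assumption: name of the ZFS pool poudriere uses; adjust to the local pool.
ZPOOL=zroot

# From the report: use tmpfs for the data and localbase directories only.
USE_TMPFS="data localbase"
```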

Other conditions may apply.  This may reproduce with root on UFS, though I have
not tried.

The hang seems to be a livelock: in the emulators, all cores appear to be
pegged at 100% load.  The machine no longer responds to ping or on the serial
console.

The hang doesn't happen immediately but rather after a few hours to days of
building.  After the first hang, it seems to recur within a few (~6) hours of
restarting the build.  It is possible that certain ports trigger the
conditions needed for the hang.

To reproduce, set up a riscv64 machine with 15-CURRENT root on ZFS, get a ports
tree, install Poudriere, build a jail, and do "poudriere bulk -a".  Then wait
until the machine hangs.
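
The Poudriere part of these steps can be sketched roughly as follows.  The
jail name, ports tree name, and the jail build method (git+https from the
main branch) are assumptions chosen for illustration; the report only says
that the jail is 15-CURRENT.  The script prints the commands by default and
only executes them when DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch of the reproduction steps; JAIL and PTREE are hypothetical names.
JAIL="${JAIL:-riscv15}"
PTREE="${PTREE:-default}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # Get a ports tree.
    run poudriere ports -c -p "$PTREE" &&
    # Build a jail; method and branch are assumptions, not from the report.
    run poudriere jail -c -j "$JAIL" -v main -m git+https &&
    # Build all ports, then wait until the machine hangs.
    run poudriere bulk -a -j "$JAIL" -p "$PTREE"
}

repro
```

Run as root with DRY_RUN=0 to actually start the build; a serial console or
external ping check is needed to notice the hang, since the machine itself
stops responding.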
