Re: i386 on amd64 can fail to return from cond_wait_user, using basically 100% of a FreeBSD cpu

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sat, 05 Jul 2025 08:23:16 UTC

On Fri, Jul 04, 2025 at 11:01:22PM -0700, Mark Millard wrote:
> Some package builds are failing on the port-packages build cluster
> machines that do i386 builds during the following code. The analysis
> is from replication in a personal context (using poudriere bulk with
> -i), using poudriere-devel instead. I'll note that the personal
> context is from using PkgBase 14.3-RELEASE in the poudriere jail.
> I also installed most of the related *-dbg* PkgBase packages in order
> to get the nicer backtracing.
> 
> (gdb) bt
> #0  _umtx_op_err () at /home/pkgbuild/worktrees/releng/14.3/lib/libthr/arch/i386/i386/_umtx_op_err.S:37
> #1  0x2499f897 in _thr_umtx_timedwait_uint (mtx=0x249a365c, id=0, clockid=4, abstime=0x0, shared=0) at /home/pkgbuild/worktrees/releng/14.3/lib/libthr/thread/thr_umtx.c:233
> #2  0x24995b26 in _thr_sleep (curthread=0x24d36004, clockid=4, abstime=0x0) at /home/pkgbuild/worktrees/releng/14.3/lib/libthr/thread/thr_kern.c:197
> #3  0x24990beb in cond_wait_user (cvp=0x24dfa8a0, mp=0x24d38d04, abstime=<optimized out>, cancel=<optimized out>) at /home/pkgbuild/worktrees/releng/14.3/lib/libthr/thread/thr_cond.c:317
> 
> NOTE: cond_wait_user never returns but #2..#0 repeat (observed by
> repeated ^c and bt usage).
> 
> (i386 is the oddball with 32-bit time_t but I do not know
> if that is involved here.)
> 
> #4  cond_wait_common (cond=<optimized out>, mutex=<optimized out>, abstime=0x0, cancel=1) at /home/pkgbuild/worktrees/releng/14.3/lib/libthr/thread/thr_cond.c:377
> #5  0x24990e8f in __thr_cond_wait (cond=0x23b9b4f4, mutex=0x23b9b4ec) at /home/pkgbuild/worktrees/releng/14.3/lib/libthr/thread/thr_cond.c:392
> #6  0x23be1e4b in uv_cond_wait () from /usr/local/lib/libuv.so.1
> #7  0x024bd497 in node::NodePlatform::DrainTasks(v8::Isolate*) ()
> #8  0x0232f5b6 in node::SpinEventLoopInternal(node::Environment*) ()
> #9  0x02485bf0 in node::NodeMainInstance::Run() ()
> #10 0x023eaba1 in node::Start(int, char**) ()
> #11 0x24a1da85 in __libc_start1 (argc=5, argv=0xffffda3c, env=0xffffda54, cleanup=0x23b73020 <rtld_nop_exit>, mainX=0x314a720 <main>)
>     at /home/pkgbuild/worktrees/releng/14.3/lib/libc/csu/libc_start1.c:157
> #12 0x0232d0a8 in _start ()
> 
> www/librewolf and other firefox related package builds can do
> this until a 7200 sec timeout by poudriere occurs:
> 
> =>> Killing runaway build after 7200 seconds with no output
> 
> I'll note that truss did not generate any output when used to
> watch the process that was stuck. It appears to be a world-internal
> problem.

Can you provide a minimal stand-alone reproducer in C?
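
A possible starting point for such a reproducer is sketched below. It only
mirrors the call pattern visible in frames #3..#6 of the backtrace (an
untimed pthread_cond_wait() on a process-private condition variable, as
uv_cond_wait() would issue); it is not known to trigger the hang. The names
cv_state, waiter, run_waiters and MAX_WAITERS are hypothetical, and the
thread and item counts are arbitrary assumptions, not values taken from the
report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 64	/* arbitrary cap, an assumption of this sketch */

/* Shared state, hypothetical layout for illustration only. */
struct cv_state {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	long		pending;	/* items posted but not yet consumed */
	long		consumed;	/* items consumed by all waiters */
	int		done;		/* set once no more items will be posted */
};

/* Each waiter blocks in an untimed pthread_cond_wait() (abstime == NULL). */
static void *
waiter(void *arg)
{
	struct cv_state *s = arg;

	pthread_mutex_lock(&s->mtx);
	for (;;) {
		while (s->pending == 0 && !s->done)
			pthread_cond_wait(&s->cv, &s->mtx);
		if (s->pending == 0)
			break;		/* done and nothing left to consume */
		s->pending--;
		s->consumed++;
	}
	pthread_mutex_unlock(&s->mtx);
	return (NULL);
}

/*
 * Start up to nthreads waiters, post `items` wakeups one at a time, then
 * shut down.  Returns the number of items consumed, or -1 if no thread
 * could be created.
 */
long
run_waiters(int nthreads, long items)
{
	struct cv_state s;
	pthread_t tids[MAX_WAITERS];
	long i, result;
	int created = 0, t;

	assert(nthreads > 0 && nthreads <= MAX_WAITERS);
	pthread_mutex_init(&s.mtx, NULL);
	pthread_cond_init(&s.cv, NULL);
	s.pending = 0;
	s.consumed = 0;
	s.done = 0;

	for (t = 0; t < nthreads; t++) {
		if (pthread_create(&tids[created], NULL, waiter, &s) != 0)
			break;
		created++;
	}

	if (created > 0) {
		for (i = 0; i < items; i++) {
			pthread_mutex_lock(&s.mtx);
			s.pending++;
			pthread_cond_signal(&s.cv);
			pthread_mutex_unlock(&s.mtx);
		}
	}

	/* Tell every waiter to exit once the queue is drained. */
	pthread_mutex_lock(&s.mtx);
	s.done = 1;
	pthread_cond_broadcast(&s.cv);
	pthread_mutex_unlock(&s.mtx);

	for (t = 0; t < created; t++)
		pthread_join(tids[t], NULL);

	result = (created > 0) ? s.consumed : -1;
	pthread_cond_destroy(&s.cv);
	pthread_mutex_destroy(&s.mtx);
	return (result);
}
```

Calling run_waiters(4, 1000) from a small main() and linking with -pthread
should return 1000 on a healthy system. To be relevant to the report, the
program would need to be built and run as an i386 binary on an amd64 kernel,
for example inside the same i386 poudriere jail; a call that never returns
while the process spins would be the signature to look for.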