Re: Very long-running buildworld process

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Thu, 12 Jun 2025 18:51:35 UTC
On Thu, Jun 12, 2025 at 10:33:24AM -0700, Mark Millard wrote:
> On Jun 12, 2025, at 09:50, bob prohaska <fbsd@www.zefox.net> wrote:
> 
> > On Wed, Jun 11, 2025 at 09:31:54PM -0700, Mark Millard wrote:
> >> Hmm. May be:
> >> 
> >> # ps -axldww
> >> 
> >> might show something that would prove interesting?
> > I've placed the output at
> > http://www.zefox.net/~fbsd/rpi2/20250612/ps-axldww.log
> > It's too wide to view on the list. It shows the complete
> > (very long) command line for the offending PID.
> > 
> >> 
> >> As for watching the specific c++ process, may be
> >> you might temporarily run (output goes to stderr):
> >> 
> >> # truss -fae -p 21654
> >> 
> >> then ^C it. From this you would learn if the kernel is
> >> in use via system calls and what kinds of system calls.
> >> 
> >> (I picked truss for simplicity vs. ktrace and kdump use.)
> >> 
> > 
> > Near as I can tell, truss produces zero output after running
> > for a couple of minutes.
> 
> That indicates internal looping with no IO or other
> kernel-based activity.
> 
> In the ps output, it is the only process showing more
> than 15000 (574536) for VSZ. Thus it seems likely to be
> the major source for the Inact and Used SWAP showing in
> your top runs.
> 
> I wonder if your c++ has enough symbol information (or
> possibly debug information) to make attachment to the
> process with gdb or lldb and a backtrace useful (routine
> names, for example, not just addresses)? (An attached
> debugger can also quit/exit the program being debugged.)
>
The machine is a vanilla -current install, with whatever
comes "out of the box". There's a gdb man page and binary,
but neither for lldb.
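
For reference, a minimal sketch of what a one-shot backtrace with gdb
might look like. It assumes the installed gdb accepts -p, -batch and
-ex, which has not been checked on this machine. gdb_bt_cmd is a
hypothetical helper name; it only prints the command so it can be
reviewed before being run as root. PID 21654 is the one from the
truss example above.

```shell
# Hypothetical helper: print (do not run) a gdb command line that
# attaches to the given PID, dumps a backtrace of every thread and
# then exits. Assumes gdb supports -p, -batch and -ex.
gdb_bt_cmd() {
    printf "gdb -p %s -batch -ex 'thread apply all bt'\n" "$1"
}

# Show the command for the stuck c++ process.
gdb_bt_cmd 21654
```

If the printed command looks right it could be pasted into a root
shell. Without symbol information the backtrace may show only
addresses.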
 
> Other than that sort of information gathering, it looks
> like the process is stuck looping strictly internal to
> itself, not asking for kernel services. Ultimately, it
> will need to be killed or quit in some way.
> 
It's possible to get into (a) debugger using ~^B, so:
bob@www:~ % KDB: enter: Break to debugger
[ thread pid 10 tid 100002 ]
Stopped at      kdb_alt_break_internal+0x1b8:   ldrb    r15, [r15, r15, ror r15]!
db> 
but what to do next is unclear. I know how to type bt and
copy the console output, but that's all at this stage. 


> If you could figure the right directory to execute the
> long c++ command from, you might be able to retry the
> command from the same file context to see if it again
> gets stuck. If it does, then there might be some hope
> of tracking down what is going on if the context is
> preserved. (That is not the same as saying doing so
> would be reasonable to try.)
>

ISTR it's possible to place a job under the "supervision"
of a debugger and then poke around in the process at will.
The details are unclear; might that be instructive in this
case? I can always pull the plug and then restart buildworld.
Or, maybe just kill the originating job, though I'm not sure 
that will kill all the child processes. 
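
On the child-process question, one way to see what would be left
behind is to list every descendant of the originating job before
killing anything. The sketch below is only an illustration:
descendants is a hypothetical helper name, and it assumes ps accepts
"-o pid= -o ppid=" to print headerless pid/ppid pairs.

```shell
# Hypothetical helper: read "pid ppid" pairs on stdin and print every
# PID whose chain of parents leads back to the PID given as $1.
descendants() {
    awk -v root="$1" '
        { parent[$1] = $2; pids[n++] = $1 }
        END {
            for (i = 0; i < n; i++) {
                p = pids[i]
                depth = 0
                # Walk up the parent chain; the depth guard stops
                # self-parented entries from looping forever.
                while ((p in parent) && depth++ < n) {
                    p = parent[p]
                    if (p == root) { print pids[i]; break }
                }
            }
        }'
}
```

Usage would be along the lines of
"ps -ax -o pid= -o ppid= | descendants PID", where PID is that of the
originating make. The resulting list could then be reviewed and
handed to kill.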

If history is any guide, starting the buildworld over will
lead to the same loop, though of course one can't be sure
without trying 8-)

Thanks for writing, any suggestions appreciated!

bob prohaska