Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld
Date: Fri, 07 Nov 2025 04:15:49 UTC
On Nov 6, 2025, at 18:22, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Nov 06, 2025 at 10:00:19AM -0800, Mark Millard wrote:
>> On Nov 6, 2025, at 08:38, bob prohaska <fbsd@www.zefox.net> wrote:
>>
>>> On Thu, Nov 06, 2025 at 03:45:01PM +0100, Ronald Klop wrote:
>>>> Hi,
>>>>
>>>> To me it sounds like your machine is overwhelmed by swapping.
>>>>
>>>> Try -j1 buildworld.
> Maybe a -j1 buildworld could be at least somewhat informative.
> Lately none of my Pi2's has made it through buildworld
> without hanging silently. If -j1 buildworld completes,
> that would be a significant change. The test will take a
> week, but the problem has been going on for a year.
>
>>>
>>> In most cases of stoppage the swap use is low, 50 MB or sometimes less.
>>> Up to about 6-700MB the machines slow their progress, but keep going and
>>> there are no complaints on the console about swap taking too long or
>>> insufficient. If there's a connection to swap use, it isn't obvious.
>>>
>>> It seems to be related more to hours of runtime than swap use.
>>>
>>> More to the point of my question, if the machine is swap-bound,
>>> shouldn't the debugger escape still work?
>>
>>
>> Are your descriptions of the lack of gaining control for use
>> of the serial console? Do you also have ssh or such? Do all
>> such see hangs as hung-up/crashed?
> All comms become unresponsive, serial console or ssh.
>
>> Do you get notices about
>> loss of network connections to the RPi2 v1.1 in question?
> Sometimes, but not always. Occasionally an ssh session will
> become unresponsive and only later report a disconnection.
>
>> Do any of those happen automatically? If so, the time
>> of such a message could put a bound on when the RPi2 v1.1
>> hang-up/deadlocked/crashed, the message about failing
>> communication having occurred after the problem starts on
>> the RPi2 v1.1.
> In some cases the stuck ssh sessions are disconnected only
> after reboot completes. In others, it appears to be a matter
> of time.
> Overnight is usually sufficient.
>
>> I'll note that your prior reporting of the end-of-log
>> content gives evidence of things that completed, including
>> being flushed to the disk. But there likely was more that
>> was not flushed to the disk, some of which may have
>> otherwise completed. Also, what was actually active at the
>> time of the potential deadlock (or other form of crash) is
>> unlikely to show in the logs with such a known status.
>>
> In a lot of cases there's been a top session with a timestamp
> and swap usage running at the time of the crash. I've not
> made careful comparisons. That's the only timestamping at hand.
>
>> The I/O tries to keep the file system media content from being
>> corrupted, but not necessarily that it is up to date. (Fully
>> attempting both leads to either a contradiction or horrible
>> performance. UFS has different tradeoffs than ZFS for such
>> issues but the same general goal applies to both. At least
>> that is how I'd summarize it.)
>>
>> Knowing where the logs stop can give some idea what might
>> follow or have been active, but it involves other analysis.
>>
>> I do not know if tail -f reports buffered information vs.
>> only data that makes it to media. It might be that tail -f
>> in an ssh session on the/a log file might report closer
>> to the failure time, showing information that does not
>> make it to the media. That need not be the same as showing
>> the actual failure time: just possibly closer.
>>
>>
>> As for debugger use, there are thousands of processes.
>> If you mean gdb or lldb, there is no uniquely relevant
>> process to attach to and monitor that survives across all
>> the activity.
>
> Would running the buildworld command under a debugger's control
> give any better access to the enter-tilde-control-B sequence on
> the serial console? Usually buildworld runs from an ssh session
> in the background with top display over it.

The enter-tilde-control-B sequence is via the tty driver and kernel.
It is not tied to a specific process.

> I could run buildworld
> under the debugger from the serial console if it makes any difference.
>
>>
>> Are your kernel builds debug/invariants/witness builds?
>> Is world a debug build? (I do not mean just having symbols
>> and such as a debug build.) I wonder what the behavior would
>> be for avoiding the resource overhead involved in having and
>> using the debug code. (But, if it does fail, extracting
>> information is normally a problem.)
>
> Sources are all unmodified, so it's whatever -current offers.
> I'd expect that to include all three; there's explicit warning
> that the witness option is enabled.

===
Mark Millard
marklmi at yahoo.com
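
The thread's point about timestamps (a record that reached the media puts a bound on when the hang began) could be sketched as a small heartbeat logger. This is only an illustration, not something anyone in the thread ran: the function names, the log format, and the 10-second default interval are all assumptions. Each record is followed by `os.fsync()`, so the last line found after a reboot was actually written out rather than merely buffered.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, note=""):
    """Append one UTC-timestamped line and ask the kernel to flush it to media."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp} {note}".rstrip() + "\n"
    # O_APPEND keeps every record at the end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
        os.fsync(fd)  # do not leave the record sitting in the buffer cache
    finally:
        os.close(fd)
    return stamp


def last_heartbeat(path):
    """Return the timestamp of the last non-empty line, or None if there is none."""
    with open(path, "r", encoding="utf-8") as f:
        lines = [ln for ln in f.read().splitlines() if ln.strip()]
    return lines[-1].split(" ", 1)[0] if lines else None


def run(path, interval=10.0, count=None):
    """Write a heartbeat every `interval` seconds; count=None loops forever."""
    n = 0
    while count is None or n < count:
        write_heartbeat(path, note=f"seq={n}")
        n += 1
        if count is None or n < count:
            time.sleep(interval)
```

Calling `run("heartbeat.log")` (a hypothetical path) alongside buildworld and then reading `last_heartbeat("heartbeat.log")` after the reboot would give a time at which the machine was still scheduling processes and completing writes. The hang itself started some time after that. On a heavily swapping machine the heartbeat process can itself be delayed, so the bound may be loose.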