Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld
- Reply: Mark Millard : "Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld"
- Reply: Paul Mather : "Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld"
- In reply to: Mark Millard : "Re: Arm v7 RPi2 -current unresponsive to debugger escape during buildworld"
Date: Fri, 07 Nov 2025 02:22:37 UTC
On Thu, Nov 06, 2025 at 10:00:19AM -0800, Mark Millard wrote:
> On Nov 6, 2025, at 08:38, bob prohaska <fbsd@www.zefox.net> wrote:
>
> > On Thu, Nov 06, 2025 at 03:45:01PM +0100, Ronald Klop wrote:
> >> Hi,
> >>
> >> To me it sounds like your machine is overwhelmed by swapping.
> >>
> >> Try -j1 buildworld.

Maybe a -j1 buildworld could be at least somewhat informative. Lately
none of my Pi2's has made it through buildworld without hanging silently.
If -j1 buildworld completes, that would be a significant change. The test
will take a week, but the problem has been going on for a year.

> >
> > In most cases of stoppage the swap use is low, 50 MB or sometimes less.
> > Up to about 6-700MB the machines slow their progress, but keep going and
> > there are no complaints on the console about swap taking too long or
> > insufficient. If there's a connection to swap use, it isn't obvious.
> >
> > It seems to be related more to hours of runtime than swap use.
> >
> > More to the point of my question, if the machine is swap-bound,
> > shouldn't the debugger escape still work?
> >
> Are your descriptions of the lack of gaining control for use
> of the serial console? Do you also have ssh or such? Do all
> such see hangs as hung-up/crashed?

All comms become unresponsive, serial console or ssh.

> Do you get notices about
> loss of network connections to the RPi2 v1.1 in question?

Sometimes, but not always. Occasionally an ssh session will become
unresponsive and only later report a disconnection.

> Do any of those happen automatically? If so, the time
> of such a message could put a bound on when the RPi2 v1.1
> hang-up/deadlocked/crashed, the message about failing
> communication having occurred after the problem starts on
> the RPi2 v1.1.

In some cases the stuck ssh sessions are disconnected only after
reboot completes. In others, it appears to be a matter of time.
Overnight is usually sufficient.
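
Since ssh disconnect notices give only a loose bound on when the hang
began, a periodic heartbeat might tighten it. What follows is a minimal
sketch, not something anyone in this thread has run. The function names,
the log path and the 10-second interval are all assumptions made for
illustration. Each line is printed to the session, so a remote terminal
keeps the last one received, and is also fsync'ed to a file, so the last
line on media after reboot should be close to the last one written.

```python
import os
import time
from datetime import datetime, timezone


def heartbeat_line(now=None):
    """Return one UTC-timestamped heartbeat line (hypothetical format)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%SZ") + " alive\n"


def run_heartbeat(path, interval=10.0, count=None):
    """Append a heartbeat line to `path` every `interval` seconds.

    count=None runs until interrupted; a finite count is handy for a
    quick trial. Returns the number of lines written.
    """
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            line = heartbeat_line()
            log.write(line)
            log.flush()             # move Python's buffer into the kernel
            os.fsync(log.fileno())  # ask the kernel to commit it to media
            print(line, end="", flush=True)  # also shown in the ssh session
            written += 1
            if count is None or written < count:
                time.sleep(interval)
    return written
```

Calling run_heartbeat("/var/tmp/heartbeat.log") from an ssh session before
starting buildworld would leave two records to compare: the last line seen
remotely and the last line in the file. Neither is the exact moment of
failure. Each only marks a time at which the machine was still scheduling
user processes.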
> I'll note that your prior reporting of the end-of-log
> content gives evidence of things that completed, including
> being flushed to the disk. But there likely was more that
> was not flushed to the disk, some of which may have
> otherwise completed. Also, what was actually active at the
> time of the potential deadlock (or other form of crash) is
> unlikely to show in the logs with such a known status.
>

In a lot of cases there's been a top session with a timestamp and
swap usage running at the time of the crash. I've not made careful
comparisons. That's the only timestamping at hand.

> The I/O tries to keep the file system media content from being
> corrupted, but not necessarily that it is up to date. (Fully
> attempting both leads to either a contradiction or horrible
> performance. UFS has different tradeoffs than ZFS for such
> issues but the same general goal applies to both. At least
> that is how I'd summarize it.)
>
> Knowing where the logs stop can give some idea what might
> follow or have been active, but it involves other analysis.
>
> I do not know if tail -f reports buffered information vs.
> only data that makes it to media. It might be that tail -f
> in an ssh session on the/a log file might report closer
> to the failure time, showing information that does not
> make it to the media. That need not be the same as showing
> the actual failure time: just possibly closer.
>
>
> As for debugger use, there are thousands of processes.
> If you mean gdb or lldb, there is no uniquely relevant
> process to attach to and monitor that survives across all
> the activity.

Would running the buildworld command under a debugger's control
give any better access to the enter-tilde-control-B sequence on
the serial console? Usually buildworld runs from an ssh session
in the background with top display over it. I could run buildworld
under the debugger from the serial console if it makes any difference.

>
> Are your kernel builds debug/invariants/witness builds?
> Is world a debug build? (I do not mean just having symbols
> and such as a debug build.) I wonder what the behavior would
> be for avoiding the resource overhead involved in having and
> using the debug code. (But, if it does fail, extracting
> information is normally a problem.)

Sources are all unmodified, so it's whatever -current offers.
I'd expect that to include all three; there's explicit warning
that the witness option is enabled.

Thanks for writing!

bob prohaska