Re: Odd "swp_pager_getswapspace(??): failed"s happen during bulk -Ca for RAM+SWAP=704 GiBytes
Date: Sun, 27 Jul 2025 07:33:16 UTC
On Jul 23, 2025, at 01:42, Mark Millard <marklmi@yahoo.com> wrote:

> In a context with RAM+SWAP = 704 GiBytes (192 GiBytes being RAM,
> 512 GiBytes being SWAP) doing poudriere bulk -Ca builds at some
> point ends up with reports like:
>
> swp_pager_getswapspace(22): failed
>
> and:
>
> was killed: failed to reclaim memory
>
> for 12 builders, MAKE_JOBS_NUMBER=3 , TMPFS_BLACKLIST
> in use, 32 FreeBSD cpus, etc.
>
> For example:
>
> . . .
> Jul 22 10:17:27 7950X3D-ZFS kernel: pid 62915 (scc_16815), jid 780, uid 0: exited on signal 11 (core dumped)
> Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to DOWN
> Jul 22 21:38:10 7950X3D-ZFS kernel: ue0: link state changed to UP
> Jul 22 21:38:29 7950X3D-ZFS kernel: swap_pager: out of swap space
> Jul 22 21:38:29 7950X3D-ZFS kernel: swp_pager_getswapspace(22): failed
> Jul 22 21:39:11 7950X3D-ZFS kernel: pid 15059 (dot), jid 780, uid 0, was killed: failed to reclaim memory
> Jul 22 21:43:38 7950X3D-ZFS kernel: swap_pager: out of swap space
> Jul 22 21:43:38 7950X3D-ZFS kernel: swp_pager_getswapspace(14): failed
> Jul 22 21:44:04 7950X3D-ZFS kernel: pid 15049 (dot), jid 780, uid 0, was killed: failed to reclaim memory
> Jul 22 21:56:39 7950X3D-ZFS kernel: swap_pager: out of swap space
> Jul 22 21:56:39 7950X3D-ZFS kernel: swp_pager_getswapspace(15): failed
> Jul 22 21:57:12 7950X3D-ZFS kernel: pid 15045 (dot), jid 780, uid 0, was killed: failed to reclaim memory
>
> I've not figured out a way to track down such messages
> back to the relevant log file for the builds that were
> killed. Neither the pid, nor the jid appear in
> the log files. Similarly, nothing in /var/log/messages
> identifies the poudriere Job Id or other such.
>
> (I've never happened to be actively monitoring when
> the issue happened. So I've always ended up looking at
> it after the fact.)
>
> It would be nice to be able to identify what specific
> packages to try to rebuild for these --and to investigate
> why the SWAP usage that had stayed under 2 GiByte ended
> up reaching 512 GiBytes during that period.

A panic from the activity during another bulk -Ca test led to
a dump that provided enough context to track down the package
that was being built when the issue hit, and what it was running
that, in turn, had the problematic memory usage:

[2D:01:22:29] [06] [00:00:00] Building graphics/sdl2_gpu | sdl2_gpu-0.12.0

was using:

UID   PID  PPID  C PRI NI       VSZ      RSS MWCHAN STAT TT    TIME COMMAND
. . .
  0 79229 40923  4  59  0     23524     4148 wait   D     - 0:00.00 [sh]
  0 79230 79229  5  59  0     14208      172 wait   Ds    - 0:00.01 [make]
  0 79233 79230  4  59  0     14668      176 wait   D     - 0:00.00 [sh]
  0 79234 79233  5  59  0     14668      176 wait   D     - 0:00.00 [sh]
  0 79235 79234 12   0  0     16284      356 select D     - 0:00.01 [ninja]
  0 79236 79235 28  59  0    223048     1052 uwait  D     - 0:00.44 [doxygen]
  0 79272 79236 25  59  0 157589964 41424308 pfault D     - 3:25.33 [dot]
  0 79279 79236 31  59  0 157601740 41513520 pfault D     - 3:23.41 [dot]
  0 79289 79236 14  59  0 157589964 41361600 pfault D     - 3:22.72 [dot]
  0 79301 79236 18  49  0 157667276 41208476 pfault D     - 3:24.32 [dot]
. . .

Part of the context was the /06/ text in:

. . .
root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79289 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79279 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79272 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .

That identifies the [06] builder, and the "Building" notice had
made it to disk before the panic happened. Then I could check the
Makefile to see whether doxygen was used, and it was. The
graphics/sdl2_gpu historical build logs suggest that problems exist.
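The manual cross-referencing described above (finding the builder
slot number in the paths of files the processes had open, then
matching it to the "[06]" builder and summing what those processes
had resident) can be sketched in a few lines of Python. This is a
minimal sketch, not a poudriere feature. It assumes the slot is the
numeric directory right after /.m/<master-name>/ in the MOUNT column
of the fstat-style lines, and that the ps RSS column is in KiB (the
ps(1) default). The function names are hypothetical.

```python
import re
from collections import defaultdict

# The fstat-style lines quoted above (USER CMD PID FD MOUNT INUM MODE SZ|DV R/W).
FSTAT_TEXT = """\
root dot 79301 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79289 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
. . .
root dot 79279 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root dot 79272 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
root doxygen 79236 0 /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev 20 crw-rw-rw- null r
"""

# Some of the ps lines quoted above
# (UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND).
PS_TEXT = """\
. . .
0 79235 79234 12 0 0 16284 356 select D - 0:00.01 [ninja]
0 79236 79235 28 59 0 223048 1052 uwait D - 0:00.44 [doxygen]
0 79272 79236 25 59 0 157589964 41424308 pfault D - 3:25.33 [dot]
0 79279 79236 31 59 0 157601740 41513520 pfault D - 3:23.41 [dot]
0 79289 79236 14 59 0 157589964 41361600 pfault D - 3:22.72 [dot]
0 79301 79236 18 49 0 157667276 41208476 pfault D - 3:24.32 [dot]
"""

# Assumption: the builder slot is the numeric path component that follows
# /.m/<master-name>/, as in .../.m/main-ZNV4-bulk_a-alt/06/dev.
SLOT_RE = re.compile(r"/\.m/[^/]+/(\d+)/")


def builder_slots(fstat_text):
    """Map pid -> builder slot string (e.g. "06") using the MOUNT column."""
    slots = {}
    for line in fstat_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip ". . ." elision lines and blank lines
        match = SLOT_RE.search(fields[4])
        if match:
            slots[int(fields[2])] = match.group(1)
    return slots


def rss_kib_by_slot(ps_text, slots):
    """Sum RSS (KiB) per (slot, command) for pids whose slot is known."""
    totals = defaultdict(int)
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 13:
            continue  # skip ". . ." elision lines and blank lines
        pid, rss_kib = int(fields[1]), int(fields[7])
        command = fields[-1].strip("[]")
        if pid in slots:
            totals[(slots[pid], command)] += rss_kib
    return dict(totals)


slots = builder_slots(FSTAT_TEXT)
totals = rss_kib_by_slot(PS_TEXT, slots)
for (slot, command), kib in sorted(totals.items()):
    print(f"builder [{slot}] {command}: {kib} KiB (~{kib / 1024**2:.1f} GiB) resident")
```

Run against the quoted lines, it attributes roughly 158 GiB of
resident memory to the four dot processes in builder [06]. Pids with
no slot-bearing path (ninja here) are simply skipped.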
===
Mark Millard
marklmi at yahoo.com