Re: A panic by vm_pageout_scan_active activity, some details in case they might help

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 27 Jul 2025 23:23:32 UTC
On Jul 27, 2025, at 15:25, Mark Millard <marklmi@yahoo.com> wrote:

> On Jul 27, 2025, at 15:00, Mark Johnston <markj@freebsd.org> wrote:
>> 
>> On Sun, Jul 27, 2025 at 02:26:29PM -0700, Mark Millard wrote:
>>> I tried a poudriere(-devel) bulk -Ca on the amd64 system that
>>> I have access to and a package build used up much of the
>>> RAM+SWAP == 704 GiBytes before a panic happened. Past examples
>>> OOM'd without panics, although I did not know the context until
>>> examining this crash dump.
>> 
>> What is the panic string?
> 
> The picture I took shows:
> 
> Fatal Trap 12: page fault while in kernel mode
> 
> # more /var/crash/info.4
> Dump header from device: /dev/gpt/OptBswp364
>  Architecture: amd64
>  Architecture Version: 2
>  Dump Length: 20258381824
>  Blocksize: 512
>  Compression: none
>  Dumptime: 2025-07-26 18:56:16 -0700
>  Hostname: 7950X3D-ZFS
>  Magic: FreeBSD Kernel Dump
>  Version String: FreeBSD 15.0-CURRENT main-n278320-3a33e39edd48 GENERIC-NODEBUG
>  Panic String: page fault
>  Dump Parity: 668710208
>  Bounds: 4
>  Dump Status: good
> 
>> Could you please open a report on bugzilla
>> and include the full core.txt.4?
> 
> Okay.

Done:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288507

>>> # uname -apKU
>>> FreeBSD 7950X3D-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT main-n278320-3a33e39edd48 GENERIC-NODEBUG amd64 amd64 1500048 1500048
>>> 
>>> That is an official PkgBase installation of the boot-kernel and
>>> boot-world, not a personal build.
>>> 
>>> The dump materials had references for doxygen and for dot to:
>>> 
>>> /usr/local/poudriere/data/.m/main-ZNV4-bulk_a-alt/06/dev
>>> 
>>> that let me track this to the [06] builder running at the time
>>> of the crash:
>>> 
>>> [2D:01:22:29] [06] [00:00:00] Building   graphics/sdl2_gpu | sdl2_gpu-0.12.0
>>> 
>>> It was running doxygen, which in turn was running multiple dot processes.
>>> 
>>> From /var/crash/core.txt.4 :
>>> 
>>> UID   PID  PPID  C PRI NI       VSZ      RSS MWCHAN   STAT TT          TIME COMMAND
>>> . . .
>>> 0 79229 40923  4  59  0     23524     4148 wait     D     -       0:00.00 [sh]
>>> 0 79230 79229  5  59  0     14208      172 wait     Ds    -       0:00.01 [make]
>>> 0 79233 79230  4  59  0     14668      176 wait     D     -       0:00.00 [sh]
>>> 0 79234 79233  5  59  0     14668      176 wait     D     -       0:00.00 [sh]
>>> 0 79235 79234 12   0  0     16284      356 select   D     -       0:00.01 [ninja]
>>> 0 79236 79235 28  59  0    223048     1052 uwait    D     -       0:00.44 [doxygen]
>>> 0 79272 79236 25  59  0 157589964 41424308 pfault   D     -       3:25.33 [dot]
>>> 0 79279 79236 31  59  0 157601740 41513520 pfault   D     -       3:23.41 [dot]
>>> 0 79289 79236 14  59  0 157589964 41361600 pfault   D     -       3:22.72 [dot]
>>> 0 79301 79236 18  49  0 157667276 41208476 pfault   D     -       3:24.32 [dot]
>>> . . .
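
For scale, the VSZ and RSS columns of the four [dot] processes listed above can be totaled and compared with the 704 GiBytes of RAM+SWAP. The following is a minimal sketch, assuming the ps columns are in KiB (1024-byte units, as FreeBSD's ps documents for vsz and rss). The names kib_total and kib_to_gib are hypothetical helpers for illustration, not anything from the FreeBSD sources.

```c
#include <stddef.h>
#include <stdint.h>

/* VSZ and RSS values copied from the four [dot] lines of the ps listing. */
static const uint64_t dot_vsz_kib[4] = {
    157589964, 157601740, 157589964, 157667276
};
static const uint64_t dot_rss_kib[4] = {
    41424308, 41513520, 41361600, 41208476
};

/* Sum n values that are expressed in KiB. */
static uint64_t
kib_total(const uint64_t *v, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return (sum);
}

/* Convert KiB to GiB (1 GiB == 1024 * 1024 KiB). */
static double
kib_to_gib(uint64_t kib)
{
    return ((double)kib / (1024.0 * 1024.0));
}
```

Under that unit assumption, kib_total(dot_vsz_kib, 4) is 630448944 KiB (roughly 601 GiBytes of virtual size) and kib_total(dot_rss_kib, 4) is 165507904 KiB (roughly 158 GiBytes resident), so these four processes alone account for a large share of the 704 GiBytes.
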
>>> 
>>> . . .
>>> #14 <signal handler called>
>>> No locals.
>>> #15 vm_pageout_scan_active (vmd=0xffffffff81c22380 <vm_dom>,
>>>   page_shortage=102849)
>>>   at /home/pkgbuild/worktrees/main/sys/vm/vm_pageout.c:1264
>>>       ss = {bq = {bq_pa = {0xfffffe0030a1e500, 0xfffffe00a8798110,
>>>             0xfffffe00e3083e30, 0xfffffe00a47a4228, 0xfffffe002b6d8ef8,
>>>             0xfffffe0065cf29a0, 0xfffffe007a1b83b8, 0xfffffe008cf7b3c0,
>>>             0xfffffe005cd565e0, 0xfffffe0048ced5d8, 0xfffffe00c761d488,
>>>             0xfffffe008a5efe90, 0xfffffe00cf341738, 0xfffffe00413f97b8,
>>>             0xfffffe005270cc68, 0xfffffe00a5d9d690, 0xfffffe00294329e0,
>>>             0xfffffe005ef52f00, 0xfffffe0020dff308, 0xfffffe00ce1e9a40,
>>>             0xfffffe007ec47618, 0xfffffe005d1ba7e8, 0xfffffe0032d73470,
>>>             0xfffffe0030835e88, 0xfffffe009969c438, 0xfffffe00f151b0c8,
>>>             0xfffffe0063916fe8, 0xfffffe00dac0b778, 0xfffffe0016267348,
>>>             0xfffffe00b74a5fe0, 0xfffffe003434ef80, 0xfffffe009e31e840,
>>>             0xfffffe00530f6408, 0xfffffe00e0649508, 0xfffffe0102e87ad8,
>>>             0xfffffe0092c52848, 0xfffffe00ba829618, 0xfffffe008bf0fd10,
>>>             0xfffffe00550708c0, 0xfffffe00eedc67b8, 0xfffffe00d45f8210,
>>>             0xfffffe00b89a8698, 0xfffffe0082ffb310, 0xfffffe00accd53c0,
>>>             0xfffffe0091c8f5d8, 0xfffffe004e20f180, 0xfffffe004dfb4f90,
>>>             0xfffffe00a437fbb0, 0xfffffe00218cb698, 0xfffffe004ee5d278,
>>>             0xfffffe00a9e845a0, 0xfffffe0025d4a7c8, 0xfffffe0037612ac8,
>>>             0xfffffe005c7d3da8, 0xfffffe00d307c1b8, 0xfffffe00ee416538,
>>>             0xfffffe0043747508, 0xfffffe00ef30b508, 0xfffffe00c04de600,
>>>             0xfffffe008c0e3040, 0xfffffe0071a97b40, 0xfffffe005b644ad8,
>>>             0xfffffe00dd5da3b0}, bq_cnt = 39},
>>>         pq = 0xffffffff81c22400 <vm_dom+128>,
>>>         marker = 0xffffffff81c22778 <vm_dom+1016>, maxscan = 37165731,
>>>         scanned = 15440544}
>>>       marker = 0xffffffff81c22778 <vm_dom+1016>
>>>       pq = 0xffffffff81c22400 <vm_dom+128>
>>>       old = <optimized out>
>>>       scan_tick = <optimized out>
>>>       min_scan = <optimized out>
>>>       m = 0xfffffe00eedc67b8
>>>       object = 0x2b6c70f000
>>>       refs = <optimized out>
>>>       new = <optimized out>
>>>       ps_delta = <optimized out>
>>>       act_delta = <optimized out>
>>>       max_scan = <optimized out>
>>>       nqueue = <optimized out>
>>>       _v = <optimized out>
>>>       _tid = <optimized out>
>>>       _v = <optimized out>
>>>       _tid = <optimized out>
>>>       _v = <optimized out>
>>>       _v = <optimized out>
>>>       _tid = <optimized out>
>>>       _v = <optimized out>
>>> . . .
>>> 
>>> In the /usr/src/sys/ tree for the PkgBase installation in use,
>>> vm_pageout_scan_active contains:
>>> 
>>> /home/pkgbuild/worktrees/main/sys/vm/vm_pageout.c: unmodified, readonly: line 1264 of 2416 [52%]
>>> 
>>>               /*
>>>                * Check to see "how much" the page has been used.
>>>                *
>>>                * Test PGA_REFERENCED after calling pmap_ts_referenced() so
>>>                * that a reference from a concurrently destroyed mapping is
>>>                * observed here and now.
>>>                *
>>>                * Perform an unsynchronized object ref count check.  While
>>>                * the page lock ensures that the page is not reallocated to
>>>                * another object, in particular, one with unmanaged mappings
>>>                * that cannot support pmap_ts_referenced(), two races are,
>>>                * nonetheless, possible:
>>>                * 1) The count was transitioning to zero, but we saw a non-
>>>                *    zero value.  pmap_ts_referenced() will return zero
>>>                *    because the page is not mapped.
>>>                * 2) The count was transitioning to one, but we saw zero.
>>>                *    This race delays the detection of a new reference.  At
>>>                *    worst, we will deactivate and reactivate the page.
>>>                */
>>>               refs = object->ref_count != 0 ? pmap_ts_referenced(m) : 0;
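
The statement at line 1264 reads object->ref_count, so one thing that may be worth checking against the frame #15 locals is whether the reported object value is a plausible kernel pointer at all. The following is a rough sketch, assuming amd64 with 48-bit canonical addressing, where kernel virtual addresses fall in the upper canonical half (at or above 0xffff800000000000). The function name in_upper_canonical_half is hypothetical, and locals reported by the debugger for an optimized kernel may not be accurate, so this is only a sanity check and not a diagnosis.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 4-level paging on amd64, where the upper canonical half
 * starts at 0xffff800000000000.  Kernel pointers are expected there.
 */
#define UPPER_CANONICAL_BASE 0xffff800000000000ULL

/* Return true if addr lies in the upper canonical half. */
static bool
in_upper_canonical_half(uint64_t addr)
{
    return (addr >= UPPER_CANONICAL_BASE);
}
```

Applied to the values shown in the backtrace, m = 0xfffffe00eedc67b8 passes this check while object = 0x2b6c70f000 does not. That would be consistent with a page fault on the object->ref_count read, though it is an inference from the dump and not something the dump states directly.
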
>>> 
>>> I am unlikely to be able to replicate the panic.
>>> 
>>> I hope that this is of some use.
>>> 
>>> Note:
>>> 
>>> I linked /home/pkgbuild/worktrees/main/sys to
>>> /usr/src/sys so that such paths work in my
>>> context.
>> 


===
Mark Millard
marklmi at yahoo.com