Belated out of swap kill on rpi3 at r359216
Mark Millard
marklmi at yahoo.com
Tue Mar 24 17:09:58 UTC 2020
On 2020-Mar-24, at 08:57, bob prohaska <fbsd at www.zefox.net> wrote:
> An attempt to buildworld on an rpi3 running r359216 stopped with an
> OOMA kill. Sources were at 359264, loader.conf contained
> vm.pageout_oom_seq="4096" .
What was the value of vm.pfault_oom_attempts?
If vm.pfault_oom_attempts was not -1,
what was the value of vm.pfault_oom_wait?
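For reference, the three tunables can be set together in
/boot/loader.conf. A sketch follows; the values are illustrative,
not recommendations from this thread, and vm.pfault_oom_wait is
only consulted when vm.pfault_oom_attempts is a positive count:

```
# Illustrative /boot/loader.conf fragment (example values only).
# How many back-to-back pageout passes to tolerate before OOM kills:
vm.pageout_oom_seq="4096"
# -1 disables the page-fault-driven OOM path entirely:
vm.pfault_oom_attempts="-1"
# Seconds to wait per attempt; only used when the above is not -1:
vm.pfault_oom_wait="10"
```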
> A snippet of gstat log suggests the worst congestion in the storage I/O
> happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
> OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
> L(q) had dropped to one, half a minute later.
>
> Is the delay in OOMA action to be expected?
>
> Here's the relevant part of the log, I hope the columns display readably:
>
> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 0 0 29 1897320 47312 12851 9 4 5 13330 1669 0 0 14172 8020 3034 65 29 6
> dT: 1.056s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 37 367 323 5463 6.4 45 1511 76.8 0 0 0.0 91.7 mmcsd0
> 37 367 323 5463 6.5 45 1511 76.9 0 0 0.0 93.3 mmcsd0s2
> 34 133 111 3209 7.6 22 697 134.7 0 0 0.0 74.0 mmcsd0s2a
> 3 235 212 2254 5.9 23 814 21.5 0 0 0.0 70.0 mmcsd0s2b
> 34 133 111 3209 7.6 22 697 134.7 0 0 0.0 74.2 ufs/rootfs
> Tue Mar 24 04:52:26 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 224484 4179768 5%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
How bad were things back when the "indefinite wait buffer" notices
were generated (the Mar 24 04:20:50 time frame)?
> 0/0/0/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 3 0 29 1936132 19232 12849 9 4 5 13328 1671 0 0 14173 8020 3036 65 29 6
> dT: 1.010s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 9 323 148 2531 20.7 175 3327 14.6 0 0 0.0 97.2 mmcsd0
> 9 323 148 2531 20.8 175 3327 14.6 0 0 0.0 98.4 mmcsd0s2
> 2 96 96 1869 16.9 0 0 0.0 0 0 0.0 82.9 mmcsd0s2a
> 7 227 51 661 28.3 175 3327 14.6 0 0 0.0 84.0 mmcsd0s2b
> 2 96 96 1869 16.9 0 0 0.0 0 0 0.0 83.0 ufs/rootfs
> Tue Mar 24 04:52:41 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 272928 4131324 6%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> 0/0/0/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 0 0 29 1939940 48016 12849 9 4 5 13328 1676 948 0 14174 8019 3037 65 29 6
> dT: 1.001s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 3 371 151 1989 91.0 221 3847 14.0 0 0 0.0 100.1 mmcsd0
> 3 371 151 1989 91.1 221 3847 14.0 0 0 0.0 100.1 mmcsd0s2
> 1 41 41 415 74.5 0 0 0.0 0 0 0.0 304.8 mmcsd0s2a
> 2 331 110 1574 97.3 221 3847 14.0 0 0 0.0 99.3 mmcsd0s2b
> 1 41 41 415 74.5 0 0 0.0 0 0 0.0 304.9 ufs/rootfs
> Tue Mar 24 04:52:45 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 282480 4121772 6%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 2 0 29 1944652 42768 12849 9 4 5 13328 1678 0 0 14174 8019 3037 65 29 6
> dT: 1.010s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 2 498 498 6345 3.5 0 0 0.0 0 0 0.0 88.8 mmcsd0
> 2 498 498 6345 3.9 0 0 0.0 0 0 0.0 98.3 mmcsd0s2
> 1 152 152 1501 4.1 0 0 0.0 0 0 0.0 62.1 mmcsd0s2a
> 1 346 346 4844 3.9 0 0 0.0 0 0 0.0 88.8 mmcsd0s2b
> 1 152 152 1501 4.2 0 0 0.0 0 0 0.0 62.5 ufs/rootfs
> Tue Mar 24 04:52:47 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 282396 4121856 6%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> 0/0/0/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 5 0 28 1980200 19676 12848 9 4 5 13327 1684 714 0 14175 8019 3039 65 29 6
> dT: 1.002s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 9 487 174 1449 26.2 313 2427 9.1 0 0 0.0 98.4 mmcsd0
> 9 487 174 1449 26.4 313 2427 9.1 0 0 0.0 100.0 mmcsd0s2
> 1 45 45 267 22.9 0 0 0.0 0 0 0.0 80.3 mmcsd0s2a
> 9 442 129 1182 27.7 313 2427 9.1 0 0 0.0 97.2 mmcsd0s2b
> 1 45 45 267 22.9 0 0 0.0 0 0 0.0 80.4 ufs/rootfs
> Tue Mar 24 04:53:01 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 309996 4094256 7%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 0 0 29 1983636 44400 12848 9 4 5 13326 1689 0 0 14175 8018 3040 65 29 6
> dT: 1.002s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 12 528 0 0 0.0 528 3797 7.5 0 0 0.0 100.0 mmcsd0
> 12 528 0 0 0.0 528 3797 7.5 0 0 0.0 100.0 mmcsd0s2
> 11 528 0 0 0.0 528 3797 7.5 0 0 0.0 100.0 mmcsd0s2b
> Tue Mar 24 04:53:05 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 322360 4081892 7%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> 3/769/772/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 1 0 29 1826272 165204 12848 9 4 5 13326 1689 971 0 14176 8018 3040 65 29 6
> dT: 1.001s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 1 614 604 4661 4.9 10 168 40.6 0 0 0.0 93.8 mmcsd0
> 1 614 604 4661 4.9 10 168 40.6 0 0 0.0 94.5 mmcsd0s2
> 0 210 209 1243 4.1 1 32 181.0 0 0 0.0 83.1 mmcsd0s2a
> 1 404 395 3418 5.4 9 136 25.1 0 0 0.0 81.0 mmcsd0s2b
> 0 210 209 1243 4.1 1 32 181.0 0 0 0.0 83.6 ufs/rootfs
> Tue Mar 24 04:53:08 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 230064 4174188 5%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> Mar 24 04:53:04 www kernel: pid 96708 (c++), jid 0, uid 0, was killed: out of swap space
> 0/772/772/19177 mbuf clusters in use (current/cache/total/max)
> procs memory page disks faults cpu
> r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
> 2 0 28 1705884 221144 12848 9 4 5 13326 1689 1109 0 14176 8018 3041 65 29 6
> dT: 1.002s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
> 0 545 545 3580 2.1 0 0 0.0 0 0 0.0 74.0 mmcsd0
> 0 545 545 3580 2.1 0 0 0.0 0 0 0.0 75.7 mmcsd0s2
> 0 288 288 1517 2.0 0 0 0.0 0 0 0.0 56.4 mmcsd0s2a
> 0 256 256 2064 2.4 0 0 0.0 0 0 0.0 49.0 mmcsd0s2b
> 0 288 288 1517 2.0 0 0 0.0 0 0 0.0 57.1 ufs/rootfs
> Tue Mar 24 04:53:10 PDT 2020
> Device 1K-blocks Used Avail Capacity
> /dev/mmcsd0s2b 4404252 230064 4174188 5%
> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> Mar 24 04:53:04 www kernel: pid 96708 (c++), jid 0, uid 0, was killed: out of swap space
>
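As an aside, finding the worst L(q) a device saw in a long gstat
log like the one above takes only a short awk pass. The sample
lines below are a trimmed, made-up stand-in for the real log; only
the first field (L(q)) and the last field (device name) matter to
the script:

```shell
# Scan a gstat-style log for the highest queue length seen on mmcsd0.
cat > gstat.log <<'EOF'
 37    367    323   5463    6.4     45   1511   76.8  91.7  mmcsd0
  9    323    148   2531   20.7    175   3327   14.6  97.2  mmcsd0
  3    371    151   1989   91.0    221   3847   14.0 100.1  mmcsd0
EOF
# $1 is L(q), $NF is the device name; keep the running maximum.
awk '$NF == "mmcsd0" && $1 > max { max = $1 } END { print "worst L(q):", max }' gstat.log
# → worst L(q): 37
```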
Below I show code changes that make the output messaging more
explicit about what contributed to initiating an OOM kill, without
needing a verbose boot or the like. Also included are some
messages from Mark J.'s old code for reporting on conditions
related to initiating OOM kills, or minor variations of them.
You may want to try such changes to get more context for your
OOM kills when they happen. The 4 reasons reported are (showing
the most detailed of the related messages from the different
stages):
swp_pager_meta_build: swap blk uma zone exhausted
swp_pager_meta_build: swap pctrie uma zone exhausted
vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
The detailed diffs below are based on my head -r358966 context.
# svnlite diff /usr/src/sys/vm/ | more
Index: /usr/src/sys/vm/swap_pager.c
===================================================================
--- /usr/src/sys/vm/swap_pager.c (revision 358966)
+++ /usr/src/sys/vm/swap_pager.c (working copy)
@@ -1998,6 +1998,7 @@
0, 1))
printf("swap blk zone exhausted, "
"increase kern.maxswzone\n");
+ printf("swp_pager_meta_build: swap blk uma zone exhausted\n");
vm_pageout_oom(VM_OOM_SWAPZ);
pause("swzonxb", 10);
} else
@@ -2028,6 +2029,7 @@
0, 1))
printf("swap pctrie zone exhausted, "
"increase kern.maxswzone\n");
+ printf("swp_pager_meta_build: swap pctrie uma zone exhausted\n");
vm_pageout_oom(VM_OOM_SWAPZ);
pause("swzonxp", 10);
} else
Index: /usr/src/sys/vm/vm_fault.c
===================================================================
--- /usr/src/sys/vm/vm_fault.c (revision 358966)
+++ /usr/src/sys/vm/vm_fault.c (working copy)
@@ -1074,9 +1074,8 @@
fs->oom++;
vm_waitpfault(dset, vm_pfault_oom_wait * hz);
} else {
- if (bootverbose)
- printf(
- "proc %d (%s) failed to alloc page on fault, starting OOM\n",
+ printf(
+ "vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM\n",
curproc->p_pid, curproc->p_comm);
vm_pageout_oom(VM_OOM_MEM_PF);
fs->oom = 0;
Index: /usr/src/sys/vm/vm_page.c
===================================================================
--- /usr/src/sys/vm/vm_page.c (revision 358966)
+++ /usr/src/sys/vm/vm_page.c (working copy)
@@ -3134,6 +3134,7 @@
* race-free vm_wait_domain().
*/
if (curproc == pageproc) {
+ printf("thread %d waiting for memory\n", curthread->td_tid);
mtx_lock(&vm_domainset_lock);
vm_pageproc_waiters++;
msleep(&vm_pageproc_waiters, &vm_domainset_lock, PVM | PDROP,
Index: /usr/src/sys/vm/vm_pageout.c
===================================================================
--- /usr/src/sys/vm/vm_pageout.c (revision 358966)
+++ /usr/src/sys/vm/vm_pageout.c (working copy)
@@ -1741,6 +1741,8 @@
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);
/*
@@ -1933,10 +1935,24 @@
}
sx_sunlock(&allproc_lock);
if (bigproc != NULL) {
+ char *reason_text;
+ switch (shortage) {
+ case VM_OOM_MEM_PF:
+ reason_text= "fault's page allocation failed";
+ break;
+ case VM_OOM_MEM:
+ reason_text= "free RAM stayed below threshold";
+ break;
+ case VM_OOM_SWAPZ:
+ reason_text= "swblk or swpctrie zone exhausted";
+ break;
+ default:
+ reason_text= "Unknown Reason";
+ }
if (vm_panic_on_oom != 0 && --vm_panic_on_oom == 0)
- panic("out of swap space");
+ panic("%s",reason_text);
PROC_LOCK(bigproc);
- killproc(bigproc, "out of swap space");
+ killproc(bigproc, reason_text);
sched_nice(bigproc, PRIO_MIN);
_PRELE(bigproc);
PROC_UNLOCK(bigproc);
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)