Getting v_wire_count from a kernel core dump?

From: Ken Merry <ken_at_freebsd.org>
Date: Tue, 21 Mar 2023 15:11:30 UTC
I have kernel core dumps from several machines out in the field (customer sites) that hit out-of-memory panics, and I'm trying to figure out from the core dumps whether we're dealing with a potential page leak.

For context, these machines are running stable/13 from April 2021, but they do have the fix for this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256507

Which is this commit in stable/13:

https://cgit.freebsd.org/src/commit/?id=6094749a1a5dafb8daf98deab23fc968070bc695

On a running system, I can get a rough idea whether there is a page leak by looking at the VM system page counters:

# sysctl vm.stats |grep count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: 0
vm.stats.vm.v_laundry_count: 991626
vm.stats.vm.v_inactive_count: 39733216
vm.stats.vm.v_active_count: 11821309
vm.stats.vm.v_wire_count: 11154113
vm.stats.vm.v_free_count: 1599981
vm.stats.vm.v_page_count: 65347213

Adding the laundry, inactive, active, wire, and free counts (cache and user wire are both zero): 991626 + 39733216 + 11821309 + 11154113 + 1599981 = 65300245, which is 46968 pages short of v_page_count (65347213).

Am I off base in expecting the various counts to add up to the page count?  (E.g., is the wire count just an additional attribute of a page rather than a separate state like active, inactive, or laundry?)

Looking at the kernel core dump for one of the systems I see:

(kgdb) print vm_cnt
$1 = {v_swtch = 0xfffffe022158f2f8, v_trap = 0xfffffe022158f2f0,
  v_syscall = 0xfffffe022158f2e8, v_intr = 0xfffffe022158f2e0,
  v_soft = 0xfffffe022158f2d8, v_vm_faults = 0xfffffe022158f2d0,
  v_io_faults = 0xfffffe022158f2c8, v_cow_faults = 0xfffffe022158f2c0,
  v_cow_optim = 0xfffffe022158f2b8, v_zfod = 0xfffffe022158f2b0,
  v_ozfod = 0xfffffe022158f2a8, v_swapin = 0xfffffe022158f2a0,
  v_swapout = 0xfffffe022158f298, v_swappgsin = 0xfffffe022158f290,
  v_swappgsout = 0xfffffe022158f288, v_vnodein = 0xfffffe022158f280,
  v_vnodeout = 0xfffffe022158f278, v_vnodepgsin = 0xfffffe022158f270,
  v_vnodepgsout = 0xfffffe022158f268, v_intrans = 0xfffffe022158f260,
  v_reactivated = 0xfffffe022158f258, v_pdwakeups = 0xfffffe022158f250,
  v_pdpages = 0xfffffe022158f248, v_pdshortfalls = 0xfffffe022158f240,
  v_dfree = 0xfffffe022158f238, v_pfree = 0xfffffe022158f230,
  v_tfree = 0xfffffe022158f228, v_forks = 0xfffffe022158f220,
  v_vforks = 0xfffffe022158f218, v_rforks = 0xfffffe022158f210,
  v_kthreads = 0xfffffe022158f208, v_forkpages = 0xfffffe022158f200,
  v_vforkpages = 0xfffffe022158f1f8, v_rforkpages = 0xfffffe022158f1f0,
  v_kthreadpages = 0xfffffe022158f1e8, v_wire_count = 0xfffffe022158f1e0,
  v_page_size = 4096, v_page_count = 65342843, v_free_reserved = 85343,
  v_free_target = 1392195, v_free_min = 412056, v_inactive_target = 2088292,
  v_pageout_free_min = 136, v_interrupt_free_min = 8, v_free_severe = 248698}
(kgdb) print vm_ndomains
$2 = 4
(kgdb) print vm_dom[0].vmd_pagequeues[0].pq_cnt
$3 = 6298704
(kgdb) print vm_dom[0].vmd_pagequeues[1].pq_cnt
$4 = 3423939
(kgdb) print vm_dom[0].vmd_pagequeues[2].pq_cnt
$5 = 629834
(kgdb) print vm_dom[0].vmd_pagequeues[3].pq_cnt
$6 = 0
(kgdb) print vm_dom[1].vmd_pagequeues[0].pq_cnt
$7 = 2301793
(kgdb) print vm_dom[1].vmd_pagequeues[1].pq_cnt
$8 = 7130193
(kgdb) print vm_dom[1].vmd_pagequeues[2].pq_cnt
$9 = 701495
(kgdb) print vm_dom[1].vmd_pagequeues[3].pq_cnt
$10 = 0
(kgdb) print vm_dom[2].vmd_pagequeues[0].pq_cnt
$11 = 464429
(kgdb) print vm_dom[2].vmd_pagequeues[1].pq_cnt
$12 = 9123532
(kgdb) print vm_dom[2].vmd_pagequeues[2].pq_cnt
$13 = 1037423
(kgdb) print vm_dom[2].vmd_pagequeues[3].pq_cnt
$14 = 0
(kgdb) print vm_dom[3].vmd_pagequeues[0].pq_cnt
$15 = 5444946
(kgdb) print vm_dom[3].vmd_pagequeues[1].pq_cnt
$16 = 4466782
(kgdb) print vm_dom[3].vmd_pagequeues[2].pq_cnt
$17 = 785195
(kgdb) print vm_dom[3].vmd_pagequeues[3].pq_cnt
$18 = 0
(kgdb) 


Adding up the page queue counts:

domain 0: 6298704 + 3423939 + 629834  + 0 = 10352477
domain 1: 2301793 + 7130193 + 701495  + 0 = 10133481
domain 2: 464429  + 9123532 + 1037423 + 0 = 10625384
domain 3: 5444946 + 4466782 + 785195  + 0 = 10696923
                                    total = 41808265

So the page queues account for 41808265 pages, roughly 23.5M short of v_page_count (65342843).
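
For what it's worth, kgdb's command language can do this bookkeeping directly; here is a minimal sketch (typed at the (kgdb) prompt or sourced from a command file), assuming PQ_COUNT == 4 page queues per domain, as the prints above suggest:

set $total = 0
set $d = 0
while ($d < vm_ndomains)
    set $q = 0
    while ($q < 4)
        set $total = $total + vm_dom[$d].vmd_pagequeues[$q].pq_cnt
        set $q = $q + 1
    end
    set $d = $d + 1
end
print $total

For the numbers above this should print the same 41808265.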

v_wire_count is a per-CPU counter(9) counter, and on a running system it gets summed across CPUs.  But trying to access it in the kernel core dump yields:

(kgdb) print vm_cnt.v_wire_count
$2 = (counter_u64_t) 0xfffffe022158f1e0
(kgdb) print *$2
Cannot access memory at address 0xfffffe022158f1e0
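
If I remember the counter(9) implementation correctly, these counters are per-CPU UMA allocations, and on amd64 CPU n's slot lives at the counter's base address plus n * PAGE_SIZE.  So in principle the fetch could be reproduced by hand in kgdb with something like the sketch below.  The 4096 stride and that layout are my assumption, and it can only work if the per-CPU pages are actually present in the dump, which the "Cannot access memory" above suggests they may not be:

set $sum = 0
set $cpu = 0
while ($cpu <= mp_maxid)
    set $sum = $sum + *(uint64_t *)((char *)vm_cnt.v_wire_count + $cpu * 4096)
    set $cpu = $cpu + 1
end
print $sum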

Anyone have any ideas whether I can figure out whether there is a page leak from the core dump?

Thanks,

Ken
-- 
Ken Merry
ken@FreeBSD.ORG