[Bug 290207] [ZFS] lowering "vfs.zfs.arc.max" to a low value causes kernel threads of "arc_evict" to use 91% CPU and disks to wait. System gets unresponsive...

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 02 Nov 2025 14:31:48 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=290207

--- Comment #26 from Nils Beyer <nbe@vkf-renzel.de> ---
(In reply to Mark Millard from comment #24)

First of all, thank you very much for your efforts and contributions to this
issue. :-)

I've converted your script into a one-liner:

    vmstat -z | awk -F'[,:]' '{if ($4!=0) print $0",\t"int($2*$4/1024)}' | sort -t',' -nk9
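
Spelled out for readability (my annotation, not part of the original script;
with -F'[,:]', field 2 is the zone's item size in bytes and field 4 the number
of items in use):

    vmstat -z | awk -F'[,:]' '
        $4 != 0 {
            # append the approximate used memory per zone:
            # item size ($2, bytes) * items in use ($4), converted to kB
            print $0 ",\t" int($2 * $4 / 1024)
        }' | sort -t',' -nk9    # sort numerically by the appended column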


Here are the results (the appended last column is the used memory per zone, SIZE * USED, in kB):

before - right after reboot
===========================
malloc-128:             128,      0,   29935,    1964,  381802,   0,   0,   0,    3741
mbuf:                   256,5648716,   16642,    2916,   18548,   0,   0,   0,    4160
malloc-4096:           4096,      0,    1098,      26,   24290,   0,   0,   0,    4392
zio_buf_comb_16384:   16384,      0,     555,      38,    2565,   0,   0,   0,    8880
kstack_cache:         16384,      0,     575,      15,     664,   0,   0,   0,    9200
zio_buf_comb_131072: 131072,      0,     172,      54,    1303,   0,   0,   0,   22016
mbuf_cluster:          2048, 882612,   19686,     888,   19695,   0,   0,   0,   39372
abd_chunk:             4096,      0,   14940,      78,   18338,   0,   0,   0,   59760
vm pgcache:            4096,      0,   15841,    1021,  140076,   0,   0,   0,   63364
vm pgcache:            4096,      0,   41224,     915,  192105,  42,   0,   0,  164896


after - "rm -rf /usr/src.test"
==============================
dbuf_dirty_record_t:    376,      0,  123463,     167,  134164,   0,   0,   0,   45334
zio_buf_comb_512:       512,      0,  118202,     214,  212376,   0,   0,   0,   59101
vm pgcache:            4096,      0,   15844,    1018,  141199,   0,   0,   0,   63376
zio_buf_comb_16384:   16384,      0,    4569,      31,   12057,   0,   0,   0,   73104
dnode_t:                760,      0,  119418,      87,  119502,   0,   0,   0,   88630
abd_chunk:             4096,      0,   28188,    6402,   41483,   0,   0,   0,  112752
malloc-1024:           1024,      0,  121311,     113,  127955,   0,   0,   0,  121311
zio_buf_comb_131072: 131072,      0,    1367,       0,    2535,   0,   0,   0,  174976
zfs_btree_leaf_cache:  4096,      0,  118311,      14,  237236,   0,   0,   0,  473244
vm pgcache:            4096,      0,  290377,     400,  442925,  42,   0,   0, 1161508


So, "vm pgcache" took seven times of the wired memory than before. And
"zfs_btree_leaf_cache" also took a good amount.

Well, what is "vm pgcache"? And why does it use that much memory just from
deleting a directory tree? The same question goes for "zfs_btree_leaf_cache".
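
(One way to see who fills "zfs_btree_leaf_cache" would be to aggregate kernel
stacks on the btree insert path while the rm runs; just a sketch, assuming
DTrace's fbt provider can attach to zfs_btree_add:)

    dtrace -n 'fbt::zfs_btree_add:entry { @[stack()] = count(); }'
    # let it run during the rm, then Ctrl-C to print the aggregated stacks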

Problem is that I have 12GB free at the moment:

    Mem: 34M Active, 42M Inact, 1882M Wired, 12G Free


But when I try to get some memory back from these caches, it doesn't happen
fast enough (probably). Example using TMPFS:
------------------------------ SNIP ------------------------------
# mount -t tmpfs -o size=13G tmpfs /mnt/
# dd if=/dev/zero of=/mnt/test.dat bs=1M
dd: /mnt/test.dat: No space left on device
13313+0 records in
13312+0 records out
13958643712 bytes transferred in 7.034921 secs (1984193397 bytes/sec)
------------------------------ SNIP ------------------------------
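
(While the dd runs, the suspicious zones can be watched from a second terminal
to see whether UMA actually hands the memory back; my own addition for
illustration:)

    while :; do
        vmstat -z | grep -E 'vm pgcache|zfs_btree_leaf_cache'
        sleep 1
    done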


Filling a 13GB TMPFS should succeed if just 1GB of the currently wired
"vm pgcache" memory were given back, right? But after fully writing to that
TMPFS I get a "pager error" (errno 5 is EIO):

    uiomove_object: vm_obj 0xfffff800108b9d90 idx 3407872 pager error 5

And it started to swap:

    Mem: 41M Active, 1376K Inact, 84K Laundry, 579M Wired, 13G Free
    ARC: 109M Total, 32M MFU, 71M MRU, 256K Anon, 1330K Header, 3080K Other
         79M Compressed, 115M Uncompressed, 1.45:1 Ratio
    Swap: 16G Total, 7196K Used, 16G Free
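
(For reference, the tunable from the bug title can be inspected and lowered at
runtime, e.g.:)

    sysctl vfs.zfs.arc.max
    sysctl vfs.zfs.arc.max=1073741824    # lower the ARC cap to 1 GB (example value)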

-- 
You are receiving this mail because:
You are the assignee for the bug.