Re: Unable to limit memory consumption with vfs.zfs.arc_max

From: Jim Long <freebsd-questions_at_umpquanet.com>
Date: Mon, 15 Jul 2024 20:24:37 UTC
Picking up this old thread since it's still vexing me....

On Sat, May 04, 2024 at 07:56:39AM -0400, Dan Langille wrote:
> 
> This is from FreeBSD 14 on a Dell R730 in the basement (primary purposes: poudriere, PostgreSQL, and four FreshPorts nodes):
> 
> From top:
> 
> ARC: 34G Total, 14G MFU, 9963M MRU, 22M Anon, 1043M Header, 9268M Other
>      18G Compressed, 41G Uncompressed, 2.28:1 Ratio
> 
> % grep arc /boot/loader.conf
> vfs.zfs.arc_max="36000M"
> 
> Looks like the value to set is:
> 
> % sysctl -a vfs.zfs.arc | grep max
> vfs.zfs.arc.max: 37748736000
> 
> Perhaps not a good example, but this might be more appropriate:
> 
> % grep vfs.zfs.arc.max /boot/loader.conf
> vfs.zfs.arc_max="1200M"
> 
> with top showing:
> 
> ARC: 1198M Total, 664M MFU, 117M MRU, 3141K Anon, 36M Header, 371M Other
>      550M Compressed, 1855M Uncompressed, 3.37:1 Ratio
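(As a sanity check on the figures above, and nothing FreeBSD-specific: the "M" suffix in loader.conf is binary mebibytes, which is why Dan's 36000M shows up in sysctl as 37748736000:)

```shell
# "36000M" in loader.conf = 36000 * 2^20 bytes
echo $((36000 * 1024 * 1024))   # 37748736000, matching vfs.zfs.arc.max above
```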

Thank you, Dan, I appreciate you chiming in.

Unfortunately, I think I have those bases covered, although I'm open to
anything I may have missed:

# grep -i arc /boot/loader.conf /etc/sysctl.conf 
/boot/loader.conf:vfs.zfs.arc.max=4294967296
/boot/loader.conf:vfs.zfs.arc_max=4294967296
/boot/loader.conf:vfs.zfs.arc.min=2147483648
/etc/sysctl.conf:vfs.zfs.arc_max=4294967296
/etc/sysctl.conf:vfs.zfs.arc.max=4294967296
/etc/sysctl.conf:vfs.zfs.arc.min=2147483648
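For completeness, the byte values in those files are exact powers of two, so there's no rounding ambiguity in what I asked for (a quick check):

```shell
# arc max and arc min above, expressed in GiB:
echo $((4 * 1024 * 1024 * 1024))   # 4294967296 = 4 GiB (arc max)
echo $((2 * 1024 * 1024 * 1024))   # 2147483648 = 2 GiB (arc min)
```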

# top -b
last pid: 16257;  load averages:  0.80,  1.15,  1.18  up 0+02:03:34    12:05:06
55 processes:  2 running, 53 sleeping
CPU: 11.7% user,  0.0% nice, 18.4% system,  0.1% interrupt, 69.9% idle
Mem: 32M Active, 141M Inact, 11G Wired, 3958M Free
ARC: 10G Total, 5143M MFU, 4679M MRU, 2304K Anon, 44M Header, 219M Other
     421M Compressed, 4744M Uncompressed, 11.28:1 Ratio

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
11057 root          1 127    0    59M    33M CPU0     0  60:16  82.28% ssh
11056 root          5  24    0    22M    12M pipewr   3   6:00   6.25% zfs
 1619 snmpd         1  20    0    34M    14M select   0   0:06   0.00% snmpd
 1344 root          1  20    0    14M  3884K select   3   0:03   0.00% devd
 1544 root          1  20    0    13M  2776K select   3   0:01   0.00% syslogd
 1661 root          1  68    0    22M  9996K select   0   0:01   0.00% sshd
 1587 ntpd          1  20    0    23M  5876K select   1   0:00   0.00% ntpd
14391 root          1  20    0    22M    11M select   3   0:00   0.00% sshd
 2098 root          1  20    0    24M    11M select   1   0:00   0.00% httpd
 1904 root          1  20    0    24M    11M select   2   0:00   0.00% httpd
 1870 root          1  20    0    19M  8688K select   2   0:00   0.00% sendmail
 2067 root          1  20    0    19M  8688K select   1   0:00   0.00% sendmail
 2066  65529        1  20    0    13M  4564K select   2   0:00   0.00% mathlm
 1883  65529        1  20    0    11M  2772K select   3   0:00   0.00% mathlm
14397 root          1  20    0    14M  4568K wait     1   0:00   0.00% bash
 1636 root          1  20    0    13M  2608K nanslp   0   0:00   0.00% cron
 2082 root          1  20    0    13M  2560K nanslp   3   0:00   0.00% cron
 1887 root          1  20    0    13M  2568K nanslp   2   0:00   0.00% cron

# sysctl -a | grep m.u_evictable
kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
kstat.zfs.misc.arcstats.mfu_evictable_data: 0
kstat.zfs.misc.arcstats.mru_evictable_metadata: 0
kstat.zfs.misc.arcstats.mru_evictable_data: 0
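In case it helps narrow this down, a few more arcstats counters may show whether the ARC's internal target is tracking the cap and whether reclaim is being attempted at all (a sketch; these counter names come from the standard arcstats kstat tree, so any of them may differ on a given build):

```shell
# Compare the current ARC size and its internal target ("c") against the
# cap, and see whether memory-pressure reclaim has fired at all:
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.c \
       kstat.zfs.misc.arcstats.c_max \
       kstat.zfs.misc.arcstats.memory_throttle_count \
       kstat.zfs.misc.arcstats.evict_skip
```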

An mrtg graph is attached showing ARC bytes used
(kstat.zfs.misc.arcstats.size) in green vs. ARC bytes max
(vfs.zfs.arc.max) in blue.  We can see that, daily, ARC usage blows
right past the 4G limit.  Most days it is brought back under control
by two scheduled reboots in /etc/crontab ("shutdown -r now" at 02:55
and 05:35), but some days the system is too far gone by the time the
cron job rolls around, and it stays hung until I can get to the data
center and power-cycle it.
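A crontab entry along these lines could also keep a local record of the growth even when the box hangs before mrtg polls it (a sketch only; the log path and interval are placeholders, not something already in place):

```shell
# Hypothetical /etc/crontab entry: append ARC size and cap every 5 minutes.
*/5	*	*	*	*	root	/sbin/sysctl -n kstat.zfs.misc.arcstats.size vfs.zfs.arc.max >> /var/log/arc-size.log
```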

I'm not very skilled at kernel debugging, but is a kernel PR in order?
This has happened with a GENERIC kernel across at least two builds of
14-STABLE:

FreeBSD 14.0-STABLE #0 stable/14-n267062-77205dbc1397: Thu Mar 28 12:12:02 PDT 2024
FreeBSD 14.1-STABLE #0 stable/14-n267886-4987c12cb878: Thu Jun  6 12:24:06 PDT 2024

Would it help to reproduce this with a -RELEASE version?


Thank you again, everyone.

Jim