Re: Unable to limit memory consumption with vfs.zfs.arc_max
Date: Mon, 15 Jul 2024 20:24:37 UTC
Picking up this old thread since it's still vexing me....
On Sat, May 04, 2024 at 07:56:39AM -0400, Dan Langille wrote:
>
> This is from FreeBSD 14 on a Dell R730 in the basement (primary purposes: poudriere, PostgreSQL, and running four FreshPorts nodes):
>
> From top:
>
> ARC: 34G Total, 14G MFU, 9963M MRU, 22M Anon, 1043M Header, 9268M Other
> 18G Compressed, 41G Uncompressed, 2.28:1 Ratio
>
> % grep arc /boot/loader.conf
> vfs.zfs.arc_max="36000M"
>
> Looks like the value to set is:
>
> % sysctl -a vfs.zfs.arc | grep max
> vfs.zfs.arc.max: 37748736000
>
> Perhaps not a good example, but this might be more appropriate:
>
> % grep vfs.zfs.arc.max /boot/loader.conf
> vfs.zfs.arc_max="1200M"
>
> with top showing:
>
> ARC: 1198M Total, 664M MFU, 117M MRU, 3141K Anon, 36M Header, 371M Other
> 550M Compressed, 1855M Uncompressed, 3.37:1 Ratio
Thank you, Dan, I appreciate you chiming in.
Unfortunately, I think I have those bases covered, although I'm open to
anything I may have missed:
# grep -i arc /boot/loader.conf /etc/sysctl.conf
/boot/loader.conf:vfs.zfs.arc.max=4294967296
/boot/loader.conf:vfs.zfs.arc_max=4294967296
/boot/loader.conf:vfs.zfs.arc.min=2147483648
/etc/sysctl.conf:vfs.zfs.arc_max=4294967296
/etc/sysctl.conf:vfs.zfs.arc.max=4294967296
/etc/sysctl.conf:vfs.zfs.arc.min=2147483648
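As a sanity check that I didn't fat-finger the byte values above, they work out to exactly 4G for the max and 2G for the min (the same M-to-bytes arithmetic behind your 36000M showing up as 37748736000):

```shell
# Confirm the raw byte values in loader.conf/sysctl.conf are the
# intended 4G ARC max and 2G ARC min:
echo $((4 * 1024 * 1024 * 1024))   # arc.max: 4294967296
echo $((2 * 1024 * 1024 * 1024))   # arc.min: 2147483648
```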
# top -b
last pid: 16257; load averages: 0.80, 1.15, 1.18 up 0+02:03:34 12:05:06
55 processes: 2 running, 53 sleeping
CPU: 11.7% user, 0.0% nice, 18.4% system, 0.1% interrupt, 69.9% idle
Mem: 32M Active, 141M Inact, 11G Wired, 3958M Free
ARC: 10G Total, 5143M MFU, 4679M MRU, 2304K Anon, 44M Header, 219M Other
421M Compressed, 4744M Uncompressed, 11.28:1 Ratio
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11057 root 1 127 0 59M 33M CPU0 0 60:16 82.28% ssh
11056 root 5 24 0 22M 12M pipewr 3 6:00 6.25% zfs
1619 snmpd 1 20 0 34M 14M select 0 0:06 0.00% snmpd
1344 root 1 20 0 14M 3884K select 3 0:03 0.00% devd
1544 root 1 20 0 13M 2776K select 3 0:01 0.00% syslogd
1661 root 1 68 0 22M 9996K select 0 0:01 0.00% sshd
1587 ntpd 1 20 0 23M 5876K select 1 0:00 0.00% ntpd
14391 root 1 20 0 22M 11M select 3 0:00 0.00% sshd
2098 root 1 20 0 24M 11M select 1 0:00 0.00% httpd
1904 root 1 20 0 24M 11M select 2 0:00 0.00% httpd
1870 root 1 20 0 19M 8688K select 2 0:00 0.00% sendmail
2067 root 1 20 0 19M 8688K select 1 0:00 0.00% sendmail
2066 65529 1 20 0 13M 4564K select 2 0:00 0.00% mathlm
1883 65529 1 20 0 11M 2772K select 3 0:00 0.00% mathlm
14397 root 1 20 0 14M 4568K wait 1 0:00 0.00% bash
1636 root 1 20 0 13M 2608K nanslp 0 0:00 0.00% cron
2082 root 1 20 0 13M 2560K nanslp 3 0:00 0.00% cron
1887 root 1 20 0 13M 2568K nanslp 2 0:00 0.00% cron
# sysctl -a | grep m.u_evictable
kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
kstat.zfs.misc.arcstats.mfu_evictable_data: 0
kstat.zfs.misc.arcstats.mru_evictable_metadata: 0
kstat.zfs.misc.arcstats.mru_evictable_data: 0
An mrtg graph is attached showing ARC bytes used
(kstat.zfs.misc.arcstats.size) in green vs. ARC bytes max
(vfs.zfs.arc.max) in blue. We can see that, daily, the ARC usage
blows right past the 4G limit. Most days it is brought back under
control by two reboots scheduled in /etc/crontab ("shutdown -r now"
at 02:55 and 05:35), but some days the system is too far gone by the
time the cron job rolls around, and it stays hung until I can get to
the data center and power cycle it.
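For reference, those reboot entries look like this in /etc/crontab (reconstructed from memory, so treat as a sketch):

```
# scheduled reboots to rein in ARC growth
55	2	*	*	*	root	/sbin/shutdown -r now
35	5	*	*	*	root	/sbin/shutdown -r now
```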
I'm not very skilled at kernel debugging, but is a kernel PR in order?
This has happened with a GENERIC kernel across at least two builds of
14-STABLE:
FreeBSD 14.0-STABLE #0 stable/14-n267062-77205dbc1397: Thu Mar 28 12:12:02 PDT 2024
FreeBSD 14.1-STABLE #0 stable/14-n267886-4987c12cb878: Thu Jun 6 12:24:06 PDT 2024
Would it help to reproduce this with a -RELEASE version?
Thank you again, everyone.
Jim