[Bug 275594] High CPU usage by arc_prune; analysis and fix

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 08 Dec 2023 10:17:49 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #6 from Seigo Tanimura <seigo.tanimura@gmail.com> ---
(In reply to Seigo Tanimura from comment #5)

The build under the following settings has completed:

- vfs.vnode.vnlru.max_free_per_call: 10000 (out-of-box)
- vfs.zfs.arc.prune_interval: 1000 (my fix enabled)

Build time: 07:11:02 (292 pkgs / hr)
Max vfs.vnode.stats.count: ~2.2M
Max ARC memory size: ~5.6GB
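
To compare runs, the build time and the reported rate can be combined into an
approximate package count.  This is a minimal sketch, assuming the rate is an
average over the whole build; the function names are hypothetical.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' build time into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimated_packages(build_time: str, pkgs_per_hour: float) -> float:
    """Approximate package count, assuming the rate is a whole-build average."""
    return pkgs_per_hour * hms_to_seconds(build_time) / 3600.0
```

For the figures above, estimated_packages("07:11:02", 292) comes to roughly
2,100 packages; the rate is rounded, so this is only an estimate.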

NB devel/ocl-icd failed because pkg-static was killed by the kernel for taking
too long to page in.  31 ports were skipped because of this failure.  This
error was often seen on 14.0-RELEASE-p0 and indicates an obstacle to accessing
executable files.

This result is better than the baseline (14.0-RELEASE-p2) but worse than my
original fix shown in the description.  Although prune_interval somehow avoided
the contention on vnode_list_mtx, this setup also limited the ARC pruning
performance, which introduced other pressure, including overcommit of the ARC
memory size.

I conclude this setup is neither optimal nor recommended.

-----

Ongoing test:

- vfs.vnode.vnlru.max_free_per_call: 4000000 (==
vfs.vnode.vnlru.max_free_per_call)
- vfs.zfs.arc.prune_interval: 1000 (my fix enabled)

This setup places no limit on the ARC pruning workload within the configured
interval.

Another objective of this test is to measure the number of vnodes ZFS requests
the OS to reclaim.  As long as this value is below 100000
(vfs.vnode.vnlru.max_free_per_call in my first test), the system behaviour and
test results are expected to be the same as in my first test.
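
The expectation above rests on the per-call limit acting as a cap on each
reclaim request.  A minimal model of that reasoning, not the kernel code: the
function name is hypothetical, and the cap-by-minimum behaviour is an
assumption about how vfs.vnode.vnlru.max_free_per_call is applied.

```python
def vnodes_attempted(requested: int, max_free_per_call: int) -> int:
    """Vnodes one reclaim call would try to free under a per-call cap.

    Assumed model: the request is truncated to the configured limit.
    """
    if requested < 0 or max_free_per_call < 0:
        raise ValueError("counts must be non-negative")
    return min(requested, max_free_per_call)
```

Under this model, any request below 100000 gives the same result with a limit
of 100000 as with a limit of 4000000, while a limit of 10000 truncates it.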

A glance at 30 minutes after the build start:

- The activity of arc_prune is mostly the same as in the first test; the CPU usage
occasionally surges up to 30%, but so far it has not stayed there for more than
1 second.
- The average number of vnodes ZFS requests to reclaim: ~44K.
  - vfs.vnode.stats.count: ~1.2M.
  - The default vfs.vnode.vnlru.max_free_per_call of 10K did regulate the ARC
pruning work.
  - I will keep an eye on this figure, especially if it exceeds 100K.
- The ARC memory size is strictly regulated as configured by vfs.zfs.arc_max.
  - The ARC pruning starts when the ARC memory size reaches ~4.1GB.
  - The ARC pruning does not happen as long as the ARC memory size is below
4.0GB.
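
The vnode count and the ARC size above can be sampled periodically during a
build.  This is a minimal sketch and not the method used for the figures in
this comment: it assumes the ARC size is exposed as
kstat.zfs.misc.arcstats.size, the helper names are hypothetical, and it does
not measure the number of vnodes ZFS requests to reclaim.

```python
import subprocess
import time


def read_sysctl(name, run=subprocess.check_output):
    """Return an integer sysctl value by running `sysctl -n NAME`."""
    return int(run(["sysctl", "-n", name]).decode().strip())


def sample(run=subprocess.check_output):
    """Take one sample of the vnode count and the ARC size in bytes."""
    return {
        "time": time.time(),
        "vnodes": read_sysctl("vfs.vnode.stats.count", run),
        # Assumed OID for the current ARC size.
        "arc_bytes": read_sysctl("kstat.zfs.misc.arcstats.size", run),
    }


def monitor(count, interval=10.0, run=subprocess.check_output, sleep=time.sleep):
    """Collect `count` samples, waiting `interval` seconds between them."""
    samples = []
    for i in range(count):
        samples.append(sample(run))
        if i + 1 < count:
            sleep(interval)
    return samples
```

The `run` and `sleep` parameters exist only so that the sampling can be
exercised without a live system; on the build host the defaults would be used.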

The finding regarding the ARC memory size is something new to me.  Maybe the
number of vnodes ZFS requests to reclaim is calculated very carefully and
precisely, so we should actually honour that figure to keep the system healthy.

I first treated this test as an extreme case, but maybe this should be
evaluated as a working setup.
