[Bug 275594] High CPU usage by arc_prune; analysis and fix

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 08 Feb 2024 03:32:05 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #44 from Seigo Tanimura <seigo.tanimura@gmail.com> ---
(In reply to Seigo Tanimura from comment #42)

* Fix Status

- Backport to releng/13.3.

Done locally.


- Stall upon low memory

When the vm_lowmem kernel event fires while ARC pruning cannot evict enough
ZFS vnodes, a pagedaemon thread may wait for ARC eviction indefinitely.  This
causes a partial system stall.

The fix accelerates ARC pruning in that case.

The fix has been tested locally.
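A minimal sketch of one way such acceleration could work, assuming it is done by growing the prune request while vm_lowmem is pending and successive passes keep falling short. The names prune_state, prune_next_batch(), PRUNE_BASE and PRUNE_MAX_SHIFT are hypothetical and do not appear in FreeBSD or OpenZFS; the actual patch may accelerate pruning differently.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL names: not FreeBSD or OpenZFS APIs. */
#define PRUNE_BASE       1000  /* initial prune request, in vnodes */
#define PRUNE_MAX_SHIFT  6     /* cap growth at 64x PRUNE_BASE */

struct prune_state {
	unsigned shift;            /* current escalation level */
};

/*
 * Compute the next prune request.  While vm_lowmem is pending and the
 * previous pass evicted fewer vnodes than requested, double the request
 * (up to the cap); otherwise fall back to the base size.
 */
static int64_t
prune_next_batch(struct prune_state *ps, int64_t wanted, int64_t evicted,
    int lowmem)
{
	assert(wanted >= 0 && evicted >= 0);
	if (lowmem && evicted < wanted) {
		if (ps->shift < PRUNE_MAX_SHIFT)
			ps->shift++;
	} else {
		ps->shift = 0;
	}
	return ((int64_t)PRUNE_BASE << ps->shift);
}
```

The cap keeps a long low-memory episode from producing an unbounded request, and a pass that evicts what it asked for drops back to the base size.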


- Nullfs(5) node recycling

This fix targets poudriere-bulk(8).

Recycle nullfs(5) vnodes that are not in use, in the same way as znodes, so
that the lower ZFS vnodes can be recycled as well.  The implementation partly
shares code with the accounting of in-use ZFS znodes.

This has drastically improved ZFS behaviour:
  - The ARC dnode size has shrunk greatly; it no longer grows monotonically
during poudriere-bulk(8).
  - The ARC metadata and data now always have some evictable size; at the very
least, it no longer falls to zero.
  - There is always some number of prunable ZFS vnodes.

I believe this is how ZFS is supposed to work.

The fix has been tested locally.
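A minimal model of the idea, assuming that an idle nullfs(5) vnode keeps its lower ZFS vnode referenced, so the ZFS vnode only becomes prunable once the nullfs vnode above it is recycled. struct node, struct node_acct, node_ref(), node_rele() and recycle_idle() are hypothetical illustrations, not the nullfs(5) or ZFS code; they only show in-use accounting driving the choice of what to recycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL upper (nullfs-like) node. */
struct node {
	int  usecount;   /* active references to this upper node */
	bool recycled;   /* true once recycled (lower vnode released) */
};

/* HYPOTHETICAL per-filesystem accounting. */
struct node_acct {
	long total;      /* live upper nodes */
	long in_use;     /* upper nodes with usecount > 0 */
};

/* Take a reference; the first one marks the node in use. */
static void
node_ref(struct node_acct *a, struct node *n)
{
	if (n->usecount++ == 0)
		a->in_use++;
}

/* Drop a reference; the last one marks the node idle. */
static void
node_rele(struct node_acct *a, struct node *n)
{
	assert(n->usecount > 0);
	if (--n->usecount == 0)
		a->in_use--;
}

/*
 * Recycle up to `want` idle upper nodes.  In the real system, recycling
 * an upper node would also release its hold on the lower vnode, making
 * that vnode eligible for pruning.
 */
static long
recycle_idle(struct node_acct *a, struct node *nodes, size_t n, long want)
{
	long done = 0;

	for (size_t i = 0; i < n && done < want; i++) {
		if (!nodes[i].recycled && nodes[i].usecount == 0) {
			nodes[i].recycled = true;
			a->total--;
			done++;
		}
	}
	return (done);
}
```

Nodes that are still referenced are skipped, so only the idle part of the cache is released.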


- In-use counter overshoot and undershoot

An overshoot has been found in the nullfs(5) in-use node counter (introduced
for nullfs(5) node recycling).  This may cause the vnlru_free_vfsops() argument
to wrap around and hence lead to runaway behaviour.

The fix has been applied to nullfs(5) and ZFS.

The local test is in progress.
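The wraparound can be illustrated with plain integer arithmetic. This sketch assumes the prune count is derived as total minus in-use and handed on as an unsigned 32-bit value; the helper names are hypothetical, and the actual parameter type of vnlru_free_vfsops() is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL buggy derivation: if in_use overshoots total, the
 * difference is negative and the conversion to uint32_t wraps it
 * around to a huge count.
 */
static uint32_t
prunable_unclamped(int64_t total, int64_t in_use)
{
	return ((uint32_t)(total - in_use));
}

/* HYPOTHETICAL fixed derivation: an overshoot means nothing to prune. */
static uint32_t
prunable_clamped(int64_t total, int64_t in_use)
{
	int64_t n = total - in_use;

	if (n < 0)
		return (0);
	if (n > UINT32_MAX)
		return (UINT32_MAX);
	return ((uint32_t)n);
}

/* The undershoot counterpart: catch a decrement below zero in debug builds. */
static void
in_use_dec(uint32_t *in_use)
{
	assert(*in_use > 0);
	(*in_use)--;
}
```

With total = 100 and in_use = 101, the unclamped version requests 4294967295 vnodes instead of none.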


I will publish the updated git repo once the local test above completes.

Hope there are no more blockers...
