[Bug 275594] High CPU usage by arc_prune; analysis and fix
Date: Wed, 24 Jan 2024 10:47:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594
--- Comment #36 from Seigo Tanimura <seigo.tanimura@gmail.com> ---
(In reply to Thomas Mueller from comment #34)
I have backported the fix to stable/13 (13.3-PRERELEASE) and tested it with
poudriere-bulk(8).
The fix has also been applied to the main and stable/14 branches without any
changes.
Thomas, would you mind testing the backported fix to see if poudriere's build
time changes in any way?
* Sources on GitHub:
- Repo
- https://github.com/altimeter-130ft/freebsd-freebsd-src
- Branches
- main (Current)
- Fix only
- topic-openzfs-arc_prune-regulation-fix
- Fix and counters
- topic-openzfs-arc_prune-regulation-counters
- No changes from the fix on 14.0.0-RELEASE-p2.
- stable/14 (14-STABLE)
- Fix only
- stable/14-topic-openzfs-arc_prune-regulation-fix
- Fix and counters
- release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-regulation-counters
- No changes from the fix on 14.0.0-RELEASE-p2.
- releng/14.0 (14.0-RELEASE)
- Fix only
- release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-regulation-fix
- Fix and counters
- release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-regulation-counters
- The original fix branches.
- stable/13 (13-STABLE / 13.3-PRERELEASE)
- Fix only
- stable/13-topic-openzfs-arc_prune-regulation-fix
- Fix and counters
- stable/13-topic-openzfs-arc_prune-regulation-counters
- Backported changes
- A fix equivalent to FreeBSD-EN-23:18.openzfs.
- The ARC pruning task pileup is avoided by a single flag and the
atomic operations on it.
- Seigo's fix.
- The ZFS vnode accounting, including the counters.
- The ARC pruning regulation.
- The improvement to vnlru_free_impl().
- Changes not backported
- Seigo's fix.
- The counters for the autotuning of the ZFS ARC meta, the balancing
parameter between the ARC data and metadata.
- Those counters have changed significantly between 13-STABLE and
14-STABLE.
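As an illustration of the "single flag and the atomic operations on it" idea
listed under the backported changes, here is a minimal sketch. It is not the
OpenZFS code: the class and method names are hypothetical, and a non-blocking
lock acquire stands in for the kernel's atomic test-and-set.

```python
import threading


class PruneGate:
    """Hypothetical stand-in for a single "prune task pending" flag."""

    def __init__(self):
        # A non-blocking acquire on a Lock behaves like an atomic test-and-set.
        self._flag = threading.Lock()

    def try_begin(self):
        # True only for the caller that changed the flag from clear to set.
        return self._flag.acquire(blocking=False)

    def finish(self):
        # Clear the flag so that the next request can dispatch a task again.
        self._flag.release()


gate = PruneGate()
# Three back-to-back requests: only the first one would dispatch a task.
first_burst = [gate.try_begin() for _ in range(3)]
gate.finish()  # the dispatched task has completed
after_finish = gate.try_begin()
print(first_burst, after_finish)
```

Requests that arrive while the flag is set are simply dropped, so at most one
pruning task can be pending at any time.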
* Test results
Test Summary:
- Branch: stable/13-topic-openzfs-arc_prune-regulation-counters
- Date: 24 Jan 2024 00:10Z - 24 Jan 2024 05:59Z
- Build time: 05:48:30 (367 pkgs / hr)
- Failed port(s): 2
- Skipped port(s): 2
- Setup
- sysctl(3)
- vfs.zfs.arc_max=4294967296
- 4 GiB.
- vfs.zfs.arc.dnode_limit=8080000000
- 2.5 * (vfs.vnode.param.limit) * sizeof(dnode_t)
- 2.5: experimental average dnodes per znode (2.0) + margin (0.5)
- poudriere-bulk(8)
- USE_TMPFS="wrkdir data localbase"
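The vfs.zfs.arc.dnode_limit formula in the setup above can be checked with a
few lines of arithmetic. This is only a sketch: the report does not state the
value of vfs.vnode.param.limit or sizeof(dnode_t), so the two inputs below are
assumptions, picked because their product reproduces the configured value.

```python
def dnode_limit(vnode_limit, dnode_size, dnodes_per_znode=2.0, margin=0.5):
    """(average dnodes per znode + margin) * vnode limit * sizeof(dnode_t)."""
    return int((dnodes_per_znode + margin) * vnode_limit * dnode_size)


# ASSUMED inputs, not stated in the report.
ASSUMED_VNODE_LIMIT = 4_000_000  # hypothetical vfs.vnode.param.limit
ASSUMED_DNODE_SIZE = 808         # hypothetical sizeof(dnode_t), in bytes

limit = dnode_limit(ASSUMED_VNODE_LIMIT, ASSUMED_DNODE_SIZE)
print(limit)
```

On a real system, both inputs should be read from that system rather than
taken from this sketch.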
Result Chart Archive: (poudriere-bulk-13_3_prerelease-2024-01-24_09h10m00s.7z,
Attachment #247921)
- zfs-znodes-and-dnodes.png
- The counts of the ZFS znodes and dnodes.
- zfs-arc-pruning-regulation.png
- The counts of the ARC prune triggers by ZFS and the skips by the fix.
- zfs-dnodes-and-freeing-activity.png
- The freeing activity of the ZFS znodes and dnodes.
- vnode-free-calls.png
- The calls to the ZFS vnode freeing functions.
* Findings and Analysis
- The build time was shorter than on 14.0-RELEASE because emulators/mame, which
started 4.5 hours into the build, benefitted from ccache and completed in just
10 minutes. ccache does not work on 14.0-RELEASE, so all sources have to be
rebuilt there.
- Without ccache, the emulators/mame build would take ~2.5 hours and the whole
poudriere-bulk(8) run would complete in ~7 hours, which is about the same as
on 14.0-RELEASE.
- No ARC pruning happened during poudriere-bulk(8).
- The only pruning occurred while the system was settling down before
poudriere-bulk(8) started.
- On OpenZFS 2.1, the ARC pruning is not triggered by the excess unevictable
size in the ARC.
- That trigger does work on OpenZFS 2.2 in 14-STABLE.
- Only the overcommitted dnodes and metadata size trigger the ARC pruning on
OpenZFS 2.1.
- vfs.zfs.arc.dnode_limit in my setup effectively disabled the ARC pruning on
OpenZFS 2.1.
- Maybe this should be reverted to the default and retested.
- The zfskern{arc_evict} thread used up to 100% of a CPU in the final ~1 hour
of the build.
- The reason is not clear.
- There were no significant effects on the system.