[Bug 275594] High CPU usage by arc_prune; analysis and fix

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 24 Jan 2024 10:47:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #36 from Seigo Tanimura <seigo.tanimura@gmail.com> ---
(In reply to Thomas Mueller from comment #34)

I have backported the fix to stable/13 (13.3-PRERELEASE) and tested
poudriere-bulk(8).

The fix has also been applied to the main and stable/14 branches without any
changes.

Thomas, would you mind testing the backported fix to see if poudriere's build
time changes in any way?

* Sources on GitHub:

- Repo
  - https://github.com/altimeter-130ft/freebsd-freebsd-src
- Branches
  - main (Current)
    - Fix only
      - topic-openzfs-arc_prune-regulation-fix
    - Fix and counters
      - topic-openzfs-arc_prune-regulation-counters
    - No changes from the fix on 14.0.0-RELEASE-p2.
  - stable/14 (14-STABLE)
    - Fix only
      - stable/14-topic-openzfs-arc_prune-regulation-fix
    - Fix and counters
      - release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-regulation-counters
    - No changes from the fix on 14.0.0-RELEASE-p2.
  - releng/14.0 (14.0-RELEASE)
    - Fix only
      - release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-regulation-fix
    - Fix and counters
      - release/14.0.0/release-14_0_0-p2-topic-openzfs-arc_prune-regulation-counters
    - The original fix branches.
  - stable/13 (13-STABLE / 13.3-PRERELEASE)
    - Fix only
      - stable/13-topic-openzfs-arc_prune-regulation-fix
    - Fix and counters
      - stable/13-topic-openzfs-arc_prune-regulation-counters
    - Backported changes
      - A fix equivalent to FreeBSD-EN-23:18.openzfs.
        - The ARC pruning task pileup is avoided by a single flag and atomic
          operations on it.
      - Seigo's fix.
        - The ZFS vnode accounting, including the counters.
        - The ARC pruning regulation.
        - The improvement to vnlru_free_impl().
    - Changes not backported
      - Seigo's fix.
        - The counters related to the autotuning of the ZFS ARC meta, the
          parameter that balances the ARC data and metadata.
          - Those counters changed significantly between 13-STABLE and
            14-STABLE.
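The pileup avoidance in the fix equivalent to FreeBSD-EN-23:18.openzfs can be pictured with a minimal sketch of the "single flag" pattern: a prune request is dispatched only if no prune task is pending, and further requests are skipped until the pending task finishes. This is not the FreeBSD or OpenZFS code. The class and method names are hypothetical, and Python's threading.Lock stands in for the atomic compare-and-set on the flag.

```python
import threading


class PruneDispatcher:
    """Hypothetical model: at most one ARC prune task is pending at a time."""

    def __init__(self) -> None:
        # The lock stands in for an atomic compare-and-set on the flag.
        self._lock = threading.Lock()
        self._pending = False  # the single "prune task pending" flag
        self.dispatched = 0
        self.skipped = 0

    def request_prune(self) -> bool:
        """Dispatch a prune task unless one is already pending."""
        with self._lock:
            if self._pending:
                # A task is already queued or running: skip instead of piling up.
                self.skipped += 1
                return False
            self._pending = True
            self.dispatched += 1
            return True

    def prune_task_done(self) -> None:
        """Called by the (hypothetical) prune task when it finishes."""
        with self._lock:
            self._pending = False


if __name__ == "__main__":
    d = PruneDispatcher()
    results = [d.request_prune() for _ in range(5)]  # a burst of 5 requests
    print(results)                  # [True, False, False, False, False]
    d.prune_task_done()
    print(d.request_prune())        # True
    print(d.dispatched, d.skipped)  # 2 4
```

A burst of requests therefore queues one task instead of five; whether a skipped request should be retried later is a policy question this sketch leaves open.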

* Test results

Test Summary:

- Branch: stable/13-topic-openzfs-arc_prune-regulation-counters
- Date: 24 Jan 2024 00:10Z - 24 Jan 2024 05:59Z
- Build time: 05:48:30 (367 pkgs / hr)
- Failed port(s): 2
- Skipped port(s): 2
- Setup
  - sysctl(8)
    - vfs.zfs.arc_max=4294967296
      - 4 GiB.
    - vfs.zfs.arc.dnode_limit=8080000000
      - 2.5 * vfs.vnode.param.limit * sizeof(dnode_t)
        - 2.5: empirical average of dnodes per znode (2.0) plus a margin (0.5).
  - poudriere-bulk(8)
    - USE_TMPFS="wrkdir data localbase"
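As a worked form of the dnode_limit formula above, the helper below computes 2.5 * vfs.vnode.param.limit * sizeof(dnode_t) in bytes. The function name is hypothetical, and so are the inputs in the usage lines: the comment states neither the vnode limit nor sizeof(dnode_t) on the test machine, and 4,000,000 vnodes with an 808-byte dnode_t is just one pair that yields 8080000000.

```python
from fractions import Fraction


def dnode_limit(vnode_limit: int, dnode_size: int,
                dnodes_per_znode: Fraction = Fraction(2),
                margin: Fraction = Fraction(1, 2)) -> int:
    """Return (dnodes_per_znode + margin) * vnode_limit * dnode_size in bytes.

    Fractions keep the 2.0 + 0.5 factor exact, so there is no float rounding;
    int() truncates any fractional byte.
    """
    factor = dnodes_per_znode + margin  # 2.0 + 0.5 = 2.5
    return int(factor * vnode_limit * dnode_size)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the test machine.
    print(dnode_limit(vnode_limit=4_000_000, dnode_size=808))  # 8080000000
```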

Result Chart Archive: (poudriere-bulk-13_3_prerelease-2024-01-24_09h10m00s.7z,
Attachment #247921)

- zfs-znodes-and-dnodes.png
  - The counts of the ZFS znodes and dnodes.
- zfs-arc-pruning-regulation.png
  - The number of ARC prune triggers issued by ZFS and of those skipped by the
    fix.
- zfs-dnodes-and-freeing-activity.png
  - The freeing activity of the ZFS znodes and dnodes.
- vnode-free-calls.png
  - The calls to the ZFS vnode freeing functions.

* Findings and Analysis

- The build time was shorter than on 14.0-RELEASE because emulators/mame,
  which started 4.5 hours into the build, benefited from ccache and completed
  in just 10 minutes.  ccache does not work on 14.0-RELEASE, so all of the
  sources have to be rebuilt there.
  - Without ccache, the emulators/mame build would take ~2.5 hours and the
    whole poudriere-bulk(8) run would complete in ~7 hours, the same as on
    14.0-RELEASE.

- No ARC pruning happened during poudriere-bulk(8).
  - The only pruning happened while the system was settling down before
    poudriere-bulk(8) started.
  - On OpenZFS 2.1, ARC pruning is not triggered by excess unevictable size in
    the ARC.
    - That trigger exists on OpenZFS 2.2 in 14-STABLE.
    - On OpenZFS 2.1, only overcommitted dnode and metadata sizes trigger ARC
      pruning.
  - The vfs.zfs.arc.dnode_limit value in my setup effectively disabled ARC
    pruning on OpenZFS 2.1.
    - It may be worth reverting it to the default and retesting.

- The zfskern{arc_evict} thread used up to 100% CPU during the final ~1 hour
  of the build.
  - The cause is not clear.
  - There was no significant effect on the system.
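The OpenZFS 2.1 pruning triggers described in the findings above can be summarized as a simplified predicate. This models the conditions as stated in this comment, not the actual arc.c logic. The function names and the metadata sizes and limits in the examples are hypothetical, and the sketch assumes, as a simplification, that the dnode size held in the ARC never exceeds vfs.zfs.arc_max.

```python
GiB = 2**30
ARC_MAX = 4 * GiB            # vfs.zfs.arc_max from the setup (4 GiB)
DNODE_LIMIT = 8_080_000_000  # vfs.zfs.arc.dnode_limit from the setup


def prune_triggered_2_1(meta_used: int, meta_limit: int,
                        dnode_size: int, dnode_limit: int = DNODE_LIMIT) -> bool:
    """Simplified OpenZFS 2.1 trigger: only overcommitted metadata or dnodes.

    Excess unevictable size is deliberately not a trigger here; per the
    comment, that is OpenZFS 2.2 behavior.
    """
    return meta_used > meta_limit or dnode_size > dnode_limit


def dnode_trigger_possible(dnode_limit: int, arc_max: int = ARC_MAX) -> bool:
    """Assuming dnode_size <= arc_max, the dnode trigger needs dnode_limit < arc_max."""
    return dnode_limit < arc_max


if __name__ == "__main__":
    # Hypothetical sizes: 2 GiB of metadata (all dnodes), 3 GiB metadata limit.
    print(prune_triggered_2_1(2 * GiB, 3 * GiB, 2 * GiB))  # False
    print(dnode_trigger_possible(DNODE_LIMIT))             # False
    # The same state with a (hypothetical) 1 GiB dnode limit would trigger.
    print(prune_triggered_2_1(2 * GiB, 3 * GiB, 2 * GiB, dnode_limit=1 * GiB))  # True
```

Under this model, a dnode_limit above vfs.zfs.arc_max leaves only the metadata condition able to fire, which is consistent with the suggestion to retest with the default limit.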

-- 
You are receiving this mail because:
You are the assignee for the bug.