[Bug 261448] ZFS VOP_RECLAIM causes long stalls during package builds

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 24 Jan 2022 22:13:05 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261448

            Bug ID: 261448
           Summary: ZFS VOP_RECLAIM causes long stalls during package
                    builds
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: markj@FreeBSD.org

I noticed that a system running "poudriere bulk -a" often appears to stall for
several seconds.  During a stall, all CPUs are nearly idle even though the
system is ostensibly building NCPU packages in parallel.  Digging a bit, the
problem happens when a package build finishes and poudriere does a "zfs
rollback" of some dataset.

The kernel implementation of zfs rollback calls zfsvfs_teardown(), which uses
the per-dataset "teardown lock" to suspend all VOPs on the dataset while it
walks the list of vnodes associated with the mount point (multiple times) to
invalidate the name cache and cached data.  Any user threads performing
operations on files in the dataset are blocked for this period.

The second part of the problem is getnewvnode(), which may call vn_alloc_hard()
to reclaim a used vnode if the total number of vnodes in the system is above
the "desiredvnodes" threshold.  vn_alloc_hard() performs direct reclamation
from the vnode free list, and as a part of this recycling it must call
VOP_RECLAIM.  ZFS' VOP_RECLAIM implementation, like all VOPs, acquires the
teardown lock, so it can block for a long time.  If we get unlucky and a batch
of vnodes belonging to a suspended dataset are at the head of the free list,
then the system very quickly gets stuck until the rollback completes.

Logging CPU usage during poudriere runs shows lots of idle CPU caused by this
problem.  The amount varies depending on how quickly package builds are
finishing.  Logging output from "vmstat 1" overnight gives:

$ cat vmstat-log | awk '/^[0-9]/{user += $(NF - 2); sys += $(NF - 1); idle +=
$(NF)} END{printf "user %d sys %d idle %d total %d\n", user, sys, idle, user +
sys + idle}'
user 2123240 sys 259657 idle 126830 total 2509727

so something like 5% of CPU is lost.  Data collected over a shorter time period
during which there were lots of small package builds shows it can be much
worse:

user 112162 sys 25396 idle 47896 total 185454

-- 
You are receiving this mail because:
You are the assignee for the bug.