[Bug 275594] High CPU usage by arc_prune; analysis and fix

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 20 Jan 2024 07:41:09 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #34 from Thomas Mueller <thmu7@freenet.de> ---
(In reply to Seigo Tanimura from comment #33)

On Sat, 20 Jan 2024 01:43:28 +0000, bugzilla-noreply@freebsd.org wrote:

> Do you see any other threads using the CPU as much as kernel{arc_prune}? eg
> 
> - vnlru
> - Any threads that access files somehow while running poudriere-bulk(8) (eg
> cc1)

Yes, I've occasionally observed vnlru at 30-40% CPU for longer stretches
while arc_prune was at 90-100%.

With 12-STABLE it was possible to have poudriere running at idle priority
on two of the four CPUs and use the system for everyday work in parallel
(X11 UI, MUA, Firefox, or even VirtualBox). With 13-STABLE, the system
bogs down: video playback drops frames and/or audio, etc.

> If so, what you have seen is the same as mine.  Kernel{arc_prune} and the
> threads above contend for the vnode list lock.  Each of them spins in the
> kernel until it acquires the lock, which can be found by top(1) if you have any
> idle CPUs.  You may have to reduce the builders to let top(1) work.

Exactly.

What's also new in 13-STABLE is that, when the issue occurs, the system
sometimes runs into memory pressure: pagedaemon shows a notable CPU load
and processes with high memory usage get killed (firefox, virtualbox, for
example). That might be caused by changes in the poudriere default
configuration, so I can't quite tell whether it would have appeared on 12
as well.

What I also didn't observe in 12-STABLE are occasional build failures with
"Bad file descriptor" errors, which cannot be reproduced after
restarting the build. Example:

 [stable13amd64-default-job-02] |   `-- Extracting python39-3.9.18: .........
 pkg-static: Fail to chmod /wsgiref/__pycache__/__init__.cpython-39.opt-1.pyc:Bad file descriptor
 [stable13amd64-default-job-02] |   `-- Extracting python39-3.9.18... done

 Failed to install the following 1 package(s): /packages/All/meson-1.3.1.pkg
 *** Error code 1

> I was not aware at the time of the last massive poudriere-bulk(8) on
> 13.2-RELEASE, but it is now likely that the same issue occurred on it as well.
> 
> The comparison of my poudriere-bulk(8) results, both on the same host except
> for the OS versions:
> 
>                          | 13.2-RELEASE | 14.0-RELEASE
> -------------------------+--------------+-------------
> Build Date               |  13 Apr 2023 |  19 Jan 2024
> ZFS Fix                  |           No |          Yes
> # of Packages            |         1147 |         2128
> # of Successful Packages |         1136 |         2127
> Elapsed Time             |     18:44:33 |     06:54:28
> Packages / Hour          |           61 |          309

Looks familiar.
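
For reference, the "Packages / Hour" row can be roughly recomputed from the
other rows of the table above. This is only a sketch: it assumes the rate is
the total number of packages divided by the elapsed wall-clock time, and the
function name is made up for illustration. poudriere may derive its figure
differently, so the results come out close to, but not exactly at, the
quoted values.

```python
def packages_per_hour(packages: int, elapsed: str) -> float:
    """Return packages built per hour for an elapsed time given as HH:MM:SS."""
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    elapsed_hours = hours + minutes / 60 + seconds / 3600
    return packages / elapsed_hours


# Package counts and elapsed times are taken from the table above.
# Assumption: the rate uses the total package count, not only the successes.
rate_13_2 = packages_per_hour(1147, "18:44:33")
rate_14_0 = packages_per_hour(2128, "06:54:28")
speedup = rate_14_0 / rate_13_2

print(f"13.2-RELEASE: {rate_13_2:.1f} packages/hour")
print(f"14.0-RELEASE: {rate_14_0:.1f} packages/hour")
print(f"Throughput ratio: {speedup:.2f}x")
```

The two runs built different package sets, so the ratio is only a rough
indication of the improvement.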

> > Questions:
> > Would migrating to ZFS on root mitigate the issues?  
> 
> I would say no; that would give even more pressure to ARC.

Thanks.

> > Is 13-STABLE in focus for this PR?  
> 
> Not for now, but it should be.  In addition, FreeBSD-EN-23:18.openzfs should
> include 13-STABLE as well.
> 
> I have one baremetal 13.2-RELEASE host with ZFS, but it does not suffer from
> the issue as of now.  This host serves the volumes to the bhyve(8) VMs mainly,
> so it does not use vnodes heavily.

Thanks for analysing this!
