[Bug 275594] High CPU usage by arc_prune; analysis and fix
Date: Fri, 23 Feb 2024 19:25:52 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594
--- Comment #67 from Peter Much <pmc@citylink.dinoex.sub.org> ---
So, now I have read all the material here. Great work!
I had upgraded my deploy engine from 13.2-RELEASE to 13.3-BETA, and found
(among some spurious messages from git) that it can no longer build gcc12.
There is apparently no problem with rust or llvm15, but trying to build gcc12
crashes reproducibly (10 cores, 16081M RAM). Apparently the crash happens
when gcc fully powers up its LTO for the first time:
last pid: 37369; load averages: 9.35, 9.93, 9.27 up 0+03:15:25 07:21:42
417 threads: 14 running, 379 sleeping, 24 waiting
CPU: 55.4% user, 0.0% nice, 35.6% system, 0.1% interrupt, 8.8% idle
Mem: 7047M Active, 6121M Inact, 2392M Wired, 984M Buf, 60M Free
ARC: 518M Total, 45M MFU, 451M MRU, 128K Anon, 3990K Header, 17M Other
467M Compressed, 997M Uncompressed, 2.14:1 Ratio
Swap: 15G Total, 15G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -8 - 0B 2432K CPU4 4 3:14 99.79% kernel{arc_p
7 root -16 - 0B 48K CPU6 6 2:45 99.79% pagedaemon{d
15 root 52 - 0B 16K CPU0 0 3:00 99.70% vnlru
37334 root 52 0 891M 789M pfault 1 0:37 89.24% lto1
37270 root 52 0 1017M 915M pfault 3 0:43 88.63% lto1
37324 root 52 0 831M 770M pfault 8 0:39 88.59% lto1
37338 root 52 0 843M 785M pfault 2 0:36 88.50% lto1
37333 root 52 0 889M 788M pfault 7 0:37 82.76% lto1
37269 root 52 0 1001M 882M pfault 5 0:42 82.09% lto1
37274 root 52 0 1004M 885M pfault 9 0:42 80.24% lto1
5 root 20 - 0B 1568K t->zth 9 0:02 1.02% zfskern{arc_
37360 root 20 0 14M 4940K CPU9 9 0:00 0.87% top
This is the last output. At this point the system becomes unresponsive and,
when allowed neither to oom-kill nor to panic, continues to consume 300% CPU.
Apparently these are the three visible riders of the apocalypse (arc_prune,
pagedaemon, vnlru) entertaining themselves. :/
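The roughly 300% figure can be checked against the top output above. A minimal
sketch, with the three WCPU values copied by hand from that snapshot (the
variable names are illustrative only, and the command names stay truncated as
top printed them):

```python
# Rows copied from the first top(1) snapshot above: (PID, WCPU %, command).
# The command names are truncated exactly as top printed them.
rows = [
    (0, 99.79, "kernel{arc_p"),
    (7, 99.79, "pagedaemon{d"),
    (15, 99.70, "vnlru"),
]

# Sum the per-thread WCPU to get the CPU burned by the three kernel threads.
total_wcpu = sum(wcpu for _, wcpu, _ in rows)
print(f"{total_wcpu:.2f}% WCPU across {len(rows)} kernel threads")
```

That is three cores fully occupied by kernel housekeeping, which fits the
35.6% system time reported on this 10-core machine.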
Applying the patch (i.e. the five new git commits from the GitHub repo) solves
the issue, and afterwards it looks like this:
last pid: 11944; load averages: 7.13, 5.29, 5.77 up 0+03:48:45 16:12:46
424 threads: 19 running, 381 sleeping, 24 waiting
CPU: 67.9% user, 0.0% nice, 5.1% system, 0.0% interrupt, 27.0% idle
Mem: 9308M Active, 2285M Inact, 20M Laundry, 3643M Wired, 865M Buf, 336M Free
ARC: 1638M Total, 855M MFU, 575M MRU, 128K Anon, 11M Header, 198M Other
1305M Compressed, 2980M Uncompressed, 2.28:1 Ratio
Swap: 15G Total, 15G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11579 root 103 0 1269M 1066M CPU6 6 4:09 100.00% lto1
11605 root 103 0 1263M 1052M CPU3 3 4:08 99.87% lto1
11589 root 103 0 1295M 1091M CPU8 8 4:09 99.87% lto1
11599 root 103 0 1259M 1027M CPU9 9 4:08 99.87% lto1
11588 root 103 0 1263M 1035M CPU7 7 4:09 99.87% lto1
11590 root 103 0 1287M 1058M CPU5 5 4:08 99.87% lto1
11598 root 103 0 1311M 1082M CPU1 1 4:08 99.74% lto1
0 root -8 - 0B 2448K - 6 0:03 6.83% kernel{arc_p
5 root -8 - 0B 1568K RUN 9 0:03 5.80% zfskern{arc_
7 root -16 - 0B 48K psleep 2 0:37 3.11% pagedaemon{d
I'm a bit worried the thing is still reluctant to page out, but otherwise this
looks good.