Re: cpufreq & hwpstate_amd & Zen 2

From: Johannes Totz <jo_at_bruelltuete.com>
Date: Fri, 19 May 2023 00:29:29 UTC

On 15/05/2023 22:16, Johannes Totz wrote:
> Hi all,
> 
> I'm poking cpufreq's hwpstate_amd to see what I can tune re performance 
> vs power vs heat trade-off.

Here are some patches, if anyone is interested:

https://reviews.freebsd.org/D40139
Adds a tunable for cpufreq/hwpstate to get the P-state info from the 
CPU's MSR instead of acpi_perf.

https://reviews.freebsd.org/D40158
Adds another tunable that allows overriding the default (or 
BIOS-configured?) P-state configuration. This makes things like over- or 
underclocking and over- or undervolting possible.

https://reviews.freebsd.org/D40140
Adds power calculation if P-state info comes from MSR. This was missing 
until now but is really just cosmetic.
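
For anyone who wants to sanity-check the numbers by hand, here is a rough 
Python sketch of decoding a PStateDef value into frequency, voltage and 
power. It is not the code from the patches. The bit layout (CpuFid, 
CpuDfsId, CpuVid, IddValue, IddDiv, PstateEn) and the 1.55 V - 6.25 mV * VID 
voltage formula are my reading of AMD's Family 17h documentation, so treat 
them as assumptions; decode_pstate_def is just a name made up for this 
example.

```python
def decode_pstate_def(msr):
    """Decode a Family 17h PStateDef MSR value (field layout assumed)."""
    fid = msr & 0xFF                # CpuFid[7:0]
    did = (msr >> 8) & 0x3F         # CpuDfsId[13:8]
    vid = (msr >> 14) & 0xFF        # CpuVid[21:14]
    idd_value = (msr >> 22) & 0xFF  # IddValue[29:22]
    idd_div = (msr >> 30) & 0x3     # IddDiv[31:30], 3 is reserved
    enabled = bool((msr >> 63) & 1) # PstateEn[63]

    # Core frequency in MHz: 200 * FID / DID (DID == 0 means "off").
    freq_mhz = 200 * fid // did if did else 0

    # Integer maths to avoid float rounding: microvolts and milliamps.
    microvolts = 1_550_000 - 6_250 * vid
    if idd_div == 3:
        power_mw = None             # reserved divisor, no sensible value
    else:
        milliamps = idd_value * 1000 // (10 ** idd_div)
        power_mw = microvolts * milliamps // 1_000_000

    return {
        "enabled": enabled,
        "freq_mhz": freq_mhz,
        "microvolts": microvolts,
        "power_mw": power_mw,
    }


print(decode_pstate_def(0x8000000049120890))
```

With the example value quoted below (0x8000000049120890) this gives 
3600 MHz at 1.1 V and 3960 mW, which lines up with the 3600/3960 entry in 
freq_levels.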

These do not solve the mystery below though :(
And fwiw, C-state power saving is really effective. Messing with the 
P-states does not do much while idle; the effect is measurable only when 
the CPU is busy.

> I'm struggling with the P-state behaviour though.
> The code looks really straight-forward: 
> https://github.com/freebsd/freebsd-src/blob/main/sys/x86/cpufreq/hwpstate_amd.c#L172
> 
> But with hwpstate_verify enabled, it looks like P-state transitions never 
> go as requested.
> For this, I'm not running powerd.
> In addition to the existing verify code, I've sprinkled in a few more 
> printfs.
> 
> PStateCurLim (aka MSR_AMD_10H_11H_LIMIT = 0x20) and PStateDef (aka 
> MSR_AMD_10H_11H_CONFIG = e.g. 0x8000000049120890) both look reasonable.
> 
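To spell out why 0x20 looks reasonable: a small Python sketch of splitting 
the limit register into its two fields. The field positions (current limit 
in bits 2:0, max P-state value in bits 6:4) are what I understand the driver 
and AMD's docs to use, so take them as an assumption; decode_pstate_cur_lim 
is a made-up helper name.

```python
def decode_pstate_cur_lim(msr):
    """Split a PStateCurLim value into its two fields (layout assumed)."""
    return {
        "cur_limit": msr & 0x7,       # highest-performance P-state allowed
        "max_val": (msr >> 4) & 0x7,  # lowest-performance P-state available
    }


print(decode_pstate_cur_lim(0x20))
```

For 0x20 that is cur_limit 0 and max_val 2, i.e. P0 through P2 are all 
allowed, matching the three entries in freq_levels.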
> 
> $ sysctl dev.cpu.0
> dev.cpu.0.freq_levels: 3600/3960 2800/2800 2200/1980
> dev.cpu.0.freq: 2800
> 
> $ sysctl dev.cpu.0.freq=3600
> dev.cpu.0.freq: 2800 -> 3600
> 
> $ cat /var/log/messages
> [...extra printf debugging...]
> kernel: hwpstate0: setting P0-state on cpu0
> kernel: hwpstate0: setting P1(2) -> P0 on cpu1
> [...same for all the other cpus...]
> kernel: hwpstate0: setting P1(2) -> P0 on cpu15
> 
> 
> This shows that cpufreq thought we were at P1 and wanted to transition 
> to P0. But actually, the CPU was in P2 (the 2 in brackets).
> 
> We want to go from P0 to P2...
> 
> 
> $ sysctl dev.cpu.0.freq=2200
> dev.cpu.0.freq: 3600 -> 2200
> 
> $ cat /var/log/messages
> kernel: hwpstate0: setting P2-state on cpu0
> kernel: hwpstate0: setting P0(1) -> P2 on cpu1
> 
> 
> ...but CPU was in P1 at that time.
> 
> Wanting to go from P2 back to P1...
> 
> 
> $ sysctl dev.cpu.0.freq=2800
> dev.cpu.0.freq: 2200 -> 2800
> 
> $ cat /var/log/messages
> kernel: hwpstate0: setting P1-state on cpu0
> kernel: hwpstate0: setting P2(2) -> P1 on cpu1
> 
> 
> ...shows that this time the CPU really was in P2 (yay). But it did not 
> transition to P1; it stayed in P2 (not shown in the log).
> 
> 
> So the question is: what else could be interfering with the P-state?
> 
> 
> thanks,
> 
> Johannes