Re: HoneyComb first-boot notes [a L3/L2/L1/RAM performance oddity: fix identified]

From: Mark Millard via freebsd-arm <freebsd-arm_at_freebsd.org>
Date: Thu, 15 Jul 2021 20:48:11 UTC

On 2021-Jul-11, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:

>>>> . . .
>>> 
>>> I've run into an issue where what FreeBSD calls cpu 0 has
>>> significantly different L3/L2/L1/RAM subsystem performance
>>> than all the other cores (cpu 0 being worse). The same holds
>>> when cpu 0 is compared with all 4 MACCHIATObin Double Shot cores.
>>> 
>>> A plot with curves showing the issue is at:
>>> 
>>> https://github.com/markmi/acpphint/blob/master/acpphint_example_data/HoneyCombFreeBSDcpu0RAMAccessPerformanceIsOdd.png
>>> 
>>> The dark red curves in the plot show the expected general
>>> shape for such and are for cpu 0. The lighter colored
>>> curves are the MACCHIATObin curves. The darker ones are
>>> the HoneyComb curves, where the L3/L2/L1 is relatively
>>> effective (other than cpu 0).
>>> 
>>> My notes on Discord (so far) are . . .
>>> 
>>> The curves are from my C++ variant of the old Hierarchical
>>> INTegration benchmark (historically abbreviated HINT). You
>>> can read the approximate size of a level of cache from
>>> the x-axis at the point where the curve drops faster. So, right
>>> (most obvious) to left (least obvious): L3 8 MiByte, L2 1
>>> MiByte (per core pair, as it turns out), L1 32 KiByte.
>>> 
>>> The curves here are for single-threaded benchmark
>>> configurations with cpuset used to control which CPU is
>>> used. I first noticed this via odd performance variations
>>> in multithreading with more cores allowed than in use (so
>>> migrations to a variety of cpus over time).
>>> 
>>> I explored all the CPUs (cores), not just what I plotted.
>>> Only cpu 0 shows the odd memory access performance
>>> structure in its curve.
>>> 
>>> FYI: The FreeBSD boot is UEFI/ACPI based for both systems,
>>> not U-Boot based.
>>> 
>> 
>> Jon Nettleton has replicated the memory access performance
>> issue on the one cpu via a different HoneyComb, running
>> some Linux kernel, using tinymembench as the benchmark.
>> 
> 
> Jon reports that for HoneyCombs older and newer, EDK2's older
> and newer: All show the behavior on cpu 0. "[I]t may have
> always existed."
> 
> Jon also reports that U-Boot based booting does not get the
> behavior.
> 
> (I've never used U-Boot to boot the HoneyComb for any OS
> media that I've got around. In my U-Boot ignorance, my
> quick attempts failed for FreeBSD main and Fedora 34
> Server media that I've been using with EDK2's UEFI/ACPI.)

The problem in the:

lx2160a_uefi/build/arm-trusted-firmware/plat/nxp/soc-lx2160a/soc.c

code has been identified and my testing of the proposed fix
indicates things are working.

Some very early code setting up the L1 Data prefetch
configuration was depending on not-well-initialized
memory, and an initialization routine needed to be called
a little earlier in the sequence to avoid that.
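
For anyone wanting to repeat the per-cpu comparison from the
quoted notes, here is a minimal sketch of how it could be
scripted. It is not the tooling I used. It assumes FreeBSD's
cpuset(1) with its -l option for pinning a command to a cpu.
The function names, the 25% tolerance, and the idea of
reducing each run to a single "higher is faster" score are
all assumptions made for illustration.

```python
import statistics
import subprocess


def run_pinned(cpu, command):
    """Run `command` (a list of strings) pinned to one cpu.

    Assumes FreeBSD's cpuset(1) is on PATH; returns the
    command's stdout for the caller to parse into a score.
    """
    result = subprocess.run(
        ["cpuset", "-l", str(cpu), *command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def slow_cpus(score_by_cpu, tolerance=0.25):
    """Return the cpus whose score falls more than `tolerance`
    (as a fraction) below the median score of all cpus.

    Scores are "higher is faster"; the tolerance is an
    arbitrary illustrative threshold, not a measured one.
    """
    median = statistics.median(score_by_cpu.values())
    cutoff = median * (1.0 - tolerance)
    return sorted(
        cpu for cpu, score in score_by_cpu.items() if score < cutoff
    )
```

The intended use is to call run_pinned() once per cpu with
the same single-threaded benchmark command, turn each output
into one number, and pass the resulting dictionary to
slow_cpus(). Comparing against the median means one odd cpu
does not shift the baseline much.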

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)