Re: Cores of different performance vs. time spent creating threads: Windows Dev Kit 2023 example

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 15 May 2023 19:14:06 UTC
On May 9, 2023, at 19:19, Mark Millard <marklmi@yahoo.com> wrote:

> First some context that reaches an oddity that seems to
> be involved in the time to create threads . . .
> 
> The Windows Dev Kit 2023 (abbreviated WDK23 here) boot reports:
> 
> CPUs (cores) 0..3: cortex-a78c (the slower cores)
> CPUs (cores) 4..7: cortex-x1c  (the faster cores)
> 
> Building a kernel with an explicit -mcpu= option produces
> the following oddity in cpu numbering when that kernel
> is used:
> 
> -mcpu=cortex-x1c or -mcpu=cortex-a78c:
>    Benchmarking tracks that number/performance pairing.
> 
> -mcpu=cortex-a72:
>    The slower and faster cores swap number blocks.
> 
> So, for -mcpu=cortex-a72, 0..3 are the faster cores.
> 
> This sets up for the following . . .
> 
> But I also observe (a relative comparison of contexts
> via some benchmark-like activity):
> 
> -mcpu=cortex-x1c or -mcpu=cortex-a78c based kernel:
>    threads take more time to create
> 
> -mcpu=cortex-a72 based kernel:
>    threads take less time to create
> 
> The difference is not trivial for the activity involved
> in this WDK23 context.
> 
> If there is a bias as to which core(s) are involved in part
> of thread creation generally, it would appear to be important
> that the bias be toward the more performant cores (for what the
> activity involves). The above suggests that this may not be
> the case for FreeBSD as is. big.LITTLE (and analogous?)
> designs make this more relevant.
> 
> Does this hypothesis about what type of thing is going on
> fit with how FreeBSD actually works?
> 
> As it stands, I'm going to experiment with the WDK23 using
> a cortex-a72 targeted kernel but a cortex-x1c/cortex-a78c
> targeted world for my general operation of the WDK23.
> 
> 
> Note: While the benchmark results allow seeing in plots
> what traces back to thread creation time contributions,
> the benchmark itself does not directly measure that time.
> Rather, the average work rate changes based on the
> fraction of the time spent in thread creation for each
> given problem size. The actual definition of work here
> involves a mathematical quantity for a mathematical
> problem (one that need not be limited to computers doing
> the work).
> 
> The benchmark results are more useful for discovering that
> there is something to potentially investigate than for
> actually doing an investigation.
> 
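
To illustrate the kind of effect the quoted note describes, here is a toy C++ model. It is not the actual benchmark, whose definition of work differs. It assumes a problem of n work units split evenly over a fixed number of threads, with each thread costing a fixed creation time that one parent thread pays serially; the function names and all numbers are hypothetical. A slower core doing the creations would show up as a larger create_s, which depresses the apparent rate most at small problem sizes.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical toy model, not the real benchmark. Assumptions:
//  - n work units are split evenly across `threads` threads,
//  - each thread processes `units_per_s` work units per second,
//  - one parent thread creates all the threads serially, paying
//    `create_s` seconds per creation.
double apparent_rate(double n, int threads, double units_per_s,
                     double create_s) {
    assert(n > 0.0 && threads > 0 && units_per_s > 0.0 && create_s >= 0.0);
    double compute_s = (n / threads) / units_per_s;  // parallel compute time
    double overhead_s = threads * create_s;          // serial creation time
    return n / (compute_s + overhead_s);             // observed units per second
}

// Fraction of the total elapsed time spent creating threads.
double creation_fraction(double n, int threads, double units_per_s,
                         double create_s) {
    double compute_s = (n / threads) / units_per_s;
    double overhead_s = threads * create_s;
    return overhead_s / (compute_s + overhead_s);
}

// Print the apparent rate for problem sizes 10^3 .. 10^9 for two
// hypothetical per-thread creation costs (say, "fast" vs. "slow" core).
void print_curve(int threads, double units_per_s,
                 double fast_create_s, double slow_create_s) {
    for (double n = 1e3; n <= 1e9; n *= 10.0) {
        std::printf("n=%.0e  fast=%.3e  slow=%.3e units/s\n", n,
                    apparent_rate(n, threads, units_per_s, fast_create_s),
                    apparent_rate(n, threads, units_per_s, slow_create_s));
    }
}
```

Calling print_curve(8, 1e9, 20e-6, 40e-6) shows the two curves differing sharply at small n and converging as n grows. That is the shape of effect that lets plots hint at creation costs without the benchmark measuring them directly.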

Never mind:

Starting over did not reproduce the oddity. So it was
operator oddity/error, though I've no clue how to
reproduce the odd swap of which cpu number ranges took
more vs. less time for each given problem size. (The
same goes for any other aspect that might also be
considered odd, such as specific performance figures.)

Retry details:

I booted the WDK23 via UFS media set up for
cortex-a72, media that I use for UFS activities on
the HoneyComb (for example). I built the benchmark
and ran it.

As it stands, I've only done the "cpu lock down" case.
It produces less messy data by avoiding cpu migration
once the lockdown completes (a singleton cpuset for the
thread). I'll also run the variant that does not have
the cpu lock downs (standard C++ code without FreeBSD
specifics added).
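
For anyone wanting to try the "cpu lock down" idea, or to time thread creation directly (which the benchmark does not do), here is a minimal C++ sketch. It is not the benchmark's code. It assumes FreeBSD's cpuset_setaffinity(2) with CPU_WHICH_TID and an id of -1 (the calling thread) to form a singleton cpuset; a Linux sched_setaffinity(2) branch is included only so the sketch also builds elsewhere. The helper names lock_to_cpu, avg_create_join_us, and measure_on_cpu are hypothetical, and the timing covers create plus join, not creation alone.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/cpuset.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Restrict the calling thread to the single cpu `cpu` (a singleton
// cpuset). Returns false if the call fails or the platform is unsupported.
bool lock_to_cpu(int cpu) {
#if defined(__FreeBSD__)
    cpuset_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    // CPU_WHICH_TID with id -1 selects the calling thread.
    return cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                              sizeof(mask), &mask) == 0;
#elif defined(__linux__)
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    // pid 0 selects the calling thread.
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
#else
    (void)cpu;
    return false;
#endif
}

// Average wall-clock microseconds to create and join one trivial
// std::thread, measured from the calling thread's current placement.
double avg_create_join_us(int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread t([] {});
        t.join();
    }
    std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}

// Run the measurement on a helper thread locked to `cpu`, leaving the
// caller's own affinity untouched. Returns -1.0 if the lock down fails.
double measure_on_cpu(int cpu, int iterations) {
    double result = -1.0;
    std::thread worker([&] {
        if (lock_to_cpu(cpu))
            result = avg_create_join_us(iterations);
    });
    worker.join();
    return result;
}
```

On the WDK23 one might compare measure_on_cpu(c, 1000) for c in 0..3 against c in 4..7. Threads created after the lock down inherit the creator's cpuset, so both the creating and the created threads stay on that one cpu; calling avg_create_join_us without lock_to_cpu gives the unpinned, standard-C++-only variant.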

===
Mark Millard
marklmi at yahoo.com