svn commit: r326218 - head/sys/kern

Andrew Turner andrew at fubar.geek.nz
Mon Nov 27 18:06:49 UTC 2017


> On 26 Nov 2017, at 17:07, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
> 
> 
> 
> On 11/26/17 01:46, Andrew Turner wrote:
>> On 25 Nov 2017, at 23:41, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
>>> Author: nwhitehorn
>>> Date: Sat Nov 25 23:41:05 2017
>>> New Revision: 326218
>>> URL: https://svnweb.freebsd.org/changeset/base/326218
>>> 
>>> Log:
>>>  Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs
>>>  are numbered densely from there to n_cpus.
>>> 
>>>  MFC after:	1 month
>>> 
>>> Modified:
>>>  head/sys/kern/kern_clock.c
>>>  head/sys/kern/kern_clocksource.c
>>>  head/sys/kern/kern_shutdown.c
>>>  head/sys/kern/kern_timeout.c
>>>  head/sys/kern/sched_ule.c
>>>  head/sys/kern/subr_pcpu.c
>>> 
>> ...
>>> Modified: head/sys/kern/subr_pcpu.c
>>> ==============================================================================
>>> --- head/sys/kern/subr_pcpu.c	Sat Nov 25 23:23:24 2017	(r326217)
>>> +++ head/sys/kern/subr_pcpu.c	Sat Nov 25 23:41:05 2017	(r326218)
>>> @@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu)
>>> struct pcpu *
>>> pcpu_find(u_int cpuid)
>>> {
>>> +	KASSERT(cpuid_to_pcpu[cpuid] != NULL,
>>> +	    ("Getting uninitialized PCPU %d", cpuid));
>>> 
>>> 	return (cpuid_to_pcpu[cpuid]);
>>> }
>> This breaks on one arm64 simulator I have where the device tree lists 8 cpus, but only 2 are enabled in the simulation. The ofw_cpu device nodes for these are cpu0 and cpu4 so the call to cpu_find in ofw_cpu_attach will hit this KASSERT when the CPU has not been enabled.
>> 
>> Andrew
>> 
>> cpu0: <Open Firmware CPU> on cpulist0
>> cpu1: <Open Firmware CPU> on cpulist0
>> panic: Getting uninitialized PCPU 1
>> cpuid = 0
>> time = 1
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>> 	 pc = 0xffff0000005f03e8  lr = 0xffff000000087048
>> 	 sp = 0xffff0000000105a0  fp = 0xffff0000000107b0
> 
> That's unfortunate. Removing the new KASSERT is probably not the right option, since all it is doing is indicating a pre-existing bug. Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU 4, which I suppose it was previously happily doing (it would only get dereferenced if you had cpufreq support, hence no previous panic).

I would expect cpu4 to have a non-NULL value as its cpu pointer will have been set. It’s cpu1 that is the issue as it’s in the device tree, but doesn’t exist in “hardware”. Because of this it fails when we try to enable it so free it’s pcpu data clear the pointer.

> 
> A super quick-and-dirty fix would be to be fail attach on CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy behavior but without the panic. If you do not actually use ofw_cpu for anything, we could also just disable it temporarily on your platform (it's off for some PPC systems, you will note, for similar reasons).

We used to use it to find CPUs to start, however this is no longer the case.

> 
> A real fix is somewhat complex, since the driver relies on being able to get a mapping from platform-specific numbers in the device tree to an entry in pcpu, which intrinsically relies on reverse-engineering the platform's mapping between some kind of hardware CPU ID and the kernel CPU numbering. I can't think of a way to do that internally. We could make some kind of platform macro, but the whole concept is even somewhat dubious at the moment since a number of IBM systems with SMT don't even have a 1:1 relationship between CPUs as FreeBSD defines them and device tree nodes, so it's possible we need a serious rearchitecture of the driver.

On arm64 we use the ID from the device tree to assign the cpuid so there is a 1:1 mapping between the two. In most cases both are identical, the only case that’s not correct is when we boot from a non-zero CPU, and I know of only one SoC where this is true.

Andrew



More information about the svn-src-all mailing list