svn commit: r326218 - head/sys/kern
Nathan Whitehorn
nwhitehorn at freebsd.org
Sun Nov 26 17:07:32 UTC 2017
On 11/26/17 01:46, Andrew Turner wrote:
> On 25 Nov 2017, at 23:41, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
>> Author: nwhitehorn
>> Date: Sat Nov 25 23:41:05 2017
>> New Revision: 326218
>> URL: https://svnweb.freebsd.org/changeset/base/326218
>>
>> Log:
>> Remove some, but not all, assumptions that the BSP is CPU 0 and that CPUs
>> are numbered densely from there to n_cpus.
>>
>> MFC after: 1 month
>>
>> Modified:
>> head/sys/kern/kern_clock.c
>> head/sys/kern/kern_clocksource.c
>> head/sys/kern/kern_shutdown.c
>> head/sys/kern/kern_timeout.c
>> head/sys/kern/sched_ule.c
>> head/sys/kern/subr_pcpu.c
>>
> ...
>> Modified: head/sys/kern/subr_pcpu.c
>> ==============================================================================
>> --- head/sys/kern/subr_pcpu.c Sat Nov 25 23:23:24 2017 (r326217)
>> +++ head/sys/kern/subr_pcpu.c Sat Nov 25 23:41:05 2017 (r326218)
>> @@ -279,6 +279,8 @@ pcpu_destroy(struct pcpu *pcpu)
>> struct pcpu *
>> pcpu_find(u_int cpuid)
>> {
>> + KASSERT(cpuid_to_pcpu[cpuid] != NULL,
>> + ("Getting uninitialized PCPU %d", cpuid));
>>
>> return (cpuid_to_pcpu[cpuid]);
>> }
> This breaks on one arm64 simulator I have where the device tree lists 8 cpus, but only 2 are enabled in the simulation. The ofw_cpu device nodes for these are cpu0 and cpu4 so the call to cpu_find in ofw_cpu_attach will hit this KASSERT when the CPU has not been enabled.
>
> Andrew
>
> cpu0: <Open Firmware CPU> on cpulist0
> cpu1: <Open Firmware CPU> on cpulist0
> panic: Getting uninitialized PCPU 1
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
> pc = 0xffff0000005f03e8 lr = 0xffff000000087048
> sp = 0xffff0000000105a0 fp = 0xffff0000000107b0
That's unfortunate. Removing the new KASSERT is probably not the right
option, since all it is doing is indicating a pre-existing bug.
Specifically, it is preventing ofw_cpu from using a NULL pointer for CPU
4, which I suppose it was previously happily doing (it would only get
dereferenced if you had cpufreq support, hence no previous panic).
A super quick-and-dirty fix would be to be fail attach on
CPU_ABSENT(device_get_unit(dev)), which restores the previous buggy
behavior but without the panic. If you do not actually use ofw_cpu for
anything, we could also just disable it temporarily on your platform
(it's off for some PPC systems, you will note, for similar reasons).
A real fix is somewhat complex, since the driver relies on being able to
get a mapping from platform-specific numbers in the device tree to an
entry in pcpu, which intrinsically relies on reverse-engineering the
platform's mapping between some kind of hardware CPU ID and the kernel
CPU numbering. I can't think of a way to do that internally. We could
make some kind of platform macro, but the whole concept is even somewhat
dubious at the moment since a number of IBM systems with SMT don't even
have a 1:1 relationship between CPUs as FreeBSD defines them and device
tree nodes, so it's possible we need a serious rearchitecture of the driver.
Please let me know how I can help get this fixed.
-Nathan
More information about the svn-src-all
mailing list