ThunderX Panic after r368370
Mark Millard
marklmi at yahoo.com
Mon Dec 7 08:21:39 UTC 2020
On 2020-Dec-6, at 13:30, Mark Millard <marklmi at yahoo.com> wrote:
> On 2020-Dec-6, at 03:51, Michal Meloun <meloun.michal at gmail.com> wrote:
>
> On 06.12.2020 10:47, Mark Millard wrote:
>>> On 2020-Dec-6, at 00:17, Michal Meloun <meloun.michal at gmail.com> wrote:
>>>> On 06.12.2020 3:21, Marcel Flores wrote:
>>>>> Hi All,
>>>>> Looks like the ThunderX started panicking at boot after r368370:
>>>>> https://reviews.freebsd.org/rS368370
>>>>> From a verbose boot, it looks like it bails in gic0 redistributor setup(?):
>>>>> gic0: CPU29 Re-Distributor woke up
>>>>> gic0: CPU24 enabled CPU interface via system registers
>>>>> gic0: CPU17 enabled CPU interface via system registers
>>>>> gic0: CPU29 enabled CPU interface via system registers
>>>>> done
>>>>> Full Verbose boot:
>>>>> https://gist.github.com/mesflores/f026122495c8494d041bce04d30b15bb
>>>>> I'm not really familiar with the details of the commit, but happy to test
>>>>> anything if anyone has any ideas.
>>>>
>>>>
>>>> Hi Marcel
>>>> are you able to get a crashdump and do a backtrace?
>>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain
>>>> and
>>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>>>> If not, I'll make a debug patch.
>>>>
>>>> It's weird: even though GIC is potentially affected by my patch, in this case the cpuid numbering was not changed.
>>> (I've no access to a ThunderX. I just looked for my own curiosity.
>>> Sorry if this is obvious and so is noise.)
>>> When I looked at the code it appeared to be the last "->" in
>>> the following that was dereferencing the nullptr value (via [x8]
>>> in assembler notation):
>>> static uint64_t
>>> its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc)
>>> {
>>>     uint64_t target;
>>>     uint8_t cmd_type;
>>>     u_int size;
>>>
>>>     cmd_type = desc->cmd_type;
>>>     target = ITS_TARGET_NONE;
>>>
>>>     switch (cmd_type) {
>>>     case ITS_CMD_MOVI: /* Move interrupt ID to another collection */
>>>         target = desc->cmd_desc_movi.col->col_target;
>>> . . .
>>> In other words: it appeared to me that the above desc->cmd_desc_movi.col
>>> evaluated as 0 when used in what was reported.
>> This is very probably the right analysis. But the problem is that cmd_desc_movi.col should not be NULL: it is initialized in its_cmd_movi from sc->sc_its_cols, which should be allocated in gicv3_its_attach().
>>
>
> The following is unlikely to directly contribute to the
> specific problem's solution but documents an oddity that
> took my time while looking around related to the problem.
>
. . .
I'm omitting the material about the "start" part of the comment
below; what follows later is more directly useful for the problem.
> /*
> * Note that `start` and the returned value from BIT_FFS_AT are
> * 1-based bit indices.
> */
> #define BIT_FFS_AT(_s, p, start) __extension__ ({ \
> . . .
>
. . .
It looks to me like fdt_cpuid's use in cpu_init_fdt is one of the issues
with what is added to each cpuset_domain[domain]:
    /* Skip boot CPU */
    if (__pcpu[0].pc_mpidr == (target_cpu & CPU_AFF_MASK))
        return (TRUE);
. . .
    fdt_cpuid++;

    /* Try to read the numa node of this cpu */
    if (vm_ndomains == 1 ||
        OF_getencprop(node, "numa-node-id", &domain, sizeof(domain)) <= 0)
        domain = 0;
    __pcpu[fdt_cpuid].pc_domain = domain;
    if (domain < MAXMEMDOM)
        CPU_SET(fdt_cpuid, &cpuset_domain[domain]);
fdt_cpuid's initial value cannot be added by this code: it is
incremented before it is used in CPU_SET.
cpu_mp_start initializes fdt_cpuid via:
    fdt_cpuid = 1;
    ofw_cpu_early_foreach(cpu_init_fdt, true);
So fdt_cpuid==2 is the smallest value that can be added to
&cpuset_domain[domain] via that ofw_cpu_early_foreach call
that in turn calls cpu_init_fdt.
More than that, there is also the "Skip boot CPU" code that
avoids ever adding the boot CPU to a &cpuset_domain[domain].
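
To make the counting argument concrete, here is a minimal standalone sketch of the flow described above. It is not the kernel code: toy_cpuset_t, toy_populate_domain and toy_isset are hypothetical names, a 64-bit mask stands in for cpuset_t, and everything is assumed to land in a single domain. It only models "start at 1, skip the boot CPU, increment before recording".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for cpuset_t: one bit per CPU ID.
 * The real kernel records IDs with CPU_SET() on a cpuset_t.
 */
typedef uint64_t toy_cpuset_t;

/*
 * Model the quoted flow for a single domain: the counter starts at 1
 * (as in cpu_mp_start), the boot CPU's node is skipped, and every
 * other node increments the counter before its ID is recorded.
 * ncpus is the number of CPU nodes visited; boot_index is the
 * position of the boot CPU among them (both are assumptions of
 * this sketch).
 */
static toy_cpuset_t
toy_populate_domain(unsigned ncpus, unsigned boot_index)
{
	toy_cpuset_t set = 0;
	unsigned fdt_cpuid = 1;
	unsigned i;

	for (i = 0; i < ncpus; i++) {
		if (i == boot_index)
			continue;		/* "Skip boot CPU" */
		fdt_cpuid++;			/* incremented before use */
		if (fdt_cpuid < 64)
			set |= (toy_cpuset_t)1 << fdt_cpuid;
	}
	return (set);
}

/* Return nonzero when CPU ID id is present in set. */
static int
toy_isset(toy_cpuset_t set, unsigned id)
{
	return (((set >> id) & 1) != 0);
}
```

In this model, toy_populate_domain(4, 0) records IDs 2, 3 and 4, so toy_isset() reports IDs 0 and 1 as absent however many nodes are visited.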
This matches up well with the logs showing the two "NULL"
lines in:
gicv3_its_attach: per domain cpus
gicv3_its_attach: NULL its col[0]
gicv3_its_attach: NULL its col[1]
gicv3_its_attach: new its col[2]
gicv3_its_attach: new its col[3]
. . .
gicv3_its_attach: new its col[29]
gicv3_its_attach: new its col[30]
gicv3_its_attach: new its col[31]
and the log's content just before the panic:
gicv3_its_bind_intr: Enter
gicv3_its_select_cpu: cpuset not empty
its_cmd_movi: isrc_cpu 0, col; 0
panic: data abort with spinlock held
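
For illustration only, the failing lookup can be modelled outside the kernel as below. This is a sketch under assumptions, not a proposed fix: struct toy_its_col, toy_col_target, TOY_NCPU and TOY_TARGET_NONE are hypothetical stand-ins for the per-CPU collection array and ITS_TARGET_NONE. It shows a lookup that reports a missing per-CPU entry instead of dereferencing it, which is the check that the col[0] case above would trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NCPU	32			/* hypothetical CPU count */
#define TOY_TARGET_NONE	(~(uint64_t)0)		/* stand-in for ITS_TARGET_NONE */

/* Hypothetical stand-in for a per-CPU ITS collection entry. */
struct toy_its_col {
	uint64_t col_target;
};

/*
 * Guarded lookup: return TOY_TARGET_NONE when the CPU index is out of
 * range or the per-CPU entry was never allocated, rather than
 * dereferencing a NULL pointer.
 */
static uint64_t
toy_col_target(struct toy_its_col **cols, unsigned cpu)
{
	if (cpu >= TOY_NCPU || cols[cpu] == NULL)
		return (TOY_TARGET_NONE);
	return (cols[cpu]->col_target);
}
```

With entries 0 and 1 left NULL, as in the attach log, toy_col_target(cols, 0) returns TOY_TARGET_NONE, while an allocated entry such as index 2 returns its col_target.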
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)