Re: 14-CURRENT Kernel Panic due to USB hub?

From: Andrew Turner <andrew_at_fubar.geek.nz>
Date: Wed, 01 Dec 2021 11:39:17 UTC

> On 30 Nov 2021, at 21:37, Hans Petter Selasky <hps@selasky.org> wrote:
> 
> On 11/30/21 18:21, Andrew Turner wrote:
>>> On 30 Nov 2021, at 14:34, Hans Petter Selasky <hps@selasky.org> wrote:
>>> 
>>> On 11/30/21 15:16, Andrew Turner wrote:
>>>>> On 30 Nov 2021, at 12:35, Hans Petter Selasky <hps@selasky.org> wrote:
>>>>> 
>>>>> On 11/30/21 13:22, Andrew Turner wrote:
>>>>>> I bisected the detached messages back to 601ee53858f6 [1]. If I revert this change I no longer see this on the console.
>>>>>> I am also unable to reproduce the panic with this change reverted. As the panic can be difficult to reproduce I am unsure if reverting this change is enough to fix it, or if it’s just making it less likely to be triggered.
>>>>>> Andrew
>>>>>> [1] https://cgit.freebsd.org/src/commit/?id=601ee53858f6
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Could you verify that you are not running out of kernel stack?
>>>> I can still trigger it after doubling the kernel stack size.
>>>>> 
>>>>> May this be due to some code in the .text segment which is not properly aligned?
>>>> I would expect to have seen the issue on other HW. The issue looks more like it’s memory corruption.
>>>>> 
>>>>> If you compile and load USB as modules, does the panic go away?
>>>> I am unable to trigger it after removing xhci from the kernel, and did get a panic after loading the xhci module.
>>>> The xhci controller is one that originated at Broadcom. Linux has a quirk for it to work around an erratum where, after attaching a USB 1 device followed by a USB 2 device, the latter will come up as USB 1. They reset the PHY on a disconnect event for anything less than USB 3.
>>> 
>>> And there is no BIOS / UEFI code still running on that XHCI controller?
>> I would expect the UEFI code to not be accessing the XHCI controller after exiting the loader.
>> Andrew
> 
> Hi,
> 
> Could you try to kldload xhci instead of building it into the kernel config? Maybe you get a different kind of panic that way.

I have. I’m hitting the KASSERT at [1]. Looking at the memory around td->td_pcb->pcb_fpflags makes me think the memory has been trashed: there are bits set that could never be set in the flags field, and kernel pointer fields whose values point into user memory.
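
As a rough illustration of the kind of check described above, here is a minimal sketch in C. The flag mask and the kernel address boundary are assumptions made for illustration and are not taken from the crash dump or verified against the tree; the helper names are hypothetical. The real values should be checked in the arm64 pcb.h and vmparam.h headers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values, for illustration only. Check the arm64 pcb.h and
 * vmparam.h headers in the tree for the real definitions.
 */
#define FPFLAGS_KNOWN_BITS  (0x00000001u | 0x40000000u | 0x80000000u)
#define KERNEL_VA_MIN       UINT64_C(0xffff000000000000)

/* Hypothetical helper: true if no bit outside the known flag set is set. */
static bool
fpflags_plausible(uint32_t flags)
{
	return ((flags & ~FPFLAGS_KNOWN_BITS) == 0);
}

/*
 * Hypothetical helper: a field that should hold a kernel pointer is
 * plausible if it is NULL or lies at or above the kernel VA boundary.
 */
static bool
kernel_ptr_plausible(uint64_t ptr)
{
	return (ptr == 0 || ptr >= KERNEL_VA_MIN);
}
```

A value that fails either test in a structure that should only ever hold valid flags and kernel pointers suggests the memory was overwritten, rather than a logic error in the code that trips the KASSERT.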

Andrew

[1] https://cgit.freebsd.org/src/tree/sys/arm64/arm64/trap.c?id=6e9309bd3b04501b69593900a14e01114c7f2404#n627