Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Mon, 12 Feb 2024 17:36:46 UTC

On 2/10/24 2:09 PM, Michael Butler wrote:
> I have stability problems with anything at or after this commit
> (b377ff8) on an amd64 laptop. While I see the following panic logged, no
> crash dump is preserved :-( It happens after ~5-6 minutes running in KDE
> (X).
> 
> Reverting to 36efc64 seems to work reliably (after ACPI changes but
> before the problematic PCI one)
> 
> kernel: Fatal trap 12: page fault while in kernel mode
> kernel: cpuid = 2; apic id = 02
> kernel: fault virtual address     = 0x48
> kernel: fault code                = supervisor read data, page not present
> kernel: instruction pointer       = 0x20:0xffffffff80acb962
> kernel: stack pointer             = 0x28:0xfffffe00c4318d80
> kernel: frame pointer             = 0x28:0xfffffe00c4318d80
> kernel: code segment              = base 0x0, limit 0xfffff, type 0x1b
> kernel:                   = DPL 0, pres 1, long 1, def32 0, gran 1
> kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
> kernel: current process           = 2 (clock (0))
> kernel: rdi: fffff802e460c000 rsi: 0000000000000000 rdx: 0000000000000002
> kernel: rcx: 0000000000000000  r8: 000000000000001e  r9: fffffe00c4319000
> kernel: rax: 0000000000000002 rbx: fffff802e460c000 rbp: fffffe00c4318d80
> kernel: r10: 0000000000001388 r11: 000000007ffc765d r12: 000f000000000000
> kernel: r13: 0002000000000000 r14: fffff8000193e740 r15: 0000000000000000
> kernel: trap number               = 12
> kernel: panic: page fault
> kernel: cpuid = 2
> kernel: time = 1707573802
> kernel: Uptime: 6m19s
> kernel: Dumping 942 out of 16242 MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92%
> kernel: Dump complete
> kernel: Automatic reboot in 15 seconds - press a key on the console to abort

Without a stack trace, it is pretty much impossible to debug a panic like this.
Do you have KDB_TRACE enabled in your kernel config?  I'm also not sure how the
PCI changes could result in a panic post-boot.  If they were going to cause
problems, I would expect those during device attach, not after you are booted
and running X.

Short of a stack trace, you can at least use lldb or gdb to look up the source
line associated with the faulting instruction pointer (as long as it isn't in
a kernel module).  For gdb, you would run 'gdb /boot/kernel/kernel' and then
'l *<instruction pointer address>', e.g. from above: 'l *0xffffffff80acb962'
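
If you want to script that lookup, here is a rough sketch in Python that pulls
the address out of the "instruction pointer" line of the trap message and
builds the equivalent one-shot gdb command.  The helper names
(faulting_address, gdb_command) are made up for illustration, and the kernel
path is just the default mentioned above; it only prints the command and
assumes the kernel file is the same build that panicked.

```python
import re
import shlex

# Matches the "instruction pointer" line of an amd64 trap message, e.g.
# "instruction pointer = 0x20:0xffffffff80acb962" (selector:address).
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)")


def faulting_address(panic_log):
    """Return the faulting instruction address as a hex string, or None."""
    match = IP_RE.search(panic_log)
    return match.group(1) if match else None


def gdb_command(address, kernel="/boot/kernel/kernel"):
    """Build a non-interactive gdb invocation that lists source at address.

    Hypothetical helper: the kernel path is an assumption and should point
    at the kernel that actually produced the panic.
    """
    return ["gdb", "-batch", "-ex", f"list *{address}", kernel]


log = "kernel: instruction pointer       = 0x20:0xffffffff80acb962"
addr = faulting_address(log)
if addr is not None:
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(gdb_command(addr)))
```

As with the manual approach, this is only useful when the address falls
inside the kernel proper rather than a module.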

-- 
John Baldwin