Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic

From: Michael Butler <imb_at_protected-networks.net>
Date: Sat, 10 Feb 2024 22:09:20 UTC
I have stability problems with anything at or after this commit 
(b377ff8) on an amd64 laptop. While I see the following panic logged, no 
crash dump is preserved :-( It happens after ~5-6 minutes running in KDE 
(X).

Reverting to 36efc64 seems to work reliably (after ACPI changes but 
before the problematic PCI one)

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 2; apic id = 02
kernel: fault virtual address     = 0x48
kernel: fault code                = supervisor read data, page not present
kernel: instruction pointer       = 0x20:0xffffffff80acb962
kernel: stack pointer             = 0x28:0xfffffe00c4318d80
kernel: frame pointer             = 0x28:0xfffffe00c4318d80
kernel: code segment              = base 0x0, limit 0xfffff, type 0x1b
kernel:                   = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
kernel: current process           = 2 (clock (0))
kernel: rdi: fffff802e460c000 rsi: 0000000000000000 rdx: 0000000000000002
kernel: rcx: 0000000000000000  r8: 000000000000001e  r9: fffffe00c4319000
kernel: rax: 0000000000000002 rbx: fffff802e460c000 rbp: fffffe00c4318d80
kernel: r10: 0000000000001388 r11: 000000007ffc765d r12: 000f000000000000
kernel: r13: 0002000000000000 r14: fffff8000193e740 r15: 0000000000000000
kernel: trap number               = 12
kernel: panic: page fault
kernel: cpuid = 2
kernel: time = 1707573802
kernel: Uptime: 6m19s
kernel: Dumping 942 out of 16242 
MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92%
kernel: Dump complete
kernel: Automatic reboot in 15 seconds - press a key on the console to abort

On 2/10/24 13:04, Mark Millard wrote:
> 
> 
> On Feb 9, 2024, at 23:44, Bakul Shah <bakul@iitbombay.org> wrote:
> 
>> $ git bisect good
>> b377ff8110e3489eb6e6b920b51a2384dfc4eb0b is the first bad commit
>>
>>> On Feb 9, 2024, at 8:13 PM, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>> Summary:
>>>
>>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>> pcib0: parsing FDT for ECAM0:
>>> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
>>> . .
>>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff
>>> panic: Failed to add resource to rman
>>>
>>>
>>> Detail:
>>>
>>>
>>> . .
>>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>> pcib0: parsing FDT for ECAM0:
>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
>>> pcib0: Bus is not cache-coherent
>>> rman_reserve_resource_bound: <I/O memory addresses> request: [0xfd500000, 0xfd50930f], length 0x9310, flags 100, device pcib0
>>> rman_reserve_resource_bound: trying 0x1fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x1fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x31bfffff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x33296fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39bf1fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c02fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c06fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c07fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c08fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c2afff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c36fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x39c37fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x3b03ffff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x3b04ffff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x3b2fffff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x3ee61fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x3ee63fff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0x3fffffff <0xfd500000,0x930f>
>>> rman_reserve_resource_bound: tried 0xfbffffff <0xfd500000,0x930f>
>>> considering [0xfc000000, 0xfd5d1fff]
>>> truncated region: [0xfd500000, 0xfd50930f]; size 0x9310 (requested 0x9310)
>>> candidate region: [0xfd500000, 0xfd50930f], size 0x9310
>>> splitting region in three parts: [0xfc000000, 0xfd4fffff]; [0xfd500000, 0xfd50930f]; [0xfd509310, 0xfd5d1fff]
>>> rman_manage_region: <PCIe Memory> request: start 0xc0000000, end 0xffffffff
>>> pcib0: hardware identifies as revision 0x304.
>>> pcib0: note: reported link speed is 5.0 GT/s.
>>> rman_reserve_resource_bound: <Interrupts> request: [0x51, 0x51], length 0x1, flags 0, device pcib0
>>> rman_reserve_resource_bound: trying 0 <0x51,0>
>>> rman_reserve_resource_bound: tried 0 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x1 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x2 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x3 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x4 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x5 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x6 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x7 <0x51,0>
>>> rman_reserve_resource_bound: tried 0xc <0x51,0>
>>> rman_reserve_resource_bound: tried 0xd <0x51,0>
>>> rman_reserve_resource_bound: tried 0xe <0x51,0>
>>> rman_reserve_resource_bound: tried 0xf <0x51,0>
>>> rman_reserve_resource_bound: tried 0x10 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x11 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x12 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x17 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x18 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x1a <0x51,0>
>>> rman_reserve_resource_bound: tried 0x1b <0x51,0>
>>> rman_reserve_resource_bound: tried 0x22 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x23 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x24 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x25 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x26 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x27 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x28 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x29 <0x51,0>
>>> rman_reserve_resource_bound: tried 0x4e <0x51,0>
>>> rman_reserve_resource_bound: tried 0x4f <0x51,0>
>>> considering [0x50, 0xffffffffffffffff]
>>> truncated region: [0x51, 0x51]; size 0x1 (requested 0x1)
>>> candidate region: [0x51, 0x51], size 0x1
>>> splitting region in three parts: [0x50, 0x50]; [0x51, 0x51]; [0x52, 0xffffffffffffffff]
>>> pci0: <OFW PCI bus> on pcib0
>>> rman_manage_region: <PCI domain 0 bus numbers> request: start 0, end 0xff
>>> rman_reserve_resource_bound: <PCI domain 0 bus numbers> request: [0, 0], length 0x1, flags 0, device pci0
>>> rman_reserve_resource_bound: trying 0xff <0,0>
>>> considering [0, 0xff]
>>> truncated region: [0, 0]; size 0x1 (requested 0x1)
>>> candidate region: [0, 0], size 0x1
>>> allocating from the beginning
>>> pci0: domain=0, physical bus=0
>>> found-> vendor=0x14e4, dev=0x2711, revid=0x00
>>> domain=0, bus=0, slot=0, func=0
>>> class=06-04-00, hdrtype=0x01, mfdev=0
>>> cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
>>> lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>>> intpin=a, irq=0
>>> powerspec 3  supports D0 D3  current D0
>>> secbus=1, subbus=1
>>> rman_reserve_resource_bound: <PCI domain 0 bus numbers> request: [0x1, 0x1], length 0x1, flags 0, device (null)
>>> rman_reserve_resource_bound: trying 0 <0x1,0>
>>> rman_reserve_resource_bound: tried 0 <0x1,0>
>>> considering [0x1, 0xff]
>>> truncated region: [0x1, 0x1]; size 0x1 (requested 0x1)
>>> candidate region: [0x1, 0x1], size 0x1
>>> allocating from the beginning
>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
> 
> Looking at an old log shows that the next normal output was:
> 
> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
> 
> and somewhat later there was:
> 
> pcib1: allocated memory range (0xc0000000-0xc0000fff) for rid 10 of bcm_xhci0
> 
> I'll give more text at the end of this message. Basically the
> sequence is assocaited with xhci0. The count=0x100000 looks
> to be from the reg for xhci@7e9c0000.
> 
> 	scb {
> 		compatible = "simple-bus";
> 		#address-cells = <0x00000002>;
> 		#size-cells = <0x00000002>;
> 		ranges = <0x00000000 0x7c000000 0x00000000 0xfc000000 0x00000000 0x03800000 0x00000000 0x40000000 0x00000000 0xff800000 0x00000000 0x00800000 0x00000006 0x00000000 0x00000006 0x00000000 0x00000000 0x40000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000>;
> 		dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
> 		phandle = <0x000000d6>;
> 		pcie@7d500000 {
> 			compatible = "brcm,bcm2711-pcie", "brcm,bcm7445-pcie";
> 			reg = <0x00000000 0x7d500000 0x00000000 0x00009310>;
> 			device_type = "pci";
> 			#address-cells = <0x00000003>;
> 			#interrupt-cells = <0x00000001>;
> 			#size-cells = <0x00000002>;
> 			interrupts = <0x00000000 0x00000094 0x00000004 0x00000000 0x00000094 0x00000004>;
> 			interrupt-names = "pcie", "msi";
> 			interrupt-map-mask = <0x00000000 0x00000000 0x00000000 0x00000007>;
> 			interrupt-map = <0x00000000 0x00000000 0x00000000 0x00000001 0x00000001 0x00000000 0x0000008f 0x00000004>;
> 			msi-controller;
> 			msi-parent = <0x0000002c>;
> 			ranges = <0x02000000 0x00000000 0xc0000000 0x00000006 0x00000000 0x00000000 0x40000000>;
> 			dma-ranges = <0x02000000 0x00000004 0x00000000 0x00000000 0x00000000 0x00000002 0x00000000>;
> 			brcm,enable-ssc;
> 			phandle = <0x0000002c>;
> 			pci@1,0 {
> 				#address-cells = <0x00000003>;
> 				#size-cells = <0x00000002>;
> 				ranges;
> 				reg = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000>;
> 				usb@1,0 {
> 					reg = <0x00010000 0x00000000 0x00000000 0x00000000 0x00000000>;
> 					resets = <0x0000002d 0x00000000>;
> 				};
> 			};
> 		};
>   . .
> 		xhci@7e9c0000 {
> 			compatible = "generic-xhci";
> 			status = "disabled";
> 			reg = <0x00000000 0x7e9c0000 0x00000000 0x00100000>;
> 			interrupts = <0x00000000 0x000000b0 0x00000004>;
> 			power-domains = <0x00000013 0x00000006>;
> 			phandle = <0x000000d9>;
> 		};
> 
> Note the lack of nesting of xhci@7e9c0000 relative to
> pcie@7d50000 .
> 
>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
>>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pcib1
>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
>>> considering [0xc0000000, 0xffffffff]
>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 0x100000)
>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
>>> allocating from the beginning
>>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff
>>> panic: Failed to add resource to rman
>>> cpuid = 0
>>> time = 1
>>> KDB: stack backtrace:
>>> db_trace_self() at db_trace_self
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
>>> vpanic() at vpanic+0x1a4
>>> panic() at panic+0x48
>>> pcib_add_window_resources() at pcib_add_window_resources+0xf4
>>> pcib_alloc_window() at pcib_alloc_window+0xfc
>>> pcib_attach_common() at pcib_attach_common+0xa18
>>> pcib_attach() at pcib_attach+0x14
>>> device_attach() at device_attach+0x3fc
>>> device_probe_and_attach() at device_probe_and_attach+0x80
>>> bus_generic_attach() at bus_generic_attach+0x1c
>>> pci_attach() at pci_attach+0xec
>>> device_attach() at device_attach+0x3fc
>>> device_probe_and_attach() at device_probe_and_attach+0x80
>>> bus_generic_attach() at bus_generic_attach+0x1c
>>> device_attach() at device_attach+0x3fc
>>> device_probe_and_attach() at device_probe_and_attach+0x80
>>> bus_generic_new_pass() at bus_generic_new_pass+0x100
>>> bus_generic_new_pass() at bus_generic_new_pass+0xb0
>>> bus_generic_new_pass() at bus_generic_new_pass+0xb0
>>> bus_generic_new_pass() at bus_generic_new_pass+0xb0
>>> bus_set_pass() at bus_set_pass+0x50
>>> mi_startup() at mi_startup+0x1e0
>>> virtdone() at virtdone+0x68
>>> KDB: enter: panic
>>> [ thread pid 0 tid 100000 ]
>>> Stopped at      kdb_enter+0x4c: str     xzr, [x19, #1280]
>>> db>
>>
>>
> 
> For reference from an old log:
> 
> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
> pcib1:   domain            0
> pcib1:   secondary bus     1
> pcib1:   subordinate bus   1
> pcib1:   memory decode     0xc0000000-0xc00fffff
> pci1: <OFW PCI bus> on pcib1
> pcib1: allocated bus range (1-1) for rid 0 of pci1
> pci1: domain=0, physical bus=1
> found->	vendor=0x1106, dev=0x3483, revid=0x01
> 	domain=0, bus=1, slot=0, func=0
> 	class=0c-03-30, hdrtype=0x00, mfdev=0
> 	cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
> 	lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
> 	intpin=a, irq=0
> 	powerspec 3  supports D0 D3  current D0
> 	MSI supports 4 messages, 64 bit
> 	map[10]: type Memory, range 64, base 0, size 12, memory disabled
> pcib1: slot 0 INTA is routed to irq 92
> bcm_xhci0: <VL805 USB 3.0 controller (on the Raspberry Pi 4b)> irq 92 at device 0.0 on pci1
> bcm_xhci0: note: xhci firmware not found.
> bcm_xhci0: note: installing xhci firmware.
> bcm_xhci0: note: xhci firmware detected; firmware is revision 138a1.
> pcib1: allocated memory range (0xc0000000-0xc0000fff) for rid 10 of bcm_xhci0
> bcm_xhci0: Lazy allocation of 0x1000 bytes rid 0x10 type 3 at 0xc0000000
> bcm_xhci0: 32 bytes context size, 64-bit DMA
> bcm_xhci0: attempting to allocate 1 MSI vectors (4 supported)
> bcm_xhci0: using IRQ 93 for MSI
> bcm_xhci0: MSI enabled
> bcm_xhci0: (New XHCI DeviceId=0x34831106)
> usbus0 on bcm_xhci0
> bcm_xhci0: usbpf: Attached
> 
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> 
>