Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Mon, 12 Feb 2024 17:32:34 UTC
On 2/9/24 8:13 PM, Mark Millard wrote:
> Summary:
> 
> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> pcib0: parsing FDT for ECAM0:
> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
> . . .
> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff
> panic: Failed to add resource to rman

Hmmm, I suspect this is due to the way that bus_translate_resource works which is
fundamentally broken.  It rewrites the start address of a resource in-situ instead
of keeping downstream resources separate from the upstream resources.   For example,
I don't see how you could ever release a resource in this design without completely
screwing up your rman.  That is, I expect trying to detach a PCI device behind a
translating bridge that uses the current approach should corrupt the allocated
resource ranges in an rman long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be because
for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource
hack only kicks in the activate_resource method of pci_host_generic.c.

> Detail:
> 
> 
> . . .
> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> pcib0: parsing FDT for ECAM0:
> pcib0: 	PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000

This indicates this is a translating bus.

> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pcib1
> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
> considering [0xc0000000, 0xffffffff]
> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 0x100000)
> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
> allocating from the beginning
> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff

The fact that we are trying to reserve the CPU addresses in the rman is because
bus_translate_resource rewrote the start address in the resource after it was allocated.

That said, I can't see why rman_manage_region would actually fail.  At this point the
rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"),
so only the check that should be failing are the checks against rm_start and
rm_end.  For the memory window, rm_start is always 0, and rm_end is always
0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new (0x60000000 - 0x600fffffff)
ranges are within those bounds.  I would instead expect to see some other issue later
on where we fail to allocate a resource for a child BAR, but I wouldn't expect
rman_manage_region to fail.

Logging the return value from rman_manage_region would be the first step I think
to see which error value it is returning.

Probably I should fix pci_host_generic.c to handle translation properly however.
I can work on a patch for that.

-- 
John Baldwin