Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Wed, 14 Feb 2024 17:30:46 UTC
On 2/14/24 8:42 AM, Warner Losh wrote:
> On Wed, Feb 14, 2024 at 9:08 AM John Baldwin <jhb@freebsd.org> wrote:
> 
>> On 2/12/24 5:57 PM, Mark Millard wrote:
>>> On Feb 12, 2024, at 16:36, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote:
>>>>
>>>>> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>
>>>>>> [Gack: I was looking at the wrong vintage of source code, predating
>>>>>> your changes: wrong system used.]
>>>>>>
>>>>>>
>>>>>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>>
>>>>>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote:
>>>>>>>
>>>>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:
>>>>>>>>> Summary:
>>>>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem
>> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>>>>>> pcib0: parsing FDT for ECAM0:
>>>>>>>>> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:
>> 0x40000000
>>>>>>>>> . . .
>>>>>>>>> rman_manage_region: <pcib1 memory window> request: start
>> 0x600000000, end 0x6000fffff
>>>>>>>>> panic: Failed to add resource to rman
>>>>>>>>
>>>>>>>> Hmmm, I suspect this is due to the way that bus_translate_resource
>> works which is
>>>>>>>> fundamentally broken.  It rewrites the start address of a resource
>> in-situ instead
>>>>>>>> of keeping downstream resources separate from the upstream
>> resources.   For example,
>>>>>>>> I don't see how you could ever release a resource in this design
>> without completely
>>>>>>>> screwing up your rman.  That is, I expect trying to detach a PCI
>> device behind a
>>>>>>>> translating bridge that uses the current approach should corrupt
>> the allocated
>>>>>>>> resource ranges in an rman long before my changes.
>>>>>>>>
>>>>>>>> That said, that doesn't really explain the panic.  Hmm, the panic
>> might be because
>>>>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the
>> bus_translate_resource
>>>>>>>> hack only kicks in the activate_resource method of
>> pci_host_generic.c.
>>>>>>>>
>>>>>>>>> Detail:
>>>>>>>>> . . .
>>>>>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem
>> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>>>>>> pcib0: parsing FDT for ECAM0:
>>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size:
>> 0x40000000
>>>>>>>>
>>>>>>>> This indicates this is a translating bus.
>>>>>>>>
>>>>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
>>>>>>>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
>>>>>>>>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff,
>> count=0x100000
>>>>>>>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000,
>> 0xc00fffff], length 0x100000, flags 102, device pcib1
>>>>>>>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
>>>>>>>>> considering [0xc0000000, 0xffffffff]
>>>>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000
>> (requested 0x100000)
>>>>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
>>>>>>>>> allocating from the beginning
>>>>>>>>> rman_manage_region: <pcib1 memory window> request: start
>> 0x600000000, end 0x6000fffff
>>>>>
>>>>> What you later typed does not match:
>>>>>
>>>>> 0x600000000
>>>>> 0x6000fffff
>>>>>
>>>>> You later typed:
>>>>>
>>>>> 0x60000000
>>>>> 0x600fffffff
>>>>>
>>>>> This seems to have lead to some confusion from using the
>>>>> wrong figure(s).
>>>>>
>>>>>>>> The fact that we are trying to reserve the CPU addresses in the
>> rman is because
>>>>>>>> bus_translate_resource rewrote the start address in the resource
>> after it was allocated.
>>>>>>>>
>>>>>>>> That said, I can't see why rman_manage_region would actually fail.
>> At this point the
>>>>>>>> rman is empty (this is the first call to rman_manage_region for
>> "pcib1 memory window"),
>>>>>>>> so only the check that should be failing are the checks against
>> rm_start and
>>>>>>>> rm_end.  For the memory window, rm_start is always 0, and rm_end is
>> always
>>>>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new
>> (0x60000000 - 0x600fffffff)
>>>>>>>> ranges are within those bounds.
>>>>>
>>>>> No:
>>>>>
>>>>> 0xffffffff
>>>>>
>>>>> .vs (actual):
>>>>>
>>>>> 0x600000000
>>>>> 0x6000fffff
>>
>> Ok, then this explains the failure if the "raw" addresses are above 4G.  I
>> have
>> access to an emag I'm currently using to test fixes to pci_host_generic.c
>> to
>> avoid corrupting struct resource objects.  I'll post the diff once I've got
>> something verified to work.
>>
>>> It looks to me like in sys/dev/pci/pci_pci.c the:
>>>
>>> static void
>>> pcib_probe_windows(struct pcib_softc *sc)
>>> {
>>> . . .
>>>           pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff);
>>> . . .
>>>
>>> is just inappropriately restrictive about where in the system
>>> address space a PCIe can validly be mapped to on the high end.
>>> That, in turn, leads to the rejection on the RPi4B now that
>>> the range use is checked.
>>
>> No, the physical register in PCI-PCI bridges is only 32-bits.  Only the
>> prefetchable BAR supports 64-bit addresses.  This is why the host bridge
>> is doing a translation from the CPU side (0x600000000) to the PCI BAR
>> addresses (0xc0000000) so that the BAR addresses are down in the 32-bit
>> address range.  It's also true that many PCI devices only support 32-bit
>> addresses in memory BARs.  64-bit BARs are an optional extension not
>> universally supported.
>>
>> The translation here is somewhat akin to a type of MMU where the CPU
>> addresses are mapped to PCI addresses.  The problem here is that the
>> PCI BAR resources need to "stay" as PCI addresses since we depend on
>> being able to use rman_get_start/end to get the PCI addresses of
>> allocated resources, but pci_host_generic.c currently rewrites the
>> addresses.
>>
>> Probably I should remove rman_set_start/end entirely (Warner added them
>> back in 2004) as the methods don't do anything to deal with the fallout
>> that the rman.rm_list linked-list is no longer sorted by address once
>> some addresses get rewritten, etc.
>>
> 
> At the time, they made sense. Removing it, though may take some doing
> since we use it in about 284 places  in sys/dev today... Somewhat more
> pervasive than I'd have thought they'd be...

Eh, I only find the one that I'm now removing:

% git grep -E 'rman_set_(start|end)' sys/
sys/dev/pci/pci_host_generic.c: rman_set_start(r, start);
sys/dev/pci/pci_host_generic.c: rman_set_end(r, end);
sys/kern/subr_rman.c:rman_set_start(struct resource *r, rman_res_t start)
sys/kern/subr_rman.c:rman_set_end(struct resource *r, rman_res_t end)
sys/sys/rman.h:void     rman_set_end(struct resource *_r, rman_res_t _end);
sys/sys/rman.h:void     rman_set_start(struct resource *_r, rman_res_t _start);

Also, I managed to boot the emag I have access to this morning.  I had to
fix a few other bugs in acpi(4) for my changes in pci_host_generic to work
but will post reviews later today.

-- 
John Baldwin