INTRNG (Was: svn commit: r301453....)

Michal Meloun mmel at FreeBSD.org
Sat Jul 30 16:18:40 UTC 2016


Dne 29.07.2016 v 16:41 Nathan Whitehorn napsal(a):
[snip]
Svata is online now, so i think that he is more competent for responding
to all questions about INTRNG internals.


>>>> But, as you wrote,  INTRNG code is "absolutely free to map and
>>>> dispatch
>>>> multiple virtual IRQs from the same shared interrupt pin" (allow me to
>>>> expand this also to INTRNG code itself) .
>>>> In attempt to simply  situation abound cache, in first step, we
>>>> removed
>>>> duplicates check from above model.  This also removes unpleasantly
>>>> sharing of entries.
>>>> This step converts cache to some sort of duplicate of resource list
>>>> entries. Each existing RLE have exactly one corresponding entry in
>>>> cache.
>>>> Also,  the cache is not needed for INTRNG core. At this point, its
>>>> only
>>>> transfers data from ofw_bus_intr_to_rl() to consumers PIC.
>>>>
>>>> So, if we are able to attach 'key' to RLE,  then we can remove the
>>>> whole
>>>> cache machinery from INTRNG.
>>>> Do you agree?
>>> No. Because you have to support two things that this still can't allow:
>>> 1. Devices attaching before their interrupt controllers.
>> You still expect that data readed from OFW  must be parsed at RLE
>> creation time. I don't see single reason why we cannot postpone
>> parsing to bus_alloc_resource() time.
>>   At this point, its irrelevant if  bus_alloc_resource()  supports
>> detached PIC or not.
>
> Because there are *lots* of cases in the kernel where interrupts are
> represented as a number and not a struct resource *. I've listed many
> of them; I'm sure there are more. PCI is the most notable example
> (both LSI and MSI). For both of these, the route_interrupt (LSI) and
> alloc_msi[x]() (MSI) functions give the bus driver a chance to look at
> the device tree and then they return an IRQ number. There doesn't seem
> to be any way to map that back to a struct resource * or something
> besides an internal table.
>
> Moreover, there are cases (self-added interrupts by children, for
> instance) in which the RL assigned by the parent does not reflect the
> final RL used by the child and there is no way for the parent bus to
> map an RID to a device tree entry.
>
> When we moved to doing this at interrupt assignment time rather than
> resource allocation time or another occasion on PowerPC, it was for
> these kinds of reasons. There were ways to do it later, but they were
> hugely invasive and involved changing large parts of the kernel. Even
> then, it would still be fragile: Trying to guess what device tree
> entry was meant at bus_alloc_resource() time is much more error-prone
> than doing the mapping when reading that device tree to assign the
> resources in the first place, when you can make guarantees.
>
>>
>>> 2. Interrupts encoded entirely in a 32-bit integer rather than a
>>> struct resource * in a shared rman. This is required to support PCI
>>> (both LSI and MSI).
>>>
>>> #2 could of course be fixed by completely retooling the API of the PCI
>>> bus code, but, since it fixes something that currently is not a
>>> problem, why would we do that?
>> All relevant bus functions have struct resource * as an argument. And,
>> in post r301453 kernel, all necessary data are attached to it.
>> I don't see any reason for API change.
>
> There is:
> int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin)
>
> Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL.
> This is the only real opportunity to parse an interrupt-map and does
> not deal with struct resource.
>
> There are also PCIB_ALLOC_MSI[X], which take a device_t and count and
> return a vector of IRQ numbers.
>
> How would you do resource allocation given these APIs? There seem to
> be some dubious-looking attempts at generic MSI mapping code in the
> new kern/subr_intr.c that don't allow the specification of more than
> 32-bit interrupt specifiers and otherwise exactly duplicate the
> pre-301453 flow for MSIs, but in a more complicated and
> less-functional way.
>
> (While investigating this, I also just found
> dev/pci/pci_host_generic.c, which mostly does the wrong things, is
> actually specific to a couple of ARM boards, and duplicates code in
> dev/ofw/ofwpci.c. But that's another discussion...)
[snip][2]
>>>>>> Each OFW device has a lot of dependencies. It must enable  clocks,
>>>>>> power
>>>>>> regulators..., it must be taken from reset. All these action must be
>>>>>> done before we can access single register.
>>>>>> Most of these action can be done  only with attached
>>>>>> clock/power/reset
>>>>>> controller.
>>>>>> Within this, interrupts are not special.
>>>>> They are actually quite special in that the kernel enables them late
>>>>> and so you can defer setup. They are also special in that, for that
>>>>> reason, it is possible to design systems with circular dependency
>>>>> graphs with interrupts. It is not possible to do this with, for
>>>>> example, clocks: if I need to apply a clock to the clock controller
>>>>> with itself, the system is just not bootable and such a system will
>>>>> not be built or shipped. Interrupts need only happen later, after
>>>>> device setup, and so such systems *are* designed and shipped. The
>>>>> same
>>>>> is true for power.
>>>> The G5 problem is standard  'cross type' circular dependency.  You
>>>> must
>>>> (at BUS_PASS_BUS)  initialize memory address window of PCI bridge and
>>>> start bus enumeration.
>>>> Rest of PCI bridge initialization (including all IRQ stuff) must be
>>>> postponed to any later pass. Of course, all interrupt controllers must
>>>> be attached at BUS_PASS_INTERRUPT.
>>>> Or I missed something?
>>> Yes. As I said, you could solve it that way. It would, of course,
>>> require a massive retooling of newbus (to allow partial delayed
>>> attach), everything in dev/pci (which doesn't expect that), and
>>> (almost) every PCI driver in and out of the tree.
>> I still think that only host to PCI bridge must me modified, and PCI-PCI
>> bridge must be run at  BUS_PASS_BUS (which is single line modification).
>
> On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3
> PCI<->PCI bridges and a device that also has an ATA controller and a
> bunch of other things.
Yes, I see. And, as a said before,  it's pretty common situation in OFW
based world.
Imho, all whats is needed is:
https://github.com/strejda/freebsd/commit/96ab64e04dcebb131643c2ae1cf373194d23e8e5
(of course, it's not tested, I havn't access to powermac HW)
 
It's hard for me to understand why you say that this change requires
"massive retooling of newbus,  everything in dev/pci,  every PCI driver).

> Some of those can have their own interrupts (which is fairly common
> for error reporting or PCI-E hotplug). It wouldn't be impossible, but
> it is decidedly nontrivial.
That's true.  Any 'bus' driver, in multipass environment, *must* export
provided resources,  make bus accessible and enumerate it at
'BUS_PASS_BUS', and
consume resources at some later bus pass.
At this time, only pcie-hotplug  needs this change. But I don't see why
a few lines change is "decidedly nontrivial".

>
>>> Or: you could skip all of that and use the fully functional mechanism
>>> that already exists.
>> The problem is, at least from my point of view, that we don't have it.
>> The actual MIPS code doesn't work on ARM for numerous reasons.  This is
>> first one ->
>>  
>> https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
>> The pre-r301453  ARM INTRNG doesn't support detached interrupts.
>> And r301453  doesn't changed nothing about this.
>
> I'm happy to write whatever code is missing. The ARM implementation of
> ofw_bus_map_intr() does seem fairly braindead and should be replaced.
>
> So here is where I am right now:
> - The perceived advantage of the new API seems to be that the mapping
> information (interrupt parent, etc.) ends up in a struct resource
> instead of in a centralized mapping table
True. And, in optimal case,  also in RLE. See [1].
> - Additionally, it gives you a better shot at having the PIC online
> before the PIC's interrupts are parsed (which is not required, but nice).
Nope,  we simply fount that detached pics is very dangerous and not
needed. But, if you love it, it can be added as MD extension.
> - Beyond these aesthetic points, there are no concrete examples of new
> functionality added by this API, aside from some minor implementation
> bugs of the old one on ARM that are easily fixed.
Really? Or you just ignore it?

> - There are, conversely, a number of concrete cases where this new API
> would not be able to do the right thing. Some of these can be worked
> around or fixed with significant restructuring in the MI parts of the
> kernel.
And i offer you a fix, without "significant restructuring in the MI
parts of the kernel". Unfortunately, you did not want to accept anything
other than the old API.

> - If we have both, the interrupt handling code in the MI parts of the
> kernel will bifurcate. This patch alone has added a bunch of #ifdef to
> kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw.
> This is going to be really hard to maintain if we need both.
All those #ifdefs  has been added as safeguards for 11/stable and they
can be removed immediately.
Michal

>
> Is this all correct?
>
> If so, can we please back this out until this discussion is complete?
> I'm asking this formally at this point, under the Committer's Guide
> section about reversion requests.
> -Nathan



More information about the freebsd-arm mailing list