INTRNG (Was: svn commit: r301453....)
    Nathan Whitehorn 
    nwhitehorn at freebsd.org
       
    Thu Jul 28 17:34:29 UTC 2016
    
    
  
On 07/28/16 08:33, Michal Meloun wrote:
> Dne 27.07.2016 v 19:19 Nathan Whitehorn napsal(a):
>>
>> On 07/27/16 09:27, Michal Meloun wrote:
>>
>>
>> [snip]
>>>> The concept is *extremely* necessary, which is why both Linux and
>>>> FreeBSD decided to use it independently There is no way to handle
>>>> parent buses with a single rman and devices on multiple PICs with
>>>> overlapping interrupt ranges without them; neither is there a way to
>>>> decode arbitrary-length interrupt specifiers or to handle things like
>>>> MSIs. Please see the list of cases at
>>>> https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
>>>> email for some examples of things that you just can't represent with
>>>> this new system, as far as I can tell.
>>>>
>>>>>      Also, both variants needs attached PIC at bus_alloc_resource()
>>>>> time,
>>>>> so timing wasn't been changed.
>>>> They absolutely do not, as I have explained repeatedly. Due to parent
>>>> devices with interrupts handled by their bus children, this is a hard
>>>> requirement of any workable system. This is not a theoretical issue; I
>>>> have lots of hardware like this.
>>> Nathan,  I'm talking  about pre /post  r301453 state. Current INTRNG it
>>> has never supported. But yes,  I'm fully understand why you want it. See
>>> [2].
>> Current "INTRNG", at least on PowerPC, has supported this for 10
>> years. If it doesn't on ARM, it's because of some trivial
>> implementation bug.
>>>>> - it implements new BUS_MAP_INTR(). As I understand it,  this is
>>>>> problematic for you, and I'm ready to change it. But I need more
>>>>> details
>>>>> than "it's fundamentally broken".
>>>> Please explain (a) what cases it handles that the existing code and
>>>> does, and (b) how you would resolve each of the cases on the wiki page
>>>> I sent.
>>>>
>>>> The general issue is that it traces the newbus hierarchy, when
>>>> interrupts often do not, and so breaks when you have links to
>>>> as-yet-unattached parts of the hierarchy. It also relies on the
>>>> assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
>>>> that's a relatively minor thing.
>>> Well, from my point of view,  only  "get interrupt data from
>>> OFW/GPIO/.."  part of bus_alloc_resource() follows bus hierarchy,  real
>>> execution not changed.
>>> In any case,  I see many possible variants how we can modify current
>>> implementation. For rest, see [2].
>> But that's an important part, for three reasons:
>> 1. The PIC may not exist when bus_alloc_resource() is called
>> 2. The bus parents have no useful information to contribute to this
>> since the hierarchy doesn't go that way
> See [2]
>
>> 3. If you try to use the interrupt pin as the resource ID, which
>> r301453 does, you end up with rman conflicts if, for example, a bus
>> has two children with interrupts on identically numbered pins on two
>> different interrupt controllers.
> No, r301453 doesn't do this.
> Index to global interrupt source table (irq_sources) is used as resource
> ID (and r301453 not changed it).
Ah, I see that now. Apologies for the confusion.
So, to understand fully: your major complaint about the existing code is 
that it uses arbitrary numbers reflecting some table lookup value as IRQ 
numbers, which r301453 *also* does?
>>>>>    
>>>>>> There are some other differences, of varying degrees of importance,
>>>>>> but that's the really fundamental one. I haven't seen any cases where
>>>>>> r301453 provides functionality absent in the already existing system,
>>>>>> but there seem to be a large number of cases (see the first email I
>>>>>> sent to freebsd-arch or
>>>>>> https://wiki.freebsd.org/Complicated_Interrupts) that the API in
>>>>>> r301453 cannot accommodate and that are needed to support a
>>>>>> variety of
>>>>>> hardware.
>>>>> Which single feature has been removed by r301453?
>>>> Nothing has been removed by it, because the normal code is intact.
>>>> However, the new code is less functional than the old code, so cannot
>>>> replace it. In particular, it is architecturally incapable of working
>>>> with the kinds of device trees found on PowerPC systems (see the wiki
>>>> page). That means we would have to keep both indefinitely, which is a
>>>> significant maintenance burden to no gain whatsoever.
>>>> -Nathan
>>> Firstly,  please,  ignore dependency problem. It will be explained later
>>> in [2].
>> Except that that isn't a workable solution (see the end)
>>
>>> [1]
>>> The original (your) INTRNG assume several things:
>>>
>>> - value of interrupt parent xref  together with byte contents of
>>> 'interrupts' property forms some sort of  'key' (aka virtual IRQ)
>>> - byte contents of  'interrupts' property cannot be parsed in any way
>>> inside INTRNG core.
>>> - data forming this key are sufficient for subsequent mapping (and/or
>>> configuring) to real IRQ.
>>> - there must  exist bidirectional unique relation between virtual and
>>> real IRQs.  So one key (virtual IRQ)  can be mappable to one and just
>>> one
>>>      real IRQ.  And only one virtual IRQ  can be mapped to any given real
>>> IRQ.
>>> - we have cache that stores all observed 'keys' and associated real IRQ
>>> ) the they already mapped.
>>>
>>> Unfortunately,  necessity of unique mapping is very problematic.
>>> We cannot handle situation, where shared interrupt is declared by
>>> different but compatible 'interrupts' properties.
>> There isn't actually a requirement of bidirectional uniqueness, as I
>> explained the last time you brought this up. One "virtual" IRQ does
>> need to map to exactly one interrupt specifier, but the reverse is not
>> the case. The PIC driver is absolutely free to map and dispatch
>> multiple virtual IRQs from the same shared interrupt pin, with no more
>> overhead than you usually get from shared interrupt lines.
> Oki, lets me to expand this a little bit more.
>
> The pre r301453 INTRNG[ARM] does this:
> - ofw_bus_intr_to_rl() reads and pre-parses  'interrupts' property
> into  'key'.
Sometimes. For some buses. ofw_bus_intr_to_rl() is a function of very 
limited application that should really only be used by simplebus; how 
that works depends on the bus bindings. There are many cases (e.g. MSIs, 
PCI buses generally) where interrupts are set up by a different route.
> - cache is searched for duplicates and new entry is created.
> - position of this entry in cache  is used as resource ID.
Resource ID is arbitrary, but it could be this.
> - given resource ID is stored to RL  and is later used in
> bus_alloc_resource()
Yes.
> I'm right ?
>
> In the above model, individual cache entries may be shared are
> referenced by multiple entities (PIC, RL, ..).  Due to this, life cycle
> of cache entries is not well defined and we probably must use some sort
> of reference counting to control it.
Why? Interrupts are not unmapped and remapped with unique values often 
enough (or, really, at all) to warrant even thinking about life-cycle 
management. The maximum number is limited by the maximum of the number 
of interrupt specifiers in the device tree, which are all static values.
> But, as you wrote,  INTRNG code is "absolutely free to map and dispatch
> multiple virtual IRQs from the same shared interrupt pin" (allow me to
> expand this also to INTRNG code itself) .
> In attempt to simply  situation abound cache, in first step, we removed
> duplicates check from above model.  This also removes unpleasantly
> sharing of entries.
> This step converts cache to some sort of duplicate of resource list
> entries. Each existing RLE have exactly one corresponding entry in cache.
> Also,  the cache is not needed for INTRNG core. At this point, its only
> transfers data from ofw_bus_intr_to_rl() to consumers PIC.
>
> So, if we are able to attach 'key' to RLE,  then we can remove the whole
> cache machinery from INTRNG.
> Do you agree?
No. Because you have to support two things that this still can't allow:
1. Devices attaching before their interrupt controllers.
2. Interrupts encoded entirely in a 32-bit integer rather than a struct 
resource * in a shared rman. This is required to support PCI (both LSI 
and MSI).
#2 could of course be fixed by completely retooling the API of the PCI 
bus code, but, since it fixes something that currently is not a problem, 
why would we do that?
>
>>> For example:
>>> -------------------------
>>>        foo1: bar at 12345678 {
>>>            interrupt-parent = <&pic1>;
>>>            interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
>>>       }
>>>
>>>        foo2: bar at 1234567C {
>>>            interrupt-parent = <&pic1>;
>>>            interrupts = <1 IRQ_TYPE_NONE>;
>>>       }
>>>
>>> Is this valid (from DT point of view) configuration?  I think that yes.
>> No, it's malformed and a violation of the standard.
> Can you give me pointer, please? I'm not able to find anything about
> this...
I will try to find the reference -- unfortunately these things are 
scattered about and not all of the documents are public. You can see 
that this must be invalid immediately, however, because it would make 
the interrupt configuration dependent on which devices attach and in 
which order. I've attached the main interrupt mapping spec if you would 
like to look at it.
>
>>> Is this frequent configuration?  I'm sure,  isn't.
>>> Is this possible configuration?  I'm afraid that yes.
>> People do in fact do deranged non-conformant things with device trees
>> sometimes, unfortunately, so it is indeed worth worrying about.
>>
>>> --------------------------
>>>
>>> Next example are GPIO interrupts. These are encoded differently,  so two
>>> 'keys' may point to to same real IRQ.
>>> After that we decided to change our approach and utilize standard
>>> resources and resource list entries to deliveryconfiguration data from
>>> various sources (OFW/GPIO/...)  to consumers.
>> This is a non-issue. You make the GPIO controller a PIC, it handles
>> interrupts however you like. If you got a weird device tree that uses
>> two encodings (one for "interrupts" properties with GPIOs and one for
>> "GPIO" properties on which you need to take interrupts but that
>> weren't added to the "interrupts" list for mysterious reasons), you
>> introduce a helper function for the GPIO case.
>>
>> Are there any actual problems with the pre-existing interrupt mapping
>> code? I have not seen any so far.
>>
>>> [2]
>>> Each OFW device has a lot of dependencies. It must enable  clocks, power
>>> regulators..., it must be taken from reset. All these action must be
>>> done before we can access single register.
>>> Most of these action can be done  only with attached clock/power/reset
>>> controller.
>>> Within this, interrupts are not special.
>> They are actually quite special in that the kernel enables them late
>> and so you can defer setup. They are also special in that, for that
>> reason, it is possible to design systems with circular dependency
>> graphs with interrupts. It is not possible to do this with, for
>> example, clocks: if I need to apply a clock to the clock controller
>> with itself, the system is just not bootable and such a system will
>> not be built or shipped. Interrupts need only happen later, after
>> device setup, and so such systems *are* designed and shipped. The same
>> is true for power.
> The G5 problem is standard  'cross type' circular dependency.  You must
> (at BUS_PASS_BUS)  initialize memory address window of PCI bridge and
> start bus enumeration.
> Rest of PCI bridge initialization (including all IRQ stuff) must be
> postponed to any later pass. Of course, all interrupt controllers must
> be attached at BUS_PASS_INTERRUPT.
> Or I missed something?
Yes. As I said, you could solve it that way. It would, of course, 
require a massive retooling of newbus (to allow partial delayed attach), 
everything in dev/pci (which doesn't expect that), and (almost) every 
PCI driver in and out of the tree.
Or: you could skip all of that and use the fully functional mechanism 
that already exists.
This is my point in a nutshell, basically. This code reinvents a 
perfectly functional wheel in a way that could, in the future, with a 
huge amount of work, accomplish the same things. For now, it is less 
functional and offers no visible advantages or new capabilities of any 
kind -- nor are there any such capabilities it could obviously provide 
in the future. Between now and that future, which will likely never 
arrive, we will have to maintain both in parallel and cause the device 
tree support on ARM and PowerPC to diverge.
-Nathan
>>> Moreover, 'cross type' circular
>>> dependencies are not that rare.
>>> I want to say:
>>> Resource dependencies are must solved (and resolved) at different level
>>> than is INTRNG.
>>> Blind resource allocation is not universal solution because given
>>> resource may/must be accessible immediately after allocation (in many
>>> cases).
>> Absolutely: for GPIOs, clocks, power, etc. you want a system that
>> looks different than the interrupt virtualization system. Probably
>> with extra resource types and maybe with some API similar to r301453.
>> But interrupts have different, and more complex, requirements and
>> cannot be mapped this way.
>>
>>> Unfortunately,  and at time time, I known only one really working
>>> solution:
>>> staggered driver initialization combined with multipass bus
>>> initialization.
>> That does not solve the interrupt problem. If devices have interrupts
>> on their own children, no amount of multipass initialization can
>> possibly break the loop. Multipass would work for other kinds of
>> resources (clocks, power, etc.) and is a perfect match for those
>> resource types. A dynamic multipass (where a driver can return EAGAIN
>> or something from attach if its resources don't exist yet) would work
>> great.
>>
>> *But* in the specific, special case of interrupts, it is not a
>> workable model. You could imagine some change to initialization that
>> gives devices a late attach and an early attach method and does
>> interrupts in the late part, but that is hugely invasive change that
>> would need to touch every single driver in the tree -- to solve a
>> problem that is already solved by interrupt virtualization.
>> -Nathan
>>
>>> I'm sorry, I'm not able to express all this accurately and clearly, but
>>> I've really tried...
>>> Michal
>>>
>>>
>>> _______________________________________________
>>> freebsd-arch at freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
>>>
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
>
    
    
More information about the freebsd-arch
mailing list