Re: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in ARM64 SMP

From: Kyle Evans <kevans_at_freebsd.org>
Date: Mon, 15 May 2023 13:54:06 UTC
On Mon, May 15, 2023 at 2:41 AM Souradeep Chakrabarti
<schakrabarti@microsoft.com> wrote:
>
>
>
>
> >-----Original Message-----
> >From: Kyle Evans <kevans@freebsd.org>
> >Sent: Monday, May 15, 2023 10:36 AM
> >To: Souradeep Chakrabarti <schakrabarti@microsoft.com>
> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org
> >Subject: Re: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in ARM64 SMP
> >
> >On Sun, May 14, 2023 at 9:59 AM Souradeep Chakrabarti
> ><schakrabarti@microsoft.com> wrote:
> >>
> >>
> >>
> >>
> >> >-----Original Message-----
> >> >From: Kyle Evans <kevans@freebsd.org>
> >> >Sent: Friday, May 12, 2023 10:40 PM
> >> >To: Souradeep Chakrabarti <schakrabarti@microsoft.com>
> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org
> >> >Subject: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in
> >> >ARM64 SMP
> >> >
> >> >On Fri, May 12, 2023 at 9:51 AM Souradeep Chakrabarti
> >> ><schakrabarti@microsoft.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Souradeep Chakrabarti
> >> >> >Sent: Monday, May 8, 2023 6:39 PM
> >> >> >To: Kyle Evans <kevans@freebsd.org>
> >> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org
> >> >> >Subject: enabling same PPI interrupt to all CPU in ARM64 SMP
> >> >> >
> >> >> >Hi ,
> >> >> >
> >> >> >While using SMP in ARM64 Hyper-V we are getting stuck in boot if
> >> >> >there is a interrupt for VMBus coming to CPU1 and VMBus interrupt
> >> >> >handler is not getting that interrupt.
> >> >> >
> >> >> >In ARM64 Hyper-V we are using IRQ18 for VMBus and it is a PPI interrupt.
> >> >> >
> >> >> >But Hypev-V host sends interrupt to this IRQ 18 for both CPU0 and
> >> >> >CPU1 in 2CPU system.
> >> >> >This is based on the corresponding VMBus channel which assigned with the
> >CPU.
> >> >> >
> >> >> >Now VMBus ISR is getting the interrupt in CPU0 but not getting from CPU1.
> >> >> >Any idea, how we can use the same PPI 18 for all the CPU cores?
> >> >> >
> >> >> >Any help will be appreciated, as this is blocking the enablement
> >> >> >of FreeBSD in Azure ARM64.
> >> >> [Souradeep]
> >> >> Can someone please help me it.
> >> >>
> >> >
> >> >Looking at least at the GIC implementation, it looks like this is a known limitation:
> >> >
> >> > 875         /*
> >> > 876          * XXX - In case that per CPU interrupt is going to be
> >> >enabled in time
> >> > 877          *       when SMP is already started, we need some IPI
> >> >call which
> >> > 878          *       enables it on others CPUs. Further, it's more
> >> >complicated as
> >> > 879          *       pic_enable_source() and pic_disable_source()
> >> >should act on
> >> > 880          *       per CPU basis only. Thus, it should be solved
> >> >here somehow.
> >> > 881          */
> >> > 882         if (isrc->isrc_flags & INTR_ISRCF_PPI)
> >> > 883                 CPU_SET(PCPU_GET(cpuid), &isrc->isrc_cpu);
> >> >
> >> >I think we need something /like/ this:
> >> >https://peo/
> >>
> >>ple.fr%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7C39d7c50849b
> >64
> >>
> >>eaf70b608db55020fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%
> >7C6381
> >>
> >>97239615544113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLC
> >JQIjoiV2
> >>
> >>luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=dXVvnI
> >HhN9
> >> >mdwiPkSJKwMyEKYi5SyGOuta5zCZ1ysCQ%3D&reserved=0
> >>
> >>eebsd.org%2F~kevans%2Fppi.diff&data=05%7C01%7Cschakrabarti%40microsof
> >> >t.c
> >om%7Cc5c3d254b9d841e9ae9b08db530bb3d2%7C72f988bf86f141af91ab2d7c
> >>
> >>d011db47%7C1%7C0%7C638195082027744706%7CUnknown%7CTWFpbGZsb
> >3
> >>
> >>d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D
> >>
> >>%7C3000%7C%7C%7C&sdata=SwoR2vHxh6QQhwpOgFcQ9378nDVovhdEKEXoE
> >o
> >> >gPKsc%3D&reserved=0, though it still has the caveat that PPIs
> >> >effectively cannot be fully setup before SI_SUB_SMP.
> >> >So, it's likely almost a NOP for existing platforms (will emit a
> >> >warning with bootverbose for armv8 timers) but might do the trick for you.
> >> [Souradeep]
> >> Thanks for the change but it did not solve the problem. Still the
> >> interrupt handler vmbus_handle_intr(struct trapframe *trap_frame), is not
> >getting called for the CPU 1.
> >> It is only getting called for CPU 0 all the time in ARM64 but in x86
> >> it is getting called for both CPU1 and CPU0.
> >
> >Interesting! I do see one problem with the patch (and some cosmetic
> >issues): we really need to take the gic_mtx in gic_v3_setup_periph() right up front
> >because CPU_SET() won't necessarily be atomic. That's not the problem, though for
> >other reasons, but also because...
> >
> >> From DDB I have collected this data in arm64. irq18 is for vmbus.
> >> db> show irqs
> >> ...
> >> irq18  <gic0,p2>: cpu 03 cnt 411
> >> ....
> >>
> >
> >That would seem to indicate that both CPUs have set it up, but it occurs to me that
> >enable_intr also needs the same treatment. Let's wipe gic_v3.c back to a clean slate
> >and try a v2 of the patch:
> >https://people.fr/
> >eebsd.org%2F~kevans%2Fppi-
> >v2.diff&data=05%7C01%7Cschakrabarti%40microsoft.com%7C39d7c50849b64ea
> >f70b608db55020fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6
> >38197239615544113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
> >LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdat
> >a=QLDm8X1yuuGOy2OeeYxsnF%2FNeuceAogw5161U7ZH6%2Bs%3D&reserved=0
> >
> >For now we just pretend that we won't be disabling any PPIs as a proof-of-concept.
> >
> [Souradeep]
> This is causing panic during boot:
> pmu0: MADT: cpu 0 (mpidr 0) irq 0 level-triggered
> pmu0: MADT: cpu 1 (mpidr 1) irq 0 level-triggered
> panic: gic_v3_enable_intr_impl: Unsupported IRQ 0
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> gic_v3_enable_intr_impl() at gic_v3_enable_intr_impl+0xec
> intr_setup_irq() at intr_setup_irq+0x368
> bus_setup_intr() at bus_setup_intr+0x94
> pmu_attach() at pmu_attach+0x64
> pmu_acpi_attach() at pmu_acpi_attach+0x94
> device_attach() at device_attach+0x3f8
> device_probe_and_attach() at device_probe_and_attach+0x7c
> bus_generic_new_pass() at bus_generic_new_pass+0xfc
> bus_generic_new_pass() at bus_generic_new_pass+0xac
> bus_generic_new_pass() at bus_generic_new_pass+0xac
> bus_set_pass() at bus_set_pass+0x4c
> mi_startup() at mi_startup+0x1fc
> virtdone() at virtdone+0x70
> KDB: enter: panic
> [ thread pid 0 tid 100000 ]
> Stopped at      kdb_enter+0x44: str     xzr, [x19, #3328]
> Details of the log I have pasted here: https://pastebin.com/eQN5RntD

I think you fetched a broken version of the patch -- there was maybe
an hour where I kept replacing it because I found a problem and fixed
it, but created another one to fix that I discovered on some hardware
I tested on. I've re-uploaded the correct diff at
https://people.freebsd.org/~kevans/ppi-v3.diff, which can't hit the
panic because irq <= GIC_LAST_PPI all return early (the version you
have, the first branch in gic_v3_enable_intr_impl was probably an
incorrect `if (isrc->isrc_flags & INTR_ISRCF_PPI)`

Thanks,

Kyle Evans