Re: debugging a msi interrupt panic

From: John Hay <john_at_sanren.ac.za>
Date: Mon, 25 Sep 2023 08:30:16 UTC
Hi,

I found the cause of the panics and thought I should give feedback here.

MSI interrupt vectors have to be aligned to the size of the allocation,
which have to be in powers of 2 up to a maximum of 32 vectors.

native_apic_alloc_vectors(), which is used by pci_alloc_msi(), does not
properly align the vectors that it allocates. It does an aligned check, but
then after that adds APIC_IO_INTS, which is 48, and so the alignment is
lost. So allocating smaller numbers of vectors would work because 48 is a
multiple of 16.

If you allocate 32 MSI interrupts, the pcie card just ignores the bottom 5
bits of the vector that was written to its MSI_DATA register, so in my case
where native_apic_alloc_vectors() allocated a base vector of 0x50, the card
ignores bit 4 and generates interrupt vectors from 0x40 to 0x5f.

I have opened a bug report about it:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274074

Regards

John


On Fri, 22 Sept 2023 at 13:00, John Hay <john@sanren.ac.za> wrote:

> Hi,
>
> How do one debug msi interrupts on FreeBSD when writing a device driver?
>
> I'm writing a driver for a OCP TimeCard. The card has several different
> functions, each "tied" to a specific msi interrupt. So it would be useful
> to use multiple msi interrupts. The PCIe functionality is a Xilinx AXI PCIe
> blob, with 32 MSI interrupts. It does not have MSI-X capability.
>
> <snip>
> # pciconf -lbc none1@pci0:1:0:0:
> none0@pci0:1:0:0: class=0x058000 rev=0x00 hdr=0x00 vendor=0x1d9b
> device=0x0400 subvendor=0x10ee subdevice=0x0007
>     bar   [10] = type Memory, range 32, base 0xf4000000, size 33554432,
> enabled
>     cap 01[40] = powerspec 3  supports D0 D3  current D0
>     cap 05[48] = MSI supports 32 messages, 64 bit
>     cap 10[60] = PCI-Express 2 endpoint max data 256(256) NS
>                  max read 512
>                  link x1(x1) speed 2.5(2.5) ASPM disabled(L0s)
>     ecap 0003[100] = Serial 1 0000000000000000
> </snip>
>
> Currently in attach, I'm doing the following (error checking removed):
>
> <snip>
> pci_enable_busmaster(dev); /* msi irq needs it enabled, at least for the
> AXI core */
> sc->sc_msi = pci_msi_count(dev);
> pci_alloc_msi(dev, &sc->sc_msi)
> loop for each function /* currently just a single iteration to get one
> function working */
>    port->pp_irid = port->intr_num + 1; /* msi irq no starts at one */
>    port->pp_ires = bus_alloc_resource_any(dev, SYS_RES_IRQ,
> &port->pp_irid, RF_ACTIVE | RF_SHAREABLE);
>    bus_setup_intr(dev, port->pp_ires, INTR_TYPE_TTY | INTR_MPSAFE,
> timecard_intr, NULL, port, &port->p_ihandle)
> </snip>
>
> I did define a variable timecard_intr_count that timecard_intr() should
> increment, but it does not change, so timecard_intr() probably does not get
> called.
>
> Here is the console output of the panic and a show apic and show lapic.
> The driver is not compiled in, but loaded later manually.
>
> <snip>
> pci0: driver added
> found-> vendor=0x8086, dev=0x1e3a, revid=0x04
>         domain=0, bus=0, slot=22, func=0
>         class=07-80-00, hdrtype=0x00, mfdev=1
>         cmdreg=0x0006, statreg=0x0010, cachelnsz=0 (dwords)
>         lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>         intpin=a, irq=16
>         powerspec 3  supports D0 D3  current D0
>         MSI supports 1 message, 64 bit
> pci0:0:22:0: reprobing on driver added
> timecard0: TimeCard Probe
> Vendor ID : 0x8086
> Device ID : 0x1e3a
> pci1: driver added
> found-> vendor=0x1d9b, dev=0x0400, revid=0x00
>         domain=0, bus=1, slot=0, func=0
>         class=05-80-00, hdrtype=0x00, mfdev=0
>         cmdreg=0x0002, statreg=0x0010, cachelnsz=16 (dwords)
>         lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>         powerspec 3  supports D0 D3  current D0
>         MSI supports 32 messages, 64 bit
> pci0:1:0:0: reprobing on driver added
> timecard0: TimeCard Probe
> Vendor ID : 0x1d9b
> Device ID : 0x400
> We've got the TimeCard, probe successful!
> timecard0: TimeCard Probe
> Vendor ID : 0x1d9b
> Device ID : 0x400
> We've got the TimeCard, probe successful!
> timecard0: <TimeCard> mem 0xf4000000-0xf5ffffff at device 0.0 on pci1
> TimeCard Attach for : deviceID : 0x4001d9b
> Address of timecard_intr_count 0xffffffff8314e400
> timecard0: attempting to allocate 32 MSI vectors (32 supported)
> msi: routing MSI IRQ 38 to local APIC 4 vector 80
> msi: routing MSI IRQ 39 to local APIC 4 vector 81
> msi: routing MSI IRQ 40 to local APIC 4 vector 82
> msi: routing MSI IRQ 41 to local APIC 4 vector 83
> msi: routing MSI IRQ 42 to local APIC 4 vector 84
> msi: routing MSI IRQ 43 to local APIC 4 vector 85
> msi: routing MSI IRQ 44 to local APIC 4 vector 86
> msi: routing MSI IRQ 45 to local APIC 4 vector 87
> msi: routing MSI IRQ 46 to local APIC 4 vector 88
> msi: routing MSI IRQ 47 to local APIC 4 vector 89
> msi: routing MSI IRQ 48 to local APIC 4 vector 90
> msi: routing MSI IRQ 49 to local APIC 4 vector 91
> msi: routing MSI IRQ 50 to local APIC 4 vector 92
> msi: routing MSI IRQ 51 to local APIC 4 vector 93
> msi: routing MSI IRQ 52 to local APIC 4 vector 94
> msi: routing MSI IRQ 53 to local APIC 4 vector 95
> msi: routing MSI IRQ 54 to local APIC 4 vector 96
> msi: routing MSI IRQ 55 to local APIC 4 vector 97
> msi: routing MSI IRQ 56 to local APIC 4 vector 98
> msi: routing MSI IRQ 57 to local APIC 4 vector 99
> msi: routing MSI IRQ 58 to local APIC 4 vector 100
> msi: routing MSI IRQ 59 to local APIC 4 vector 101
> msi: routing MSI IRQ 60 to local APIC 4 vector 102
> msi: routing MSI IRQ 61 to local APIC 4 vector 103
> msi: routing MSI IRQ 62 to local APIC 4 vector 104
> msi: routing MSI IRQ 63 to local APIC 4 vector 105
> msi: routing MSI IRQ 64 to local APIC 4 vector 106
> msi: routing MSI IRQ 65 to local APIC 4 vector 107
> msi: routing MSI IRQ 66 to local APIC 4 vector 108
> msi: routing MSI IRQ 67 to local APIC 4 vector 109
> msi: routing MSI IRQ 68 to local APIC 4 vector 110
> msi: routing MSI IRQ 69 to local APIC 4 vector 111
> timecard0: using IRQs 38-69 for MSI
> TimeCard msi 32
> rman iomem start f4000000, end f5ffffff, size 2000000
> timecard_corelist_read did not find list, using fb list 1
> nports 1
> Start with uart 0 0
> irq 41, irid 4
>
>
> Fatal trap 30: reserved (unknown) fault while in kernel mode
> cpuid = 2; apic id = 04
> port irq rid 4
> instruction pointer     = 0x20:0xffffffff804dda2d
> stack pointer           = 0x28:0xfffffe001b1d8de0
> TimeCard device loaded.
> pci2: driver added
> pci3: driver added
> pci4: driver added
> frame pointer           = 0x28:0xfffffe001b1d8e10
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> pci5: driver added
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, IOPL = 0
> current process         = 11 (idle: cpu2)
> trap number             = 30
> panic: reserved (unknown) fault
> cpuid = 2
> time = 1695375919
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe001b1d8c00
> vpanic() at vpanic+0x151/frame 0xfffffe001b1d8c50
> panic() at panic+0x43/frame 0xfffffe001b1d8cb0
> trap_fatal() at trap_fatal+0x387/frame 0xfffffe001b1d8d10
> calltrap() at calltrap+0x8/frame 0xfffffe001b1d8d10
> --- trap 0x1e, rip = 0xffffffff804dda2d, rsp = 0xfffffe001b1d8de0, rbp =
> 0xfffffe001b1d8e10 ---
> acpi_cpu_idle() at acpi_cpu_idle+0x28d/frame 0xfffffe001b1d8e10
> cpu_idle_acpi() at cpu_idle_acpi+0x4d/frame 0xfffffe001b1d8e30
> cpu_idle() at cpu_idle+0xac/frame 0xfffffe001b1d8e50
> sched_idletd() at sched_idletd+0x4b1/frame 0xfffffe001b1d8ef0
> fork_exit() at fork_exit+0x80/frame 0xfffffe001b1d8f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b1d8f30
> --- trap 0xe0c2545f, rip = 0x8f4a0c76246ffed9, rsp = 0x670c1cf091ea5e5b,
> rbp = 0x57639a151e2b2985 ---
> KDB: enter: panic
> [ thread pid 11 tid 100005 ]
> Stopped at      kdb_enter+0x37: movq    $0,0x129840e(%rip)
> db> x/wx timecard_intr_count
> timecard_intr_count:    0
> db> show apic
> Interrupts bound to lapic 0
> vec 0x30 -> IRQ 9
> vec 0x31 -> IRQ 28
> vec 0x32 -> IRQ 31
> vec 0x33 -> IRQ 24
> vec 0x34 -> IRQ 29
> vec 0x35 -> IRQ 30
> vec 0x36 -> IRQ 8
> vec 0x37 -> IRQ 16
> vec 0x38 -> IRQ 36
> vec 0x39 -> IRQ 1
> vec 0xef -> lapic timer
> Interrupts bound to lapic 2
> vec 0x30 -> IRQ 25
> vec 0x31 -> IRQ 33
> vec 0x32 -> IRQ 23
> vec 0x33 -> IRQ 18
> vec 0xef -> lapic timer
> Interrupts bound to lapic 4
> vec 0x30 -> IRQ 26
> vec 0x31 -> IRQ 0
> vec 0x32 -> IRQ 34
> vec 0x33 -> IRQ 37
> vec 0x50 -> IRQ 38
> vec 0x51 -> IRQ 39
> vec 0x52 -> IRQ 40
> vec 0x53 -> IRQ 41
> vec 0x54 -> IRQ 42
> vec 0x55 -> IRQ 43
> vec 0x56 -> IRQ 44
> vec 0x57 -> IRQ 45
> vec 0x58 -> IRQ 46
> vec 0x59 -> IRQ 47
> vec 0x5a -> IRQ 48
> vec 0x5b -> IRQ 49
> vec 0x5c -> IRQ 50
> vec 0x5d -> IRQ 51
> vec 0x5e -> IRQ 52
> vec 0x5f -> IRQ 53
> vec 0x60 -> IRQ 54
> vec 0x61 -> IRQ 55
> vec 0x62 -> IRQ 56
> vec 0x63 -> IRQ 57
> vec 0x64 -> IRQ 58
> vec 0x65 -> IRQ 59
> vec 0x66 -> IRQ 60
> vec 0x67 -> IRQ 61
> vec 0x68 -> IRQ 62
> vec 0x69 -> IRQ 63
> vec 0x6a -> IRQ 64
> vec 0x6b -> IRQ 65
> vec 0x6c -> IRQ 66
> vec 0x6d -> IRQ 67
> vec 0x6e -> IRQ 68
> vec 0x6f -> IRQ 69
> vec 0xef -> lapic timer
> Interrupts bound to lapic 6
> vec 0x30 -> IRQ 27
> vec 0x31 -> IRQ 32
> vec 0x32 -> IRQ 35
> vec 0x33 -> IRQ 4
> vec 0xef -> lapic timer
> db> show lapic
> lapic ID = 4
> version  = 1.5
> max LVT  = 6
> SVR      = ff (enabled)
> TPR      = 00
> In-service Interrupts:
> isr2: 43
> TMR Interrupts:
> IRR Interrupts:
> irr1: 32 33
> irr2: 43
> irr7: ef
> db>
> </snip>
>
> Regards
>
> John
>
>