[Bug 271990] IRQ mapping table is full after stress devctl disable/enable
Date: Wed, 14 Jun 2023 11:08:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271990
Bug ID: 271990
Summary: IRQ mapping table is full after stress devctl disable/enable
Product: Base System
Version: CURRENT
Hardware: arm64
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: arm
Assignee: freebsd-arm@FreeBSD.org
Reporter: osamaabb@amazon.com
Reproduction steps:
-------------------
1. Create an AWS EC2 instance from one of the following AMIs in us-east-1
1.1: ami-0b55af91f40cd29ee - FreeBSD 14.0-CURRENT-arm64-20230525 UEFI
1.2: ami-0fdc715f878897386 - FreeBSD 13.2-STABLE-arm64-20230601 UEFI
1.3: ami-0e1fd0c2493efe1d1 - FreeBSD 12.4-STABLE-arm64-2023-06-01
2. Run the following reset loop script:
#!/bin/sh
while true
do
    devctl disable ena0
    devctl enable ena0
done
Result:
-------
The kernel panics every time (100% reproducible).
***The same test does not fail on Intel-based instances.***
Stack trace:
------------
2023-06-14T08:05:02.374Z panic: IRQ mapping table is full.
2023-06-14T08:05:02.374Z cpuid = 18
2023-06-14T08:05:02.374Z time = 1686729902
2023-06-14T08:05:02.374Z KDB: stack backtrace:
2023-06-14T08:05:02.374Z db_trace_self() at db_trace_self
2023-06-14T08:05:02.374Z db_trace_self_wrapper() at db_trace_self_wrapper+0x30
2023-06-14T08:05:02.374Z vpanic() at vpanic+0x13c
2023-06-14T08:05:02.374Z panic() at panic+0x44
2023-06-14T08:05:02.374Z intr_map_irq() at intr_map_irq+0xb0
2023-06-14T08:05:02.374Z intr_alloc_msix() at intr_alloc_msix+0x1d8
2023-06-14T08:05:02.374Z generic_pcie_acpi_alloc_msix() at generic_pcie_acpi_alloc_msix+0x78
2023-06-14T08:05:02.374Z pci_alloc_msix_method() at pci_alloc_msix_method+0x168
2023-06-14T08:05:02.374Z ena_enable_msix_and_set_admin_interrupts() at ena_enable_msix_and_set_admin_interrupts+0x10c
2023-06-14T08:05:02.374Z ena_attach() at ena_attach+0x65c
2023-06-14T08:05:02.375Z device_attach() at device_attach+0x3f8
2023-06-14T08:05:02.375Z device_probe_and_attach() at device_probe_and_attach+0x7c
2023-06-14T08:05:02.375Z devctl2_ioctl() at devctl2_ioctl+0x44c
2023-06-14T08:05:02.375Z devfs_ioctl() at devfs_ioctl+0xd4
2023-06-14T08:05:02.375Z vn_ioctl() at vn_ioctl+0xc0
2023-06-14T08:05:02.375Z devfs_ioctl_f() at devfs_ioctl_f+0x20
2023-06-14T08:05:02.375Z kern_ioctl() at kern_ioctl+0x2dc
2023-06-14T08:05:02.375Z sys_ioctl() at sys_ioctl+0x118
2023-06-14T08:05:02.375Z do_el0_sync() at do_el0_sync+0x520
2023-06-14T08:05:02.375Z handle_el0_sync() at handle_el0_sync+0x44
2023-06-14T08:05:02.375Z --- exception, esr 0x56000000
2023-06-14T08:05:02.375Z Uptime: 4m1s
2023-06-14T08:05:02.375Z Dumping 2053 out of 64453 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
2023-06-14T08:10:36.676Z Dump complete
2023-06-14T08:10:37.976Z UEFI firmware (version built at 09:00:00 on Nov 1 2018)
2023-06-14T08:10:38.076Z Consoles: EFI console
2023-06-14T08:10:38.076Z Reading loader env vars from /efi/freebsd/loader.env
2023-06-14T08:10:38.076Z Setting currdev to disk0p1:
2023-06-14T08:10:38.076Z FreeBSD/arm64 EFI loader, Revision 1.1
2023-06-14T08:10:38.076Z (Thu May 25 06:36:21 UTC 2023 root@releng1.nyi.freebsd.org)
2023-06-14T08:10:38.076Z
2023-06-14T08:10:38.076Z Command line arguments: loader.efi
2023-06-14T08:10:38.176Z Image base: 0x7856f000
2023-06-14T08:10:38.176Z EFI version: 2.70
2023-06-14T08:10:38.176Z EFI Firmware: EDK II (rev 1.00)
2023-06-14T08:10:38.176Z Console: efi (0x1000)
2023-06-14T08:10:38.176Z Load Path: \EFI\BOOT\BOOTAA64.EFI
2023-06-14T08:10:38.176Z Load Device: PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B61C1E65-FAFA-11ED-84CB-002590EC5BF2,0x3,0x10418)
2023-06-14T08:10:38.176Z BootCurrent: 0001
Initial investigation results:
------------------------------
Tried to reproduce the issue on Intel-based instances: no reproduction, even
after 50k up/down iterations.
Looked into the up/down flows of the FreeBSD ena driver [1] and saw that the
driver performs the pci_msix_allocate/release and bus_allocation/release calls
in the correct order.
[1] https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena
Since the pci/bus APIs should be platform agnostic (?), I assume this is an
issue in the ARM side of the kernel.
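
To illustrate the kind of failure the backtrace suggests (intr_map_irq()
panicking under intr_alloc_msix() after many enable/disable cycles), here is a
toy user-space model of a fixed-size mapping table. It is a sketch of a
hypothesis only, not the kernel code and not a confirmed root cause: the table
size, the toy_* names, and the "release without unmap" behaviour are all
assumptions made for illustration.

```c
/* Hypothetical size; the real kernel table is much larger. */
#define TOY_TABLE_SIZE 8

/* One entry per mapping slot: 0 = free, 1 = in use. */
static int toy_table[TOY_TABLE_SIZE];

/*
 * Claim the first free slot and return its index, or -1 when the table is
 * full (the condition under which the real kernel panics).
 */
static int
toy_map_irq(void)
{
	for (int i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!toy_table[i]) {
			toy_table[i] = 1;
			return (i);
		}
	}
	return (-1);
}

/* Give a slot back to the table. */
static void
toy_unmap_irq(int slot)
{
	toy_table[slot] = 0;
}

/*
 * Simulate `cycles` enable/disable rounds, each mapping `nvec` vectors.
 * If unmap_on_release is 0, the disable path "forgets" to unmap (the assumed
 * leak). Returns the number of rounds completed before the table filled up,
 * or `cycles` if it never did.
 */
static int
toy_stress(int cycles, int nvec, int unmap_on_release)
{
	int slots[TOY_TABLE_SIZE];

	for (int i = 0; i < TOY_TABLE_SIZE; i++)
		toy_table[i] = 0;

	for (int c = 0; c < cycles; c++) {
		for (int v = 0; v < nvec; v++) {
			int s = toy_map_irq();

			if (s < 0)
				return (c);	/* "IRQ mapping table is full" */
			slots[v] = s;
		}
		if (unmap_on_release) {
			for (int v = 0; v < nvec; v++)
				toy_unmap_irq(slots[v]);
		}
	}
	return (cycles);
}
```

In this model, toy_stress(1000, 2, 0) returns 4 (the table fills on the fifth
round), while toy_stress(1000, 2, 1) completes all 1000 rounds. A leak of this
shape would explain why a correctly ordered driver can still exhaust the table,
but whether the arm64 MSI-X release path actually behaves this way still needs
to be checked.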