[Bug 277211] panic: Unhandled external data abort - handle_el1h_sync - --- exception, esr 0x96000410 - wait_fw_init - mlx5_load_one

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 22 Feb 2024 18:10:56 UTC

--- Comment #5 from John Baldwin <jhb@FreeBSD.org> ---
Ah, looks like the dmesg from Dave does actually include this patch as it has
this line of output:

mlx5_core0: translate 0x14082000000 -> 0x24082000000

That looks correct, but unfortunately we only display the ranges in
bootverbose for FDT, not ACPI.  The patch below changes the pcib driver to log
the ranges under bootverbose in both cases, which would be useful to confirm
the window:

diff --git a/sys/dev/pci/pci_host_generic.c b/sys/dev/pci/pci_host_generic.c
index 386b8411d29a..46b84ff3004b 100644
--- a/sys/dev/pci/pci_host_generic.c
+++ b/sys/dev/pci/pci_host_generic.c
@@ -83,6 +83,7 @@ pci_host_generic_core_attach(device_t dev)
        uint64_t phys_base;
        uint64_t pci_base;
        uint64_t size;
+       const char *range_descr;
        char buf[64];
        int domain, error;
        int flags, rid, tuple, type;
@@ -179,6 +180,7 @@ pci_host_generic_core_attach(device_t dev)
                switch (FLAG_TYPE(sc->ranges[tuple].flags)) {
                case FLAG_TYPE_PMEM:
                        sc->has_pmem = true;
+                       range_descr = "prefetch";
                        flags = RF_PREFETCHABLE;
                        type = SYS_RES_MEMORY;
                        error = rman_manage_region(&sc->pmem_rman,
@@ -186,12 +188,14 @@ pci_host_generic_core_attach(device_t dev)
                case FLAG_TYPE_MEM:
                        flags = 0;
+                       range_descr = "memory";
                        type = SYS_RES_MEMORY;
                        error = rman_manage_region(&sc->mem_rman,
                           pci_base, pci_base + size - 1);
                case FLAG_TYPE_IO:
                        flags = 0;
+                       range_descr = "I/O port";
                        type = SYS_RES_IOPORT;
                        error = rman_manage_region(&sc->io_rman,
                           pci_base, pci_base + size - 1);
@@ -219,6 +223,10 @@ pci_host_generic_core_attach(device_t dev)
                        error = ENXIO;
                        goto err_rman_manage;
+               if (bootverbose)
+                       device_printf(dev,
+                           "PCI addr: 0x%jx, CPU addr: 0x%jx, Size: 0x%jx, Type: %s\n",
+                           pci_base, phys_base, size, range_descr);

        return (0);

That said, it seems like the translation is correct given the prefetch window
used for the pcib1 bridge between pcib0 and the mlx5 device:

pcib1: <PCI-PCI bridge> at device 0.0 on pci0
pcib1:   domain            0
pcib1:   secondary bus     1
pcib1:   subordinate bus   1
pcib1:   memory decode     0x30000000-0x301fffff
pcib1:   prefetched decode 0x14080000000-0x14083ffffff

And this allocation of mlx5's BAR:

        map[10]: type Prefetchable Memory, range 64, base 0x14082000000, size 25, enabled
pcib1: allocated prefetch range (0x14082000000-0x14083ffffff) for rid 10 of
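
As a sanity check on the numbers quoted above, here is a minimal sketch in C.
It is not code from the driver: struct addr_window, bar_end(),
window_contains() and translate_offset() are hypothetical helpers made up for
this illustration.  It assumes the "size 25" in the map[10] line is the log2
of the BAR size, and that the offset implied by the single "translate" line
is constant across the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address window, e.g. a bridge's "prefetched decode" range. */
struct addr_window {
	uint64_t start;
	uint64_t end;
};

/* Last address covered by a BAR of 2^size_log2 bytes starting at base. */
static uint64_t
bar_end(uint64_t base, unsigned size_log2)
{
	assert(size_log2 < 64);
	return (base + ((uint64_t)1 << size_log2) - 1);
}

/* True if [base, end] lies entirely inside the window. */
static bool
window_contains(const struct addr_window *w, uint64_t base, uint64_t end)
{
	return (base <= end && base >= w->start && end <= w->end);
}

/* Offset implied by one "translate <pci> -> <cpu>" line. */
static uint64_t
translate_offset(uint64_t pci_addr, uint64_t cpu_addr)
{
	return (cpu_addr - pci_addr);
}
```

With the values from the dmesg, bar_end(0x14082000000, 25) gives
0x14083ffffff, which matches the range pcib1 allocated for rid 10 and sits
inside the pcib1 prefetched decode window 0x14080000000-0x14083ffffff.
translate_offset(0x14082000000, 0x24082000000) gives 0x10000000000.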

It is odd for a register BAR to be a prefetchable BAR.  It might be good to see
a verbose dmesg from before, to see how the bridge and the mlx5 BAR were
previously configured.
