[Bug 285993] nvme device breakage in 14.2 STABLE n270867-25df691800f0

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 18 Apr 2025 18:40:24 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=285993

--- Comment #22 from John Baldwin <jhb@FreeBSD.org> ---
Hmm, so two things stand out to me:

1) We were able to clear the PFD bit (0x02):

pcib10: initial SLOT_STA 0x15a
pcib10: cleared SLOT_STA 0x40

yet the device set it again later:

pcib10: Power Fault Detected
pcib10: (3) PCIEM_SLOT_STA_PFD
pcib10: (3) PCIEM_SLOT_STA_PFD
pcib10: card not inserted

It was also set for pcib11 initially:

pcib11: initial SLOT_STA 0x12
pcib11: cleared SLOT_STA 0
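
For reference, the SLOT_STA values above can be decoded bit by bit. This is only a sketch: decode_slot_sta() and its table are hypothetical helpers, not code from pci_pci.c, and the masks are the Slot Status bit assignments as I understand them from the PCIe spec (PFD = 0x02 matches the value quoted above).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Slot Status bits, lowest bit first (assumed to match pcireg.h). */
static const struct {
	uint16_t	mask;
	const char	*name;
} slot_sta_bits[] = {
	{ 0x0001, "ABP" },	/* Attention Button Pressed */
	{ 0x0002, "PFD" },	/* Power Fault Detected */
	{ 0x0004, "MRLSC" },	/* MRL Sensor Changed */
	{ 0x0008, "PDC" },	/* Presence Detect Changed */
	{ 0x0010, "CC" },	/* Command Completed */
	{ 0x0020, "MRLSS" },	/* MRL Sensor State */
	{ 0x0040, "PDS" },	/* Presence Detect State */
	{ 0x0080, "EIS" },	/* Electromechanical Interlock Status */
	{ 0x0100, "DLLSC" },	/* Data Link Layer State Changed */
};

/*
 * Hypothetical helper: write the space-separated names of the bits
 * set in 'sta' into 'buf', truncating if 'buf' is too small.
 */
static void
decode_slot_sta(uint16_t sta, char *buf, size_t len)
{
	size_t i, used = 0;
	int n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(slot_sta_bits) / sizeof(slot_sta_bits[0]); i++) {
		if ((sta & slot_sta_bits[i].mask) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used != 0 ? " " : "", slot_sta_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
}
```

Under those assumptions, 0x15a decodes to "PFD PDC CC PDS DLLSC", 0x40 to "PDS", and 0x12 to "PFD CC", so PFD was set on both bridges initially and was gone after the clearing write.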

2) The slot capabilities register (the one read at offset 0xa4) does not
include support for a power controller:

#define PCIEM_SLOT_CAP_PCP              0x00000002

is not set in 0x02d80078.

By my reading of the PCI-e spec, a slot without a power controller shouldn't
report power fault events.  That suggests that this might fix your system:

diff --git a/sys/dev/pci/pci_pci.c b/sys/dev/pci/pci_pci.c
index 5e71a376604b..10de719e020d 100644
--- a/sys/dev/pci/pci_pci.c
+++ b/sys/dev/pci/pci_pci.c
@@ -930,7 +930,8 @@ pcib_hotplug_inserted(struct pcib_softc *sc)
                return (false);

        /* A power fault implicitly turns off power to the slot. */
-       if (sc->pcie_slot_sta & PCIEM_SLOT_STA_PFD)
+       if (sc->pcie_slot_cap & PCIEM_SLOT_CAP_PCP &&
+           sc->pcie_slot_sta & PCIEM_SLOT_STA_PFD)
                return (false);

        /* If the MRL is disengaged, the slot is powered off. */
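
To illustrate the effect of the changed condition in isolation, here is a minimal standalone sketch. power_fault_applies() is a hypothetical name used only for this example and is not a function in pci_pci.c; the two masks are the values quoted earlier in this comment.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCIEM_SLOT_CAP_PCP	0x00000002	/* Power Controller Present */
#define PCIEM_SLOT_STA_PFD	0x0002		/* Power Fault Detected */

/*
 * Hypothetical helper mirroring the patched test: a power fault only
 * counts as "slot powered off" when the slot advertises a power
 * controller in its capabilities register.
 */
static bool
power_fault_applies(uint32_t slot_cap, uint16_t slot_sta)
{
	return ((slot_cap & PCIEM_SLOT_CAP_PCP) != 0 &&
	    (slot_sta & PCIEM_SLOT_STA_PFD) != 0);
}
```

With the slot capabilities value 0x02d80078 from this system, power_fault_applies() returns false even for a status of 0x15a, so a stray PFD bit would no longer make pcib_hotplug_inserted() report the card as absent on that ground alone.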
