[Bug 293830] ahci: AMD SB7x0/SB8x0/SB9x0 unstable with MSI enabled (0x43911002)

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 15 Mar 2026 10:31:50 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=293830

            Bug ID: 293830
           Summary: ahci: AMD SB7x0/SB8x0/SB9x0 unstable with MSI enabled
                    (0x43911002)
           Product: Base System
           Version: 14.4-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: vadlerg@freemail.hu

Created attachment 268818
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=268818&action=edit
patch for ahci: disable MSI for AMD SB7x0/SB8x0/SB9x0 (0x43911002) without
disabling PMP

Hardware:
AMD SB7x0/SB8x0/SB9x0 AHCI controller
PCI ID: 0x43911002

Problem:
Disk drops offline under load when MSI interrupt mode is used.

Observation:
Switching quirk from AHCI_Q_1MSI to AHCI_Q_NOMSI fixes the problem.

Test result:
System stable after kernel rebuild and heavy disk load.

Patch attached.

I have an HP N40L server that has been running FreeBSD for ages.
I had a problem with AHCI devices dropping out, but solved it some time ago by adding
hw.pci.enable_msi="0"
to loader.conf, which disables MSI for every PCI device.

I forgot about the problem over the years until I recently updated to
14.4, reviewed the system configuration files, and removed the ominous PCI MSI
disable line.

My system began to produce pool dropouts like:
Mar 11 18:58:05 ZFSguru kernel: ada4 at ahcich4 bus 0 scbus5 target 0 lun 0
Mar 11 18:58:05 ZFSguru kernel: ada4: <ST16000VE000-2L2103 EV02> s/n ZL29XB7L
detached
Mar 11 18:58:22 ZFSguru kernel: Solaris: WARNING: Pool 'DOWN' has encountered
an uncorrectable I/O failure and has been suspended.
Mar 11 18:58:22 ZFSguru kernel:
Mar 11 18:58:22 ZFSguru ZFS[16228]: pool I/O failure, zpool=DOWN error=6
Mar 11 18:58:22 ZFSguru ZFS[16232]: catastrophic pool I/O failure, zpool=DOWN
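
To compare runs with and without MSI, it can help to count these events in the
system log. The following is a minimal sketch, not a tested tool: the function
name count_disk_dropouts is hypothetical, the patterns are taken only from the
excerpt above, and it assumes each message sits on a single line in the log
file (the lines above are wrapped by the mail client).

```sh
#!/bin/sh
# Count disk detach notices and ZFS pool suspensions in a syslog-style file.
# Assumption: message wording matches the excerpt in this report; other
# releases or drivers may phrase these messages differently.
count_disk_dropouts() {
    # -E: extended regex, -c: print the number of matching lines.
    grep -E -c 'kernel: (ada|da)[0-9]+: .* detached|has been suspended' "$1"
}
```

Example use: count_disk_dropouts /var/log/messages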

At first I had forgotten about the removed line and could not find the culprit. The SMART
values and everything else were OK, but the pools failed under stress despite adding
hint.ahci.0.msi="0"
hint.ahci.0.ccc="0"
hint.ahcich.4.sata_rev="2"
to device.hints.
Finally I remembered and put the hw.pci.enable_msi="0" line back into loader.conf,
and the problem was solved again.
I investigated further and found a patch for the ahci driver from 2018 which
has not yet made it into the main codebase. It disables MSI and PMP (port
multiplier) support for the chipset. Since I do not have any problems with
port multipliers, I tested a kernel that disables only MSI and, voila, the
pools work without dropouts and without additional loader.conf or
device.hints lines.

The chipset I am talking about:
 pciconf -lvbc | egrep -A4 -B2 'class=0x010601|AHCI|SATA'
    ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
    ecap 0002[110] = VC 1 max VC0
ahci0@pci0:0:17:0:      class=0x010601 rev=0x40 hdr=0x00 vendor=0x1002
device=0x4391 subvendor=0x103c subdevice=0x1609
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
    bar   [10] = type I/O Port, range 32, base 0xc000, size 8, enabled
    bar   [14] = type I/O Port, range 32, base 0xb000, size 4, enabled
    bar   [18] = type I/O Port, range 32, base 0xa000, size 8, enabled
    bar   [1c] = type I/O Port, range 32, base 0x9000, size 4, enabled
--
    bar   [24] = type Memory, range 32, base 0xfe4ffc00, size 1024, enabled
    cap 05[50] = MSI supports 8 messages, 64 bit
    cap 12[70] = SATA Index-Data Pair
    cap 13[a4] = PCI Advanced Features: FLR TP
ohci0@pci0:0:18:0:      class=0x0c0310 rev=0x00 hdr=0x00 vendor=0x1002
device=0x4397 subvendor=0x103c subdevice=0x1609
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller'


The patch inserts NOMSI instead of 1MSI:
sed -i '' -e '/{0x43911002, 0x00, "AMD SB7x0\/SB8x0\/SB9x0",/{
n
s/AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI/AHCI_Q_NOMSI | AHCI_Q_ATI_PMP_BUG/
}' sys/dev/ahci/ahci_pci.c
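
Whether the edit took effect can be checked before rebuilding the kernel. The
sketch below assumes the table entry is laid out as the sed command above
expects, with the quirk flags on the line right after the entry line; the
function names ahci_sb7x0_quirks and ahci_sb7x0_is_patched are hypothetical.

```sh
#!/bin/sh
# Print the quirk line that follows the 0x43911002 table entry.
# Argument: path to an ahci_pci.c style file.
ahci_sb7x0_quirks() {
    # index() does a plain substring match, so no regex escaping is needed.
    awk 'index($0, "{0x43911002, 0x00, \"AMD SB7x0/SB8x0/SB9x0\",") {
            if ((getline line) > 0) {
                gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim surrounding blanks
                print line
            }
            exit
        }' "$1"
}

# Succeed (exit status 0) only if the entry carries AHCI_Q_NOMSI.
ahci_sb7x0_is_patched() {
    case "$(ahci_sb7x0_quirks "$1")" in
        *AHCI_Q_NOMSI*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Example use: ahci_sb7x0_is_patched sys/dev/ahci/ahci_pci.c && echo "NOMSI quirk present"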


diff --git a/sys/dev/ahci/ahci_pci.c b/sys/dev/ahci/ahci_pci.c
@@
 {0x43911002, 0x00, "AMD SB7x0/SB8x0/SB9x0",
-    AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI},
+    AHCI_Q_NOMSI | AHCI_Q_ATI_PMP_BUG},
