[Bug 293830] ahci: AMD SB7x0/SB8x0/SB9x0 unstable with MSI enabled (0x43911002)
Date: Sun, 15 Mar 2026 10:31:50 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=293830
Bug ID: 293830
Summary: ahci: AMD SB7x0/SB8x0/SB9x0 unstable with MSI enabled
(0x43911002)
Product: Base System
Version: 14.4-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: vadlerg@freemail.hu
Created attachment 268818
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=268818&action=edit
patch for ahci: disable MSI for AMD SB7x0/SB8x0/SB9x0 (0x43911002) without
disabling PMP
Hardware:
AMD SB7x0/SB8x0/SB9x0 AHCI controller
PCI ID: 0x43911002
Problem:
Disk drops offline under load when MSI interrupt mode is used.
Observation:
Switching quirk from AHCI_Q_1MSI to AHCI_Q_NOMSI fixes the problem.
Test result:
System stable after kernel rebuild and heavy disk load.
Patch attached.
I have an HP N40l server that has been running FreeBSD for ages.
I used to have problems with AHCI devices dropping out, but solved them some time ago by adding
hw.pci.enable_msi="0"
to loader.conf, which disables MSI for every PCI device.
Over the years I forgot about the problem, until I recently updated to
14.4, reviewed the system configuration files and removed the ominous PCI MSI
disable line.
My system began to produce pool dropouts like:
Mar 11 18:58:05 ZFSguru kernel: ada4 at ahcich4 bus 0 scbus5 target 0 lun 0
Mar 11 18:58:05 ZFSguru kernel: ada4: <ST16000VE000-2L2103 EV02> s/n ZL29XB7L
detached
Mar 11 18:58:22 ZFSguru kernel: Solaris: WARNING: Pool 'DOWN' has encountered
an uncorrectable I/O failure and has been suspended.
Mar 11 18:58:22 ZFSguru kernel:
Mar 11 18:58:22 ZFSguru ZFS[16228]: pool I/O failure, zpool=DOWN error=6
Mar 11 18:58:22 ZFSguru ZFS[16232]: catastrophic pool I/O failure, zpool=DOWN
At first I had forgotten about the removed line and did not find the culprit. The SMART
values and everything else were OK, but the pools failed under stress despite adding
hint.ahci.0.msi="0"
hint.ahci.0.ccc="0"
hint.ahcich.4.sata_rev="2"
to device hints.
Finally I remembered and put the hw.pci.enable_msi="0" line back into loader.conf,
and the problem was solved again.
I investigated further and found a patch for the ahci driver from 2018 that
has not yet made it into the main codebase. It disables MSI and PMP (port
multiplier) support for the chipset. Since I do not have any problems with
port multipliers, I tested a kernel that disables only MSI, and voila, the
pools work without dropouts and without any additional loader.conf or
device.hints lines.
The chipset I am talking about:
pciconf -lvbc | egrep -A4 -B2 'class=0x010601|AHCI|SATA'
ecap 000b[100] = Vendor [1] ID 0001 Rev 1 Length 16
ecap 0002[110] = VC 1 max VC0
ahci0@pci0:0:17:0: class=0x010601 rev=0x40 hdr=0x00 vendor=0x1002
device=0x4391 subvendor=0x103c subdevice=0x1609
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]'
class = mass storage
subclass = SATA
bar [10] = type I/O Port, range 32, base 0xc000, size 8, enabled
bar [14] = type I/O Port, range 32, base 0xb000, size 4, enabled
bar [18] = type I/O Port, range 32, base 0xa000, size 8, enabled
bar [1c] = type I/O Port, range 32, base 0x9000, size 4, enabled
--
bar [24] = type Memory, range 32, base 0xfe4ffc00, size 1024, enabled
cap 05[50] = MSI supports 8 messages, 64 bit
cap 12[70] = SATA Index-Data Pair
cap 13[a4] = PCI Advanced Features: FLR TP
ohci0@pci0:0:18:0: class=0x0c0310 rev=0x00 hdr=0x00 vendor=0x1002
device=0x4397 subvendor=0x103c subdevice=0x1609
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller'
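For readers matching the pciconf output against the ID in the quirk table: 0x43911002 is the device ID (0x4391) in the upper 16 bits and the vendor ID (0x1002) in the lower 16 bits. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not functions from the kernel.

```c
#include <stdint.h>

/* Hypothetical helpers, not kernel API: split or build a combined PCI ID. */

/* The vendor ID sits in the low 16 bits of the combined ID. */
static inline uint16_t devid_vendor(uint32_t devid)
{
	return (uint16_t)(devid & 0xffffu);
}

/* The device ID sits in the high 16 bits of the combined ID. */
static inline uint16_t devid_device(uint32_t devid)
{
	return (uint16_t)(devid >> 16);
}

/* Combine vendor and device IDs as reported by pciconf into one value. */
static inline uint32_t make_devid(uint16_t vendor, uint16_t device)
{
	return ((uint32_t)device << 16) | vendor;
}
```

With the values above, make_devid(0x1002, 0x4391) gives 0x43911002, the ID used in the ahci_pci.c table entry.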
The patch replaces AHCI_Q_1MSI with AHCI_Q_NOMSI for this chipset:
sed -i '' -e '/{0x43911002, 0x00, "AMD SB7x0\/SB8x0\/SB9x0",/{
n
s/AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI/AHCI_Q_NOMSI | AHCI_Q_ATI_PMP_BUG/
}' sys/dev/ahci/ahci_pci.c
diff --git a/sys/dev/ahci/ahci_pci.c b/sys/dev/ahci/ahci_pci.c
@@
{0x43911002, 0x00, "AMD SB7x0/SB8x0/SB9x0",
- AHCI_Q_ATI_PMP_BUG | AHCI_Q_1MSI},
+ AHCI_Q_NOMSI | AHCI_Q_ATI_PMP_BUG},
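To illustrate what the one-line change is meant to do, here is a simplified, standalone sketch of a quirk-table lookup and of how the two flags could influence the number of MSI messages requested. This is not the driver's code: the flag values, the struct layout, the function names and the "revision greater than or equal" matching rule are assumptions made for the example. The real definitions live in sys/dev/ahci/.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values only; the real ones are in the ahci driver headers. */
#define Q_NOMSI       0x1u	/* do not use MSI, fall back to legacy interrupts */
#define Q_1MSI        0x2u	/* use MSI, but request only a single message */
#define Q_ATI_PMP_BUG 0x4u	/* port multiplier workaround, unrelated to MSI */

struct quirk_entry {
	uint32_t	id;	/* (device << 16) | vendor */
	uint8_t		rev;	/* minimum revision the entry applies to (assumed) */
	const char	*name;
	uint32_t	quirks;
};

/* Table entry as it would look with the patch applied. */
static const struct quirk_entry quirk_table[] = {
	{0x43911002, 0x00, "AMD SB7x0/SB8x0/SB9x0", Q_NOMSI | Q_ATI_PMP_BUG},
};

/* Return the quirk flags for a controller, or 0 if it is not listed. */
static uint32_t
lookup_quirks(uint32_t devid, uint8_t rev)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].id == devid && rev >= quirk_table[i].rev)
			return (quirk_table[i].quirks);
	}
	return (0);
}

/* Decide how many MSI messages to request; 0 means "do not use MSI". */
static int
msi_messages(uint32_t quirks, int supported)
{
	if ((quirks & Q_NOMSI) != 0 || supported <= 0)
		return (0);
	if ((quirks & Q_1MSI) != 0)
		return (1);
	return (supported);
}
```

For the controller above (rev=0x40, "MSI supports 8 messages"), the old flags would yield one message in this sketch, while the patched entry yields zero, so MSI is never enabled. The PMP workaround flag stays set in both cases.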