Re: NVMe (U.2) hot-swap support status?

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Fri, 16 May 2025 11:07:24 UTC
On 16/05/2025 09:51, Gerrit Kühn wrote:
> On Fri, 9 May 2025 11:57:32 -0600
> Warner Losh <imp@bsdimp.com> wrote:
> 
>> I've had access to a couple of hotplug chassis / motherboards. For
>> x86, they've just worked for me.  While the controller is bundled onto
>> the nvme card, the PCIe bus has protocols to cope with a card being
>> removed. FreeBSD has support for the hotplug standards around this.
>>
>> I've had some dodgy firmware on arm64 systems fail, though. I've not
>> had the time to puzzle out why....
> 
> I have a SuperMicro server here with 8x U.2 drives to play with. I
> installed 14.2, set up a zpool on the 8 drives and then pulled one disk.
> Unfortunately, after replugging the hotswap frame, the drive does not
> come back. Instead I get

FWIW, a few years back I helped a customer who had a problem with U.2 hot-plug.
In that case almost everything worked just fine.
The system was a SuperServer 1029U-TN12RV.
There was a problem that hot-plug did not work for one particular port.
We traced it to a firmware issue, an off-by-one error in the ACPI DSDT.

My recollection is vague now, but I think that the problem did not affect other
operating systems because they used ACPI-assisted PCIe hot-plug while FreeBSD
supported only native PCIe hot-plug and the bug was specific to the latter.

> May 16 08:31:38 cliff2 ZFS[23858]: vdev state changed, pool_guid=14484124401700940093 vdev_guid=12606705151959523555
> May 16 08:31:38 cliff2 ZFS[26525]: vdev is removed, pool_guid=14484124401700940093 vdev_guid=12606705151959523555
> May 16 08:31:38 cliff2 kernel: nvme3: detached
> May 16 08:31:38 cliff2 kernel: pci4: detached
> May 16 08:31:38 cliff2 kernel: pcib4: Timed out waiting for Data Link Layer Active
> May 16 08:33:38 cliff2 kernel: ahciem0: Unsupported enclosure interface
> May 16 08:33:38 cliff2 kernel: (aprobe0:ahciem0:0:0:0): SEP_ATTN IDENTIFY. ACB: 67 ec 02 00 00 40 00 00 00 00 80 00
> May 16 08:33:38 cliff2 kernel: (aprobe0:ahciem0:0:0:0): CAM status: CCB request was invalid
> May 16 08:33:38 cliff2 kernel: (aprobe0:ahciem0:0:0:0): Error 22, Unretryable error
> May 16 08:33:38 cliff2 kernel: ahciem1: Unsupported enclosure interface
> May 16 08:33:38 cliff2 kernel: (aprobe0:ahciem1:0:0:0): SEP_ATTN IDENTIFY. ACB: 67 ec 02 00 00 40 00 00 00 00 80 00
> May 16 08:33:38 cliff2 kernel: (aprobe0:ahciem1:0:0:0): CAM status: CCB request was invalid
> May 16 08:33:38 cliff2 kernel: (aprobe0:ahciem1:0:0:0): Error 22, Unretryable error
> ---
> 
> 
> After rebooting, the drive is back.
> 
> Any ideas how to make hotplugging work?
> 
> Hardware:
> Mainboard: H12SSL-NT
> CPU: AMD EPYC 7313P
> U.2 Drives: SAMSUNG MZQLB1T9HAJR
> Controller: AOC-SLG4-4E4T-O

-- 
Andriy Gapon