[Bug 262969] NVMe - Resetting controller due to a timeout and possible hot unplug

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 03 Oct 2022 16:11:14 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262969

--- Comment #7 from Timothy Guo <firemeteor@users.sourceforge.net> ---
(In reply to Timothy Guo from comment #5)
I'm back to report that the problem came back just now (2 hours ago according
to the alert mail) for no obvious reason -- I'm not actively using that server
at the moment. The same NVMe controller timeout and reset shows up in the
kernel log, and I lost access to the ZFS pool on it.

The disk itself seems to work well physically, so I'm not sure whether I can
ask for a refund or any service. On the other hand, once the problem shows up,
it appears to affect both Linux and FreeBSD running on the same physical box.
Maybe a firmware bug, maybe a driver issue, maybe both...

I used to suspect the problem is related to APST (Autonomous Power State
Transition), but I have no way to play with this setting in FreeBSD. I can find
no mention of this term in the FreeBSD world, and there are no userland tools
that can manipulate or inspect the status of the APST feature. That is kind of
surprising, since this feature has a bad reputation in the Linux world.

Can anybody help me at least diagnose this problem? Is it feasible to manually
parse the PCI config space to determine the APST status? I'll need some
guidance for this, though...
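
As far as I understand the NVMe specification, APST is exposed as a controller
feature (Feature ID 0x0C, read with a Get Features admin command) rather than
through PCI config space. Assuming the 256-byte APST table returned by that
command has already been dumped into a buffer (how to obtain it on FreeBSD is
not covered here), a minimal Python sketch for decoding it might look like the
following. The function name decode_apst_table is hypothetical, and the field
layout (bits 7:3 = idle transition power state, bits 31:8 = idle time prior to
transition in milliseconds) is my reading of the spec and should be
double-checked.

```python
import struct

def decode_apst_table(data: bytes):
    """Decode a 256-byte APST table: 32 little-endian 64-bit entries.

    Entry N describes what the controller does when idle in power state N.
    Returns a list of (power_state, idle_transition_state, idle_time_ms).
    Field layout is an assumption based on the NVMe base specification.
    """
    if len(data) != 256:
        raise ValueError("expected exactly 256 bytes of APST data")
    entries = []
    for state, (raw,) in enumerate(struct.iter_unpack("<Q", data)):
        itps = (raw >> 3) & 0x1F          # bits 7:3, target power state
        itpt_ms = (raw >> 8) & 0xFFFFFF   # bits 31:8, idle time in ms
        entries.append((state, itps, itpt_ms))
    return entries
```

An entry whose idle time is 0 would mean that no autonomous transition is
configured from that power state.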
