Re: nvme timeout issues with hardware and bhyve vm's

From: Pete Wright <pete_at_nomadlogic.org>
Date: Thu, 07 Dec 2023 22:38:37 UTC

On 10/13/23 7:34 PM, Warner Losh wrote:
> 
>     the messages i posted in the start of the thread are from the VM itself
>     (13.2-RELEASE).  The zpool on the hypervisor (13.2-RELEASE) showed no
>     such issues.
> 
>     Based on your comment about the improvements in 14 I'll focus my
>     efforts on my workstation, it seemed to happen regularly so
>     hopefully i can find a repro case.
> 
> 
> Let me know if you see similar messages in stable/14. I think I've fixed
> all the issues with timeouts, though you shouldn't ever see them in a vm
> setup unless something else weird is going on.
> 


Hi Warner, just resurfacing this thread because I've had a few lockups 
on my workstation running 14.0-STABLE.  I was able to capture a photo of 
the hang and this seems to be the most important line:

nvme0: Resetting controller due to a timeout and possible hot unplug.

When I scan the device after reboot I don't see any errors, but if there 
is a particular thing I should check via nvmecontrol please let me know. 
Also, since the message mentions a possible hot unplug, I wonder if this 
is a hardware/firmware issue with my system?

Anyway, I haven't found a repro case yet, but it has locked up a few 
times in the past two weeks.

-pete


-- 
Pete Wright
pete@nomadlogic.org