[Bug 278289] nvme_opc_delete_io_sq NOT PERMITTED queue id 0

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 10 Apr 2024 09:24:46 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278289

            Bug ID: 278289
           Summary: nvme_opc_delete_io_sq NOT PERMITTED queue id 0
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: freebsd-bugs@virtualtec.ch

We're running a Windows Server 2019 VM under bhyve, using the NVMe drive
emulation, like so:

  bhyve -c 4 -m 16G -H -w \
  -s 0,hostbridge \
  -s 4,nvme,/dev/zvol/data/volumes/zvol2 \
  -s 5,virtio-net,tap11 \
  -s 7,virtio-net,tap21 \
  -s 6,nvme,/dev/zvol/data/volumes/zvol-bk01.r \
  -s 29,fbuf,tcp=0.0.0.0:5901 \
  -s 30,xhci,tablet \
  -s 31,lpc \
  -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
  winserv2 &

The VM is used to implement a Veeam backup repository with ReFS on the 2nd
disk. We just had an incident in which this VM took the ReFS volume offline
due to these events:

stornvme: Reset to device, \Device\RaidPort1, was issued
Disk: An error was detected on device \Device\Harddisk1\DR1 during a paging
operation
stornvme: the driver detected a controller error on \Device\RaidPort1
Disk: An error was detected on device \Device\Harddisk1\DR1 during a paging
operation
ReFS: The file system was unable to write metadata to the media backing volume
R:. A write failed with status "The specified request is not a valid operation
for the target device." ReFS will take the volume offline. It may be mounted
again automatically.

On the FreeBSD side, I have error messages like these:
daemon[70039]: nvme_opc_delete_io_sq NOT PERMITTED queue id 0 / num_squeues 4
syslogd: last message repeated 5 times

Checking the source, a queue_id of 0 is invalid, so why would Windows attempt
this? Could this be a consequence of issuing a "Reset device" to the NVMe
controller, and if so, is there anything the bhyve NVMe emulation could do to
recover from this without failing the request like it does at the moment?
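
For reference, the check behind that message can be sketched roughly as
follows. This is a simplified illustration based on the log text and on the
NVMe rule that queue ID 0 is the admin queue, which cannot be deleted with
Delete I/O Submission Queue. It is not the actual bhyve source; the struct,
the MAX_IO_SQ constant, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on I/O submission queues for this sketch. */
#define MAX_IO_SQ 16

/* Hypothetical, simplified controller state; not bhyve's real structure. */
struct ctrl_state {
	uint16_t num_squeues;               /* I/O submission queues offered */
	bool     sq_created[MAX_IO_SQ + 1]; /* sq_created[qid]: queue exists */
};

/* In Delete I/O Submission Queue, the queue ID is bits 15:0 of CDW10. */
static uint16_t
qid_from_cdw10(uint32_t cdw10)
{
	return ((uint16_t)(cdw10 & 0xffff));
}

/*
 * Return true if deleting I/O submission queue `qid` is acceptable.
 * Queue ID 0 is the admin queue and is always rejected; IDs beyond
 * the offered range, or queues that were never created, are rejected too.
 */
static bool
delete_io_sq_permitted(const struct ctrl_state *sc, uint16_t qid)
{
	if (qid == 0)
		return (false);
	if (qid > sc->num_squeues || qid > MAX_IO_SQ)
		return (false);
	return (sc->sq_created[qid]);
}
```

In this sketch, with num_squeues set to 4 as in the log line, a queue ID of 0
is rejected no matter which I/O queues currently exist.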

Note that this system is rather underpowered for the task, so timeouts are to
be expected.
