[Bug 266302] [zfs][iscsi] Periodic drops by ctl with "failed to allocate soft PDU" since 13.1

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 08 Sep 2022 20:31:50 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266302

            Bug ID: 266302
           Summary: [zfs][iscsi] Periodic drops by ctl with "failed to
                    allocate soft PDU" since 13.1
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: eborisch+FreeBSD@gmail.com

I have an iSCSI connection with 13.1 running on both ends: ZVOLs on
the server are exported over iSCSI and used as disks by bhyve VMs on
the client side. I've been running this configuration for 5+ years on
previous releases.

After upgrading the serving (ctl) side from 13.0-p7 to 13.1, I have
started getting these warnings/errors intermittently (anywhere from
less than a day to 14 days between occurrences):

Server (ctl) side: (Repeats a handful of times, all with the same timestamp.)

| Aug 23 21:08:41 <server> kernel: WARNING: icl_soft_conn_new_pdu: failed to allocate soft PDU
| Aug 23 21:08:41 <server> kernel: WARNING: 10.0.1.2 (iqn.1994-09.org.freebsd:<client>): connection error; dropping connection

Client (iscsid) side:

| Aug 23 21:08:42 <client> kernel: WARNING: <server> (iqn.<tgt>): connection error; reconnecting
| Aug 23 21:09:08 <client> kernel: (da4:iscsi2:0:0:0): WRITE(6). CDB: 0a 0f be 60 08 00
| Aug 23 21:09:08 <client> kernel: (da4:iscsi2:0:0:0): CAM status: SCSI Status Error
| Aug 23 21:09:08 <client> kernel: (da4:iscsi2:0:0:0): SCSI status: Check Condition
| Aug 23 21:09:08 <client> kernel: (da4:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,7 (I_T nexus loss occurred)
| Aug 23 21:09:08 <client> kernel: (da4:iscsi2:0:0:0): Retrying command (per sense data)
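The WRITE(6) CDB in the client log can be decoded by hand to see which
request was being retried after the reconnect. This is a minimal
sketch, assuming the standard SBC WRITE(6) layout: opcode 0x0A in
byte 0, a 21-bit LBA made of the low 5 bits of byte 1 plus bytes 2 and
3, a transfer length in byte 4 (where 0 means 256 blocks), and the
control byte in byte 5. The helper name `decode_write6` is
hypothetical and not part of any FreeBSD tool.

```python
def decode_write6(cdb_hex: str) -> dict:
    """Decode a 6-byte WRITE(6) CDB given as space-separated hex bytes.

    Hypothetical helper; assumes the standard SBC WRITE(6) layout.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 6:
        raise ValueError("WRITE(6) CDB must be exactly 6 bytes")
    if cdb[0] != 0x0A:
        raise ValueError(f"not a WRITE(6) opcode: {cdb[0]:#04x}")
    # LBA: low 5 bits of byte 1, then bytes 2 and 3 (21 bits total)
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # Transfer length in logical blocks; a value of 0 encodes 256
    blocks = cdb[4] or 256
    return {"opcode": cdb[0], "lba": lba, "blocks": blocks, "control": cdb[5]}


print(decode_write6("0a 0f be 60 08 00"))
# {'opcode': 10, 'lba': 1031776, 'blocks': 8, 'control': 0}
```

For the logged CDB this gives LBA 1031776 (0x0fbe60) and a transfer
length of 8 blocks; the byte offset depends on the LUN's logical block
size, which is not stated here.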

I don't see any errors in the (lightly loaded) VM that is using this
device: its kernel reports nothing, and forced filesystem checks come
back clean. Still, I was wondering if anyone had suggestions for what
might be going on here.

The server was upgraded to 13.1 first. The errors first appeared in
that state (13.0 client, 13.1 server) and have persisted in the
current configuration, with both ends on 13.1. Both ends use 10 GbE
X540-AT2 adapters.

Thanks for any suggestions!
