From nobody Mon Nov 10 15:51:28 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4d4vKX6fd7z6Gt9N; Mon, 10 Nov 2025 15:51:28 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R12" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4d4vKX3vMCz43xF; Mon, 10 Nov 2025 15:51:28 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1762789888; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=DF3ACX4KoVtHMWOTCc8wsmS5VGBmqvx4zH/w10NOl98=; b=y0RsJp1+8cwLYkH6csMMir+MzHN42m1lyTLdhVs6BLeo+ljjKsTXV9Ve1wDr9s+HsKFa0H 3hnCuAc8Hz6Gz2fDFSUbqprwXOmgVxGfkLPwZCkcVWBELDVxyUl75BEc0haZf5kEYME774 WHq7zyIYzMFChqm3ZS9vg9aWu8EiveeeUhKMZN9fbKjBIrYtfO+ys2/O9R6hxIw088Ssx/ VvjEFdwss3Kmrwho7ff5oVkFBsc/PHsNllIzcypPzQR16EBWp3cB9WiFufVP70YuC3noWb DNZNkHKXg4Gdf95M3DoT9Lw0G+tYIJmPPQQpl1RUjvrQ7ghEkKETSaujtbeZlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1762789888; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=DF3ACX4KoVtHMWOTCc8wsmS5VGBmqvx4zH/w10NOl98=; b=jU9s2weCLE1fra2cn6/t45I0cfLM23SCJ4XL+d0iqwNb/UME7aMj5X4kEi8a/JPStWGDoh x8XM/OwldVDIpx5LsVMvJFBPS5AcNiBI525l3RriKy8xOdAp9LIN3WE96GRBGM9OpO2L/v HfwMHmo9dYOgGvQs3b1b+cU/nDvov3gHdIuLYgRzXec/qhDCcscatAGRxFLXhNHBdmy+0f 3ZYk+2S5+DDyJ6bEQksB9cjqjW7uErhQiii3sM1IlitmcxDyKsEg1UiVb9Y2qd9SjIwvsx XlSvZ3qbjYDyKDrsrjJW62gqqLSxnCbElZji8m3VVpvteh09m0JuSNNmwdPvtA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1762789888; a=rsa-sha256; cv=none; b=cvUmxH6Y0MvRwFl/3wPQITZBT7IsHE1PEs17OoY0gZTbTCs+i4TxpC2vORAcYy+3rULgPo 5n6Ck3EUSA3H6z6GpHHUYJgpWv7bIxCmECA/RFlIMU79jrHOZc216K1queQoRM6aJWC0v5 h/bWRIhfVlDeMcKd9Yp46moRgVM7Pj70EmvxMYAsD8hBmheeqJgWEZC39g0WB99k3nWHuy 8XaTGasyCNogItMn9SJ3M6jRVp7VguuBFD70k5elEsBK8vCYzka9yqPQwb6gL33GXE0684 qGYYNYNS/lTWKWXnuPKinw1Y6xOnIXYWtB1jCSu/piYI1D3vmknEOYH888XSTA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4d4vKX3Mknz87m; Mon, 10 Nov 2025 15:51:28 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 5AAFpSUX048188; Mon, 10 Nov 2025 15:51:28 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 5AAFpSw6048185; Mon, 10 Nov 2025 15:51:28 GMT (envelope-from git) Date: Mon, 10 Nov 2025 15:51:28 GMT Message-Id: <202511101551.5AAFpSw6048185@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: John Baldwin Subject: git: fd3581e4ddde - main - cxgbe tom: Add support for sending NVMe/TCP PDUs List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: jhb X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: fd3581e4ddde4166d6ba6ce264a63833ded35483 Auto-Submitted: auto-generated The branch main has been updated by jhb: URL: https://cgit.FreeBSD.org/src/commit/?id=fd3581e4ddde4166d6ba6ce264a63833ded35483 commit fd3581e4ddde4166d6ba6ce264a63833ded35483 Author: John Baldwin AuthorDate: 2025-11-10 15:50:48 +0000 Commit: John Baldwin CommitDate: 2025-11-10 15:50:48 +0000 cxgbe tom: Add support for sending NVMe/TCP PDUs These work requests are largely similar to the same work requests for iSCSI offload but use a newer tx_data work request structure. This includes initial support for ISO where a large C2H_DATA or H2C_DATA PDU is split into multiple PDUs on the wire. Sponsored by: Chelsio Communications --- sys/dev/cxgbe/adapter.h | 3 + sys/dev/cxgbe/t4_main.c | 3 + sys/dev/cxgbe/t4_sge.c | 15 ++ sys/dev/cxgbe/tom/t4_cpl_io.c | 344 ++++++++++++++++++++++++++++++++++++++++-- 4 files changed, 356 insertions(+), 9 deletions(-) diff --git a/sys/dev/cxgbe/adapter.h b/sys/dev/cxgbe/adapter.h index 55f09fefb7e3..57618b1ceb15 100644 --- a/sys/dev/cxgbe/adapter.h +++ b/sys/dev/cxgbe/adapter.h @@ -795,6 +795,9 @@ struct sge_ofld_txq { counter_u64_t tx_iscsi_pdus; counter_u64_t tx_iscsi_octets; counter_u64_t tx_iscsi_iso_wrs; + counter_u64_t tx_nvme_pdus; + counter_u64_t tx_nvme_octets; + counter_u64_t tx_nvme_iso_wrs; counter_u64_t tx_aio_jobs; counter_u64_t tx_aio_octets; counter_u64_t tx_toe_tls_records; diff --git a/sys/dev/cxgbe/t4_main.c b/sys/dev/cxgbe/t4_main.c index 4d9c94789a37..2f6eb6062c20 100644 --- a/sys/dev/cxgbe/t4_main.c +++ b/sys/dev/cxgbe/t4_main.c @@ -12987,6 +12987,9 @@ clear_stats(struct adapter *sc, u_int port_id) counter_u64_zero(ofld_txq->tx_iscsi_pdus); counter_u64_zero(ofld_txq->tx_iscsi_octets); counter_u64_zero(ofld_txq->tx_iscsi_iso_wrs); + counter_u64_zero(ofld_txq->tx_nvme_pdus); + counter_u64_zero(ofld_txq->tx_nvme_octets); + counter_u64_zero(ofld_txq->tx_nvme_iso_wrs); counter_u64_zero(ofld_txq->tx_aio_jobs); counter_u64_zero(ofld_txq->tx_aio_octets); counter_u64_zero(ofld_txq->tx_toe_tls_records); diff --git a/sys/dev/cxgbe/t4_sge.c b/sys/dev/cxgbe/t4_sge.c index 2f9cb1a4ebb5..8cd5b0543d8a 100644 --- a/sys/dev/cxgbe/t4_sge.c +++ b/sys/dev/cxgbe/t4_sge.c @@ -4957,6 +4957,9 @@ alloc_ofld_txq(struct vi_info *vi, struct sge_ofld_txq *ofld_txq, int idx) ofld_txq->tx_iscsi_pdus = counter_u64_alloc(M_WAITOK); ofld_txq->tx_iscsi_octets = counter_u64_alloc(M_WAITOK); ofld_txq->tx_iscsi_iso_wrs = counter_u64_alloc(M_WAITOK); + ofld_txq->tx_nvme_pdus = counter_u64_alloc(M_WAITOK); + ofld_txq->tx_nvme_octets = counter_u64_alloc(M_WAITOK); + ofld_txq->tx_nvme_iso_wrs = counter_u64_alloc(M_WAITOK); ofld_txq->tx_aio_jobs = counter_u64_alloc(M_WAITOK); ofld_txq->tx_aio_octets = counter_u64_alloc(M_WAITOK); ofld_txq->tx_toe_tls_records = counter_u64_alloc(M_WAITOK); @@ -5000,6 +5003,9 @@ free_ofld_txq(struct vi_info *vi, struct sge_ofld_txq *ofld_txq) counter_u64_free(ofld_txq->tx_iscsi_pdus); counter_u64_free(ofld_txq->tx_iscsi_octets); counter_u64_free(ofld_txq->tx_iscsi_iso_wrs); + counter_u64_free(ofld_txq->tx_nvme_pdus); + counter_u64_free(ofld_txq->tx_nvme_octets); + counter_u64_free(ofld_txq->tx_nvme_iso_wrs); counter_u64_free(ofld_txq->tx_aio_jobs); counter_u64_free(ofld_txq->tx_aio_octets); counter_u64_free(ofld_txq->tx_toe_tls_records); @@ -5029,6 +5035,15 @@ add_ofld_txq_sysctls(struct sysctl_ctx_list *ctx, struct sysctl_oid *oid, SYSCTL_ADD_COUNTER_U64(ctx, children, OID_AUTO, "tx_iscsi_iso_wrs", CTLFLAG_RD, &ofld_txq->tx_iscsi_iso_wrs, "# of iSCSI segmentation offload work requests"); + SYSCTL_ADD_COUNTER_U64(ctx, children, OID_AUTO, "tx_nvme_pdus", + CTLFLAG_RD, &ofld_txq->tx_nvme_pdus, + "# of NVMe PDUs transmitted"); + SYSCTL_ADD_COUNTER_U64(ctx, children, OID_AUTO, "tx_nvme_octets", + CTLFLAG_RD, &ofld_txq->tx_nvme_octets, + "# of payload octets in transmitted NVMe PDUs"); + SYSCTL_ADD_COUNTER_U64(ctx, children, OID_AUTO, "tx_nvme_iso_wrs", + CTLFLAG_RD, &ofld_txq->tx_nvme_iso_wrs, + "# of NVMe segmentation offload work requests"); SYSCTL_ADD_COUNTER_U64(ctx, children, OID_AUTO, "tx_aio_jobs", CTLFLAG_RD, &ofld_txq->tx_aio_jobs, "# of zero-copy aio_write(2) jobs transmitted"); diff --git a/sys/dev/cxgbe/tom/t4_cpl_io.c b/sys/dev/cxgbe/tom/t4_cpl_io.c index 84e31efa8b58..10f4f91f8db6 100644 --- a/sys/dev/cxgbe/tom/t4_cpl_io.c +++ b/sys/dev/cxgbe/tom/t4_cpl_io.c @@ -66,6 +66,7 @@ #include #include +#include #include "common/common.h" #include "common/t4_msg.h" @@ -495,6 +496,9 @@ t4_close_conn(struct adapter *sc, struct toepcb *toep) #define MIN_ISO_TX_CREDITS (howmany(sizeof(struct cpl_tx_data_iso), 16)) #define MIN_TX_CREDITS(iso) \ (MIN_OFLD_TX_CREDITS + ((iso) ? MIN_ISO_TX_CREDITS : 0)) +#define MIN_OFLD_TX_V2_CREDITS (howmany(sizeof(struct fw_ofld_tx_data_v2_wr) + 1, 16)) +#define MIN_TX_V2_CREDITS(iso) \ + (MIN_OFLD_TX_V2_CREDITS + ((iso) ? MIN_ISO_TX_CREDITS : 0)) _Static_assert(MAX_OFLD_TX_CREDITS <= MAX_OFLD_TX_SDESC_CREDITS, "MAX_OFLD_TX_SDESC_CREDITS too small"); @@ -542,6 +546,46 @@ max_dsgl_nsegs(int tx_credits, int iso) return (nseg); } +/* Maximum amount of immediate data we could stuff in a WR */ +static inline int +max_imm_payload_v2(int tx_credits, int iso) +{ + const int iso_cpl_size = iso ? sizeof(struct cpl_tx_data_iso) : 0; + + KASSERT(tx_credits >= 0 && + tx_credits <= MAX_OFLD_TX_CREDITS, + ("%s: %d credits", __func__, tx_credits)); + + if (tx_credits < MIN_TX_V2_CREDITS(iso)) + return (0); + + return (tx_credits * 16 - sizeof(struct fw_ofld_tx_data_v2_wr) - + iso_cpl_size); +} + +/* Maximum number of SGL entries we could stuff in a WR */ +static inline int +max_dsgl_nsegs_v2(int tx_credits, int iso, int imm_payload) +{ + int nseg = 1; /* ulptx_sgl has room for 1, rest ulp_tx_sge_pair */ + int sge_pair_credits = tx_credits - MIN_TX_V2_CREDITS(iso); + + KASSERT(tx_credits >= 0 && + tx_credits <= MAX_OFLD_TX_CREDITS, + ("%s: %d credits", __func__, tx_credits)); + + if (tx_credits < MIN_TX_V2_CREDITS(iso) || + sge_pair_credits <= howmany(imm_payload, 16)) + return (0); + sge_pair_credits -= howmany(imm_payload, 16); + + nseg += 2 * (sge_pair_credits * 16 / 24); + if ((sge_pair_credits * 16) % 24 == 16) + nseg++; + + return (nseg); +} + static inline void write_tx_wr(void *dst, struct toepcb *toep, int fw_wr_opcode, unsigned int immdlen, unsigned int plen, uint8_t credits, int shove, @@ -569,6 +613,35 @@ write_tx_wr(void *dst, struct toepcb *toep, int fw_wr_opcode, } } +static inline void +write_tx_v2_wr(void *dst, struct toepcb *toep, int fw_wr_opcode, + unsigned int immdlen, unsigned int plen, uint8_t credits, int shove, + int ulp_submode) +{ + struct fw_ofld_tx_data_v2_wr *txwr = dst; + uint32_t flags; + + memset(txwr, 0, sizeof(*txwr)); + txwr->op_to_immdlen = htobe32(V_WR_OP(fw_wr_opcode) | + V_FW_WR_IMMDLEN(immdlen)); + txwr->flowid_len16 = htobe32(V_FW_WR_FLOWID(toep->tid) | + V_FW_WR_LEN16(credits)); + txwr->plen = htobe32(plen); + flags = V_TX_ULP_MODE(ULP_MODE_NVMET) | V_TX_ULP_SUBMODE(ulp_submode) | + V_TX_URG(0) | V_TX_SHOVE(shove); + + if (toep->params.tx_align > 0) { + if (plen < 2 * toep->params.emss) + flags |= F_FW_OFLD_TX_DATA_WR_LSODISABLE; + else + flags |= F_FW_OFLD_TX_DATA_WR_ALIGNPLD | + (toep->params.nagle == 0 ? 0 : + F_FW_OFLD_TX_DATA_WR_ALIGNPLDSHOVE); + } + + txwr->lsodisable_to_flags = htobe32(flags); +} + /* * Generate a DSGL from a starting mbuf. The total number of segments and the * maximum segments in any one mbuf are provided. @@ -982,8 +1055,8 @@ rqdrop_locked(struct mbufq *q, int plen) #define ULP_ISO G_TX_ULP_SUBMODE(F_FW_ISCSI_TX_DATA_WR_ULPSUBMODE_ISO) static void -write_tx_data_iso(void *dst, u_int ulp_submode, uint8_t flags, uint16_t mss, - int len, int npdu) +write_iscsi_tx_data_iso(void *dst, u_int ulp_submode, uint8_t flags, + uint16_t mss, int len, int npdu) { struct cpl_tx_data_iso *cpl; unsigned int burst_size; @@ -1147,7 +1220,7 @@ write_iscsi_mbuf_wr(struct toepcb *toep, struct mbuf *sndptr) adjusted_plen, credits, shove, ulp_submode | ULP_ISO); cpl_iso = (struct cpl_tx_data_iso *)(txwr + 1); MPASS(plen == sndptr->m_pkthdr.len); - write_tx_data_iso(cpl_iso, ulp_submode, + write_iscsi_tx_data_iso(cpl_iso, ulp_submode, mbuf_iscsi_iso_flags(sndptr), iso_mss, plen, npdu); p = cpl_iso + 1; } else { @@ -1183,21 +1256,269 @@ write_iscsi_mbuf_wr(struct toepcb *toep, struct mbuf *sndptr) return (wr); } +static void +write_nvme_tx_data_iso(void *dst, u_int ulp_submode, u_int iso_type, + uint16_t mss, int len, int npdu, int pdo) +{ + struct cpl_t7_tx_data_iso *cpl; + unsigned int burst_size; + + /* + * TODO: Need to figure out how the LAST_PDU and SUCCESS flags + * are handled. + * + * - Does len need padding bytes? (If so, does padding need + * to be in DSGL input?) + * + * - burst always 0? + */ + burst_size = 0; + + cpl = (struct cpl_t7_tx_data_iso *)dst; + cpl->op_to_scsi = htonl(V_CPL_T7_TX_DATA_ISO_OPCODE(CPL_TX_DATA_ISO) | + V_CPL_T7_TX_DATA_ISO_FIRST(1) | + V_CPL_T7_TX_DATA_ISO_LAST(1) | + V_CPL_T7_TX_DATA_ISO_CPLHDRLEN(0) | + V_CPL_T7_TX_DATA_ISO_HDRCRC(!!(ulp_submode & ULP_CRC_HEADER)) | + V_CPL_T7_TX_DATA_ISO_PLDCRC(!!(ulp_submode & ULP_CRC_DATA)) | + V_CPL_T7_TX_DATA_ISO_IMMEDIATE(0) | + V_CPL_T7_TX_DATA_ISO_SCSI(iso_type)); + + cpl->nvme_tcp_pkd = F_CPL_T7_TX_DATA_ISO_NVME_TCP; + cpl->ahs = 0; + cpl->mpdu = htons(DIV_ROUND_UP(mss, 4)); + cpl->burst = htonl(DIV_ROUND_UP(burst_size, 4)); + cpl->size = htonl(len); + cpl->num_pi_bytes_seglen_offset = htonl(0); + cpl->datasn_offset = htonl(0); + cpl->buffer_offset = htonl(0); + cpl->reserved3 = pdo; +} + +static struct wrqe * +write_nvme_mbuf_wr(struct toepcb *toep, struct mbuf *sndptr) +{ + struct mbuf *m; + const struct nvme_tcp_common_pdu_hdr *hdr; + struct fw_v2_nvmet_tx_data_wr *txwr; + struct cpl_tx_data_iso *cpl_iso; + void *p; + struct wrqe *wr; + u_int plen, nsegs, credits, max_imm, max_nsegs, max_nsegs_1mbuf; + u_int adjusted_plen, imm_data, ulp_submode; + struct inpcb *inp = toep->inp; + struct tcpcb *tp = intotcpcb(inp); + int tx_credits, shove, npdu, wr_len; + uint16_t iso_mss; + bool iso, nomap_mbuf_seen; + + M_ASSERTPKTHDR(sndptr); + + tx_credits = min(toep->tx_credits, MAX_OFLD_TX_CREDITS); + if (mbuf_raw_wr(sndptr)) { + plen = sndptr->m_pkthdr.len; + KASSERT(plen <= SGE_MAX_WR_LEN, + ("raw WR len %u is greater than max WR len", plen)); + if (plen > tx_credits * 16) + return (NULL); + + wr = alloc_wrqe(roundup2(plen, 16), &toep->ofld_txq->wrq); + if (__predict_false(wr == NULL)) + return (NULL); + + m_copydata(sndptr, 0, plen, wrtod(wr)); + return (wr); + } + + /* + * The first mbuf is the PDU header that is always sent as + * immediate data. + */ + imm_data = sndptr->m_len; + + iso = mbuf_iscsi_iso(sndptr); + max_imm = max_imm_payload_v2(tx_credits, iso); + + /* + * Not enough credits for the PDU header. + */ + if (imm_data > max_imm) + return (NULL); + + max_nsegs = max_dsgl_nsegs_v2(tx_credits, iso, imm_data); + iso_mss = mbuf_iscsi_iso_mss(sndptr); + + plen = imm_data; + nsegs = 0; + max_nsegs_1mbuf = 0; /* max # of SGL segments in any one mbuf */ + nomap_mbuf_seen = false; + for (m = sndptr->m_next; m != NULL; m = m->m_next) { + int n; + + if (m->m_flags & M_EXTPG) + n = sglist_count_mbuf_epg(m, mtod(m, vm_offset_t), + m->m_len); + else + n = sglist_count(mtod(m, void *), m->m_len); + + nsegs += n; + plen += m->m_len; + + /* + * This mbuf would send us _over_ the nsegs limit. + * Suspend tx because the PDU can't be sent out. + */ + if ((nomap_mbuf_seen || plen > max_imm) && nsegs > max_nsegs) + return (NULL); + + if (m->m_flags & M_EXTPG) + nomap_mbuf_seen = true; + if (max_nsegs_1mbuf < n) + max_nsegs_1mbuf = n; + } + + if (__predict_false(toep->flags & TPF_FIN_SENT)) + panic("%s: excess tx.", __func__); + + /* + * We have a PDU to send. All of it goes out in one WR so 'm' + * is NULL. A PDU's length is always a multiple of 4. + */ + MPASS(m == NULL); + MPASS((plen & 3) == 0); + MPASS(sndptr->m_pkthdr.len == plen); + + shove = !(tp->t_flags & TF_MORETOCOME); + + /* + * plen doesn't include header digests, padding, and data + * digests which are generated and inserted in the right + * places by the TOE, but they do occupy TCP sequence space + * and need to be accounted for. + * + * To determine the overhead, check the PDU header in sndptr. + * Note that only certain PDU types can use digests and + * padding, and PDO accounts for all but the data digests for + * those PDUs. + */ + MPASS((sndptr->m_flags & M_EXTPG) == 0); + ulp_submode = mbuf_ulp_submode(sndptr); + hdr = mtod(sndptr, const void *); + switch (hdr->pdu_type) { + case NVME_TCP_PDU_TYPE_H2C_TERM_REQ: + case NVME_TCP_PDU_TYPE_C2H_TERM_REQ: + MPASS(ulp_submode == 0); + MPASS(!iso); + break; + case NVME_TCP_PDU_TYPE_CAPSULE_RESP: + case NVME_TCP_PDU_TYPE_R2T: + MPASS((ulp_submode & ULP_CRC_DATA) == 0); + /* FALLTHROUGH */ + case NVME_TCP_PDU_TYPE_CAPSULE_CMD: + MPASS(!iso); + break; + case NVME_TCP_PDU_TYPE_H2C_DATA: + case NVME_TCP_PDU_TYPE_C2H_DATA: + if (le32toh(hdr->plen) + ((ulp_submode & ULP_CRC_DATA) != 0 ? + sizeof(uint32_t) : 0) == plen) + MPASS(!iso); + break; + default: + __assert_unreachable(); + } + + if (iso) { + npdu = howmany(plen - hdr->hlen, iso_mss); + adjusted_plen = hdr->pdo * npdu + (plen - hdr->hlen); + if ((ulp_submode & ULP_CRC_DATA) != 0) + adjusted_plen += npdu * sizeof(uint32_t); + } else { + npdu = 1; + adjusted_plen = le32toh(hdr->plen); + } + wr_len = sizeof(*txwr); + if (iso) + wr_len += sizeof(struct cpl_tx_data_iso); + if (plen <= max_imm && !nomap_mbuf_seen) { + /* Immediate data tx for full PDU */ + imm_data = plen; + wr_len += plen; + nsegs = 0; + } else { + /* DSGL tx for PDU data */ + wr_len += roundup2(imm_data, 16); + wr_len += sizeof(struct ulptx_sgl) + + ((3 * (nsegs - 1)) / 2 + ((nsegs - 1) & 1)) * 8; + } + + wr = alloc_wrqe(roundup2(wr_len, 16), &toep->ofld_txq->wrq); + if (wr == NULL) { + /* XXX: how will we recover from this? */ + return (NULL); + } + txwr = wrtod(wr); + credits = howmany(wr->wr_len, 16); + + if (iso) { + write_tx_v2_wr(txwr, toep, FW_V2_NVMET_TX_DATA_WR, + imm_data + sizeof(struct cpl_tx_data_iso), + adjusted_plen, credits, shove, ulp_submode | ULP_ISO); + cpl_iso = (struct cpl_tx_data_iso *)(txwr + 1); + MPASS(plen == sndptr->m_pkthdr.len); + write_nvme_tx_data_iso(cpl_iso, ulp_submode, + (hdr->pdu_type & 0x1) == 0 ? 1 : 2, iso_mss, plen, npdu, + hdr->pdo); + p = cpl_iso + 1; + } else { + write_tx_v2_wr(txwr, toep, FW_V2_NVMET_TX_DATA_WR, imm_data, + adjusted_plen, credits, shove, ulp_submode); + p = txwr + 1; + } + + /* PDU header (and immediate data payload). */ + m_copydata(sndptr, 0, imm_data, p); + if (nsegs != 0) { + p = roundup2((char *)p + imm_data, 16); + write_tx_sgl(p, sndptr->m_next, NULL, nsegs, max_nsegs_1mbuf); + if (wr_len & 0xf) { + uint64_t *pad = (uint64_t *)((uintptr_t)txwr + wr_len); + *pad = 0; + } + } + + KASSERT(toep->tx_credits >= credits, + ("%s: not enough credits: credits %u " + "toep->tx_credits %u tx_credits %u nsegs %u " + "max_nsegs %u iso %d", __func__, credits, + toep->tx_credits, tx_credits, nsegs, max_nsegs, iso)); + + tp->snd_nxt += adjusted_plen; + tp->snd_max += adjusted_plen; + + counter_u64_add(toep->ofld_txq->tx_nvme_pdus, npdu); + counter_u64_add(toep->ofld_txq->tx_nvme_octets, plen); + if (iso) + counter_u64_add(toep->ofld_txq->tx_nvme_iso_wrs, 1); + + return (wr); +} + void t4_push_pdus(struct adapter *sc, struct toepcb *toep, int drop) { struct mbuf *sndptr, *m; struct fw_wr_hdr *wrhdr; struct wrqe *wr; - u_int plen, credits; + u_int plen, credits, mode; struct inpcb *inp = toep->inp; struct ofld_tx_sdesc *txsd = &toep->txsd[toep->txsd_pidx]; struct mbufq *pduq = &toep->ulp_pduq; INP_WLOCK_ASSERT(inp); + mode = ulp_mode(toep); KASSERT(toep->flags & TPF_FLOWC_WR_SENT, ("%s: flowc_wr not sent for tid %u.", __func__, toep->tid)); - KASSERT(ulp_mode(toep) == ULP_MODE_ISCSI, + KASSERT(mode == ULP_MODE_ISCSI || mode == ULP_MODE_NVMET, ("%s: ulp_mode %u for toep %p", __func__, ulp_mode(toep), toep)); if (__predict_false(toep->flags & TPF_ABORT_SHUTDOWN)) @@ -1230,7 +1551,7 @@ t4_push_pdus(struct adapter *sc, struct toepcb *toep, int drop) if (sbu > 0) { /* * The data transmitted before the - * tid's ULP mode changed to ISCSI is + * tid's ULP mode changed to ISCSI/NVMET is * still in so_snd. Incoming credits * should account for so_snd first. */ @@ -1243,7 +1564,10 @@ t4_push_pdus(struct adapter *sc, struct toepcb *toep, int drop) } while ((sndptr = mbufq_first(pduq)) != NULL) { - wr = write_iscsi_mbuf_wr(toep, sndptr); + if (mode == ULP_MODE_ISCSI) + wr = write_iscsi_mbuf_wr(toep, sndptr); + else + wr = write_nvme_mbuf_wr(toep, sndptr); if (wr == NULL) { toep->flags |= TPF_TX_SUSPENDED; return; @@ -1302,7 +1626,8 @@ static inline void t4_push_data(struct adapter *sc, struct toepcb *toep, int drop) { - if (ulp_mode(toep) == ULP_MODE_ISCSI) + if (ulp_mode(toep) == ULP_MODE_ISCSI || + ulp_mode(toep) == ULP_MODE_NVMET) t4_push_pdus(sc, toep, drop); else if (toep->flags & TPF_KTLS) t4_push_ktls(sc, toep, drop); @@ -2008,7 +2333,8 @@ do_fw4_ack(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) SOCKBUF_LOCK(sb); sbu = sbused(sb); - if (ulp_mode(toep) == ULP_MODE_ISCSI) { + if (ulp_mode(toep) == ULP_MODE_ISCSI || + ulp_mode(toep) == ULP_MODE_NVMET) { if (__predict_false(sbu > 0)) { /* * The data transmitted before the