Date: Thu, 24 Feb 2022 12:55:43 GMT
Message-Id: <202202241255.21OCthgZ029967@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Marcin Wojtas
Subject: git: 5922f5218fcf - stable/13 - ena: merge ena-com v2.5.0 upgrade

The branch stable/13 has been updated by mw:

URL: https://cgit.FreeBSD.org/src/commit/?id=5922f5218fcf04af27d41a87357c355a728c9e2a

commit 5922f5218fcf04af27d41a87357c355a728c9e2a
Author:     Marcin Wojtas
AuthorDate: 2022-01-23 19:21:17 +0000
Commit:     Marcin Wojtas
CommitDate: 2022-02-24 12:53:43 +0000

    ena: merge ena-com v2.5.0 upgrade

    Merge commit '2530eb1fa01bf28fbcfcdda58bd41e055dcb2e4a'

    Adjust the driver to the upgraded ena-com code in two ways:

    The first update concerns the driver's NUMA awareness. I/O queue
    memory is allocated in the NUMA domain local to the CPU bound to the
    given queue, which improves data access time. Since this can result
    in a performance hit for unaware users, it is done only when the RSS
    option is enabled; otherwise the driver relies on the kernel to
    allocate memory by itself.

    Information about the first bound CPU is saved in the adapter
    structure, so the binding persists after the interface is brought
    down and up again.

    If there are more RSS buckets than interface queues, the driver
    tries to bind different interfaces to different CPUs using a
    round-robin algorithm (but it does not bind queues to CPUs that
    have no RSS buckets associated with them). This spreads the load
    and makes better use of hardware resources.

    Add read-only per-queue sysctls that provide the following
    information (RSS-enabled builds only):
    - queueN.domain: NUMA domain associated with the queue
    - queueN.cpu: CPU affinity of the queue

    The second change concerns the CSUM_OFFLOAD constant. The ENA
    platform file (ena_plat.h) no longer defines it, so the definition
    has been added to ena_datapath.h.
    Submitted by: Artur Rojek
    Submitted by: Dawid Gorecki
    Obtained from: Semihalf
    MFC after: 2 weeks
    Sponsored by: Amazon, Inc.

    (cherry picked from commit eb4c4f4a2e18659b67a6bf1ea5f473c7ed8c854f)
---
 share/man/man4/ena.4 | 5 +++++
 sys/contrib/ena-com/ena_plat.h | 26 ++++++++++++++++++------
 sys/dev/ena/ena.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
 sys/dev/ena/ena.h | 2 ++
 sys/dev/ena/ena_datapath.h | 2 ++
 sys/dev/ena/ena_sysctl.c | 8 ++++++++
 6 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/share/man/man4/ena.4 b/share/man/man4/ena.4
index aacf7956c9f8..089457fd4872 100644
--- a/share/man/man4/ena.4
+++ b/share/man/man4/ena.4
@@ -71,6 +71,11 @@ is advertised by the device via the Admin Queue),
 a dedicated MSI-X interrupt vector per Tx/Rx queue pair,
 and CPU cacheline optimized data placement.
 .Pp
+When RSS is enabled, each Tx/Rx queue pair is bound to a corresponding
+CPU core and its NUMA domain. The order of those bindings is based on
+the RSS bucket mapping. For builds with RSS support disabled, the
+CPU and NUMA management is left to the kernel.
+.Pp
 The
 .Nm
 driver supports industry standard TCP/IP offload features such
diff --git a/sys/contrib/ena-com/ena_plat.h b/sys/contrib/ena-com/ena_plat.h
index 274f795950c0..9287532b8476 100644
--- a/sys/contrib/ena-com/ena_plat.h
+++ b/sys/contrib/ena-com/ena_plat.h
@@ -42,6 +42,7 @@ __FBSDID("$FreeBSD$");
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -170,6 +171,8 @@ static inline long PTR_ERR(const void *ptr)
 #define ENA_COM_TIMER_EXPIRED ETIMEDOUT
 #define ENA_COM_EIO EIO
 
+#define ENA_NODE_ANY (-1)
+
 #define ENA_MSLEEP(x) pause_sbt("ena", SBT_1MS * (x), SBT_1MS, 0)
 #define ENA_USLEEP(x) pause_sbt("ena", SBT_1US * (x), SBT_1US, 0)
 #define ENA_UDELAY(x) DELAY(x)
@@ -277,7 +280,7 @@ typedef struct ifnet ena_netdev;
 void ena_dmamap_callback(void *arg, bus_dma_segment_t *segs, int nseg,
     int error);
 int ena_dma_alloc(device_t dmadev, bus_size_t size, ena_mem_handle_t *dma,
-    int mapflags, bus_size_t alignment);
+    int mapflags, bus_size_t alignment, int domain);
 
 static inline uint32_t
 ena_reg_read32(struct ena_bus *bus, bus_size_t offset)
@@ -299,16 +302,27 @@ ena_reg_read32(struct ena_bus *bus, bus_size_t offset)
 	} while (0)
 
 #define ENA_MEM_ALLOC(dmadev, size) malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO)
-#define ENA_MEM_ALLOC_NODE(dmadev, size, virt, node, dev_node) (virt = NULL)
+
+#define ENA_MEM_ALLOC_NODE(dmadev, size, virt, node, dev_node) \
+	do { \
+		(virt) = malloc_domainset((size), M_DEVBUF, \
+		    (node) < 0 ? DOMAINSET_RR() : DOMAINSET_PREF(node), \
+		    M_NOWAIT | M_ZERO); \
+		(void)(dev_node); \
+	} while (0)
+
 #define ENA_MEM_FREE(dmadev, ptr, size) \
 	do { \
 		(void)(size); \
 		free(ptr, M_DEVBUF); \
 	} while (0)
 #define ENA_MEM_ALLOC_COHERENT_NODE_ALIGNED(dmadev, size, virt, phys, \
-    handle, node, dev_node, alignment) \
+    dma, node, dev_node, alignment) \
 	do { \
-		((virt) = NULL); \
+		ena_dma_alloc((dmadev), (size), &(dma), 0, (alignment), \
+		    (node)); \
+		(virt) = (void *)(dma).vaddr; \
+		(phys) = (dma).paddr; \
 		(void)(dev_node); \
 	} while (0)
 
@@ -320,7 +334,8 @@ ena_reg_read32(struct ena_bus *bus, bus_size_t offset)
 #define ENA_MEM_ALLOC_COHERENT_ALIGNED(dmadev, size, virt, phys, dma, \
     alignment) \
 	do { \
-		ena_dma_alloc((dmadev), (size), &(dma), 0, alignment); \
+		ena_dma_alloc((dmadev), (size), &(dma), 0, (alignment), \
+		    ENA_NODE_ANY); \
 		(virt) = (void *)(dma).vaddr; \
 		(phys) = (dma).paddr; \
 	} while (0)
@@ -366,7 +381,6 @@ ena_reg_read32(struct ena_bus *bus, bus_size_t offset)
 #define time_after(a,b) ((long)((unsigned long)(b) - (unsigned long)(a)) < 0)
 
 #define VLAN_HLEN sizeof(struct ether_vlan_header)
-#define CSUM_OFFLOAD (CSUM_IP|CSUM_TCP|CSUM_UDP)
 
 #define prefetch(x) (void)(x)
 #define prefetchw(x) (void)(x)
diff --git a/sys/dev/ena/ena.c b/sys/dev/ena/ena.c
index 84ef234cd937..63b4598a9352 100644
--- a/sys/dev/ena/ena.c
+++ b/sys/dev/ena/ena.c
@@ -198,7 +198,7 @@ ena_dmamap_callback(void *arg, bus_dma_segment_t *segs, int nseg, int error)
 
 int
 ena_dma_alloc(device_t dmadev, bus_size_t size,
-    ena_mem_handle_t *dma, int mapflags, bus_size_t alignment)
+    ena_mem_handle_t *dma, int mapflags, bus_size_t alignment, int domain)
 {
 	struct ena_adapter* adapter = device_get_softc(dmadev);
 	device_t pdev = adapter->pdev;
@@ -229,6 +229,13 @@ ena_dma_alloc(device_t dmadev, bus_size_t size,
 		goto fail_tag;
 	}
 
+	error = bus_dma_tag_set_domain(dma->tag, domain);
+	if (unlikely(error != 0)) {
+		ena_log(pdev, ERR, "bus_dma_tag_set_domain failed: %d\n",
+		    error);
+		goto fail_map_create;
+	}
+
 	error = bus_dmamem_alloc(dma->tag, (void**) &dma->vaddr,
 	    BUS_DMA_COHERENT | BUS_DMA_ZERO, &dma->map);
 	if (unlikely(error != 0)) {
@@ -1445,6 +1452,8 @@ ena_create_io_queues(struct ena_adapter *adapter)
 		ctx.queue_size = adapter->requested_tx_ring_size;
 		ctx.msix_vector = msix_vector;
 		ctx.qid = ena_qid;
+		ctx.numa_node = adapter->que[i].domain;
+
 		rc = ena_com_create_io_queue(ena_dev, &ctx);
 		if (rc != 0) {
 			ena_log(adapter->pdev, ERR,
@@ -1462,6 +1471,11 @@ ena_create_io_queues(struct ena_adapter *adapter)
 			ena_com_destroy_io_queue(ena_dev, ena_qid);
 			goto err_tx;
 		}
+
+		if (ctx.numa_node >= 0) {
+			ena_com_update_numa_node(ring->ena_com_io_cq,
+			    ctx.numa_node);
+		}
 	}
 
 	/* Create RX queues */
@@ -1473,6 +1487,8 @@ ena_create_io_queues(struct ena_adapter *adapter)
 		ctx.queue_size = adapter->requested_rx_ring_size;
 		ctx.msix_vector = msix_vector;
 		ctx.qid = ena_qid;
+		ctx.numa_node = adapter->que[i].domain;
+
 		rc = ena_com_create_io_queue(ena_dev, &ctx);
 		if (unlikely(rc != 0)) {
 			ena_log(adapter->pdev, ERR,
@@ -1491,6 +1507,11 @@ ena_create_io_queues(struct ena_adapter *adapter)
 			ena_com_destroy_io_queue(ena_dev, ena_qid);
 			goto err_rx;
 		}
+
+		if (ctx.numa_node >= 0) {
+			ena_com_update_numa_node(ring->ena_com_io_cq,
+			    ctx.numa_node);
+		}
 	}
 
 	for (i = 0; i < adapter->num_io_queues; i++) {
@@ -1646,12 +1667,22 @@ ena_setup_io_intr(struct ena_adapter *adapter)
 #ifdef RSS
 	int num_buckets = rss_getnumbuckets();
 	static int last_bind = 0;
+	int cur_bind;
+	int idx;
 #endif
 	int irq_idx;
 
 	if (adapter->msix_entries == NULL)
 		return (EINVAL);
 
+#ifdef RSS
+	if (adapter->first_bind < 0) {
+		adapter->first_bind = last_bind;
+		last_bind = (last_bind + adapter->num_io_queues) % num_buckets;
+	}
+	cur_bind = adapter->first_bind;
+#endif
+
 	for (int i = 0; i < adapter->num_io_queues; i++) {
 		irq_idx = ENA_IO_IRQ_IDX(i);
 
@@ -1666,9 +1697,17 @@ ena_setup_io_intr(struct ena_adapter *adapter)
 
 #ifdef RSS
 		adapter->que[i].cpu = adapter->irq_tbl[irq_idx].cpu =
-		    rss_getcpu(last_bind);
-		last_bind = (last_bind + 1) % num_buckets;
+		    rss_getcpu(cur_bind);
+		cur_bind = (cur_bind + 1) % num_buckets;
 		CPU_SETOF(adapter->que[i].cpu, &adapter->que[i].cpu_mask);
+
+		for (idx = 0; idx < MAXMEMDOM; ++idx) {
+			if (CPU_ISSET(adapter->que[i].cpu, &cpuset_domain[idx]))
+				break;
+		}
+		adapter->que[i].domain = idx;
+#else
+		adapter->que[i].domain = -1;
 #endif
 	}
 
@@ -3459,6 +3498,7 @@ ena_attach(device_t pdev)
 
 	adapter = device_get_softc(pdev);
 	adapter->pdev = pdev;
+	adapter->first_bind = -1;
 
 	/*
 	 * Set up the timer service - driver is responsible for avoiding
diff --git a/sys/dev/ena/ena.h b/sys/dev/ena/ena.h
index f559f9127c11..260c26482898 100644
--- a/sys/dev/ena/ena.h
+++ b/sys/dev/ena/ena.h
@@ -222,6 +222,7 @@ struct ena_que {
 	int cpu;
 	cpuset_t cpu_mask;
 #endif
+	int domain;
 	struct sysctl_oid *oid;
 };
 
@@ -439,6 +440,7 @@ struct ena_adapter {
 	uint32_t buf_ring_size;
 
 	/* RSS*/
+	int first_bind;
 	struct ena_indir *rss_indir;
 
 	uint8_t mac_addr[ETHER_ADDR_LEN];
diff --git a/sys/dev/ena/ena_datapath.h b/sys/dev/ena/ena_datapath.h
index 4886ff1e6391..8da6a2a0edc9 100644
--- a/sys/dev/ena/ena_datapath.h
+++ b/sys/dev/ena/ena_datapath.h
@@ -39,4 +39,6 @@ void ena_qflush(if_t ifp);
 int ena_mq_start(if_t ifp, struct mbuf *m);
 void ena_deferred_mq_start(void *arg, int pending);
 
+#define CSUM_OFFLOAD (CSUM_IP|CSUM_TCP|CSUM_UDP)
+
 #endif /* ENA_TXRX_H */
diff --git a/sys/dev/ena/ena_sysctl.c b/sys/dev/ena/ena_sysctl.c
index 7337f6578e68..f523bdbdbe81 100644
--- a/sys/dev/ena/ena_sysctl.c
+++ b/sys/dev/ena/ena_sysctl.c
@@ -208,6 +208,14 @@ ena_sysctl_add_stats(struct ena_adapter *adapter)
 
 		adapter->que[i].oid = queue_node;
 
+#ifdef RSS
+		/* Common stats */
+		SYSCTL_ADD_INT(ctx, queue_list, OID_AUTO, "cpu",
+		    CTLFLAG_RD, &adapter->que[i].cpu, 0, "CPU affinity");
+		SYSCTL_ADD_INT(ctx, queue_list, OID_AUTO, "domain",
+		    CTLFLAG_RD, &adapter->que[i].domain, 0, "NUMA domain");
+#endif
+
 		/* TX specific stats */
 		tx_node = SYSCTL_ADD_NODE(ctx, queue_list, OID_AUTO,
 		    "tx_ring", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "TX ring");