Date: Wed, 23 Feb 2022 17:10:18 GMT
Message-Id: <202202231710.21NHAIGq042262@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Kristof Provost
Subject: git: 2e0bee4c7f81 - stable/13 - if_epair: implement fanout
List-Id: Commits to the stable branches of the FreeBSD src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches

The branch stable/13 has been updated by kp:

URL: https://cgit.FreeBSD.org/src/commit/?id=2e0bee4c7f8176e0f8396c9389275745bac1e263

commit 2e0bee4c7f8176e0f8396c9389275745bac1e263
Author:     Kristof Provost
AuthorDate: 2021-12-09 13:24:13 +0000
Commit:     Kristof Provost
CommitDate: 2022-02-23 15:39:04 +0000

    if_epair: implement fanout

    Allow multiple cores to be used to process if_epair traffic. We do this
    (if RSS is enabled) based on the RSS hash of the incoming packet. This
    allows us to distribute the load over multiple cores, rather than
    sending everything to the same one.

    We also switch from swi_sched() to taskqueues, which also contributes to
    better throughput.
    Benchmark results:
    With net.isr.maxthreads=-1

    Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)

    Before          627 Kpps
    After (no RSS)  1.198 Mpps
    After (RSS)     3.148 Mpps

    Setup B: (cc0 - bridge0 - epair0a) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)

    Before          7.705 Kpps
    After (no RSS)  1.017 Mpps
    After (RSS)     2.083 Mpps

    MFC after:      3 weeks
    Sponsored by:   Orange Business Services
    Differential Revision:  https://reviews.freebsd.org/D33731

    (cherry picked from commit 24f0bfbad57b9c3cb9b543a60b2ba00e4812c286)
---
 sys/net/if_epair.c | 309 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 186 insertions(+), 123 deletions(-)

diff --git a/sys/net/if_epair.c b/sys/net/if_epair.c
index d15939dfe48b..818f25f0cdb5 100644
--- a/sys/net/if_epair.c
+++ b/sys/net/if_epair.c
@@ -40,6 +40,8 @@
 #include
 __FBSDID("$FreeBSD$");
 
+#include "opt_rss.h"
+
 #include
 #include
 #include
@@ -50,10 +52,11 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -68,6 +71,11 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#ifdef RSS
+#include
+#include
+#include
+#endif
 #include
 
 static int epair_clone_match(struct if_clone *, const char *);
@@ -90,21 +98,32 @@ static unsigned int next_index = 0;
 #define	EPAIR_LOCK()			mtx_lock(&epair_n_index_mtx)
 #define	EPAIR_UNLOCK()			mtx_unlock(&epair_n_index_mtx)
 
-static void			*swi_cookie[MAXCPU];	/* swi(9). */
-static STAILQ_HEAD(, epair_softc) swi_sc[MAXCPU];
+struct epair_softc;
+struct epair_queue {
+	int			 id;
+	struct buf_ring		*rxring[2];
+	volatile int		 ridx;		/* 0 || 1 */
+	struct task		 tx_task;
+	struct epair_softc	*sc;
+};
 
 static struct mtx epair_n_index_mtx;
 struct epair_softc {
-	struct ifnet	*ifp;		/* This ifp. */
-	struct ifnet	*oifp;		/* other ifp of pair. */
-	void		*swi_cookie;	/* swi(9). */
-	struct buf_ring	*rxring[2];
-	volatile int	ridx;		/* 0 || 1 */
-	struct ifmedia	media;		/* Media config (fake). */
-	uint32_t	cpuidx;
+	struct ifnet		*ifp;		/* This ifp. */
+	struct ifnet		*oifp;		/* other ifp of pair. */
+	int			 num_queues;
+	struct epair_queue	*queues;
+	struct ifmedia		 media;		/* Media config (fake). */
 	STAILQ_ENTRY(epair_softc) entry;
 };
 
+struct epair_tasks_t {
+	int			 tasks;
+	struct taskqueue	*tq[MAXCPU];
+};
+
+static struct epair_tasks_t epair_tasks;
+
 static void
 epair_clear_mbuf(struct mbuf *m)
 {
@@ -119,59 +138,43 @@ epair_clear_mbuf(struct mbuf *m)
 }
 
 static void
-epair_if_input(struct epair_softc *sc, int ridx)
+epair_if_input(struct epair_softc *sc, struct epair_queue *q, int ridx)
 {
-	struct epoch_tracker et;
 	struct ifnet *ifp;
 	struct mbuf *m;
 
 	ifp = sc->ifp;
-	NET_EPOCH_ENTER(et);
-	do {
-		m = buf_ring_dequeue_sc(sc->rxring[ridx]);
+	CURVNET_SET(ifp->if_vnet);
+	while (! buf_ring_empty(q->rxring[ridx])) {
+		m = buf_ring_dequeue_mc(q->rxring[ridx]);
 		if (m == NULL)
-			break;
+			continue;
 
 		MPASS((m->m_pkthdr.csum_flags & CSUM_SND_TAG) == 0);
 		(*ifp->if_input)(ifp, m);
 
-	} while (1);
-	NET_EPOCH_EXIT(et);
+	}
+	CURVNET_RESTORE();
 }
 
 static void
-epair_sintr(struct epair_softc *sc)
+epair_tx_start_deferred(void *arg, int pending)
 {
+	struct epair_queue *q = (struct epair_queue *)arg;
+	struct epair_softc *sc = q->sc;
 	int ridx, nidx;
 
 	if_ref(sc->ifp);
+	ridx = atomic_load_int(&q->ridx);
 	do {
-		ridx = sc->ridx;
 		nidx = (ridx == 0) ? 1 : 0;
-	} while (!atomic_cmpset_int(&sc->ridx, ridx, nidx));
-	epair_if_input(sc, ridx);
+	} while (!atomic_fcmpset_int(&q->ridx, &ridx, nidx));
+	epair_if_input(sc, q, ridx);
 
-	if_rele(sc->ifp);
-}
+	if (! buf_ring_empty(q->rxring[nidx]))
+		taskqueue_enqueue(epair_tasks.tq[q->id], &q->tx_task);
 
-static void
-epair_intr(void *arg)
-{
-	struct epair_softc *sc;
-	uint32_t cpuidx;
-
-	cpuidx = (uintptr_t)arg;
-	/* If this is a problem, this is a read-mostly situation. */
-	EPAIR_LOCK();
-	STAILQ_FOREACH(sc, &swi_sc[cpuidx], entry) {
-		/* Do this lockless. */
-		if (buf_ring_empty(sc->rxring[sc->ridx]))
-			continue;
-		epair_sintr(sc);
-	}
-	EPAIR_UNLOCK();
-
-	return;
+	if_rele(sc->ifp);
 }
 
 static int
@@ -181,7 +184,12 @@ epair_menq(struct mbuf *m, struct epair_softc *osc)
 	int len, ret;
 	int ridx;
 	short mflags;
+	struct epair_queue *q = NULL;
+	uint32_t bucket;
 	bool was_empty;
+#ifdef RSS
+	struct ether_header *eh;
+#endif
 
 	/*
 	 * I know this looks weird. We pass the "other sc" as we need that one
@@ -202,13 +210,38 @@ epair_menq(struct mbuf *m, struct epair_softc *osc)
 	MPASS(m->m_nextpkt == NULL);
 	MPASS((m->m_pkthdr.csum_flags & CSUM_SND_TAG) == 0);
 
-	ridx = atomic_load_int(&osc->ridx);
-	was_empty = buf_ring_empty(osc->rxring[ridx]);
-	ret = buf_ring_enqueue(osc->rxring[ridx], m);
+#ifdef RSS
+	ret = rss_m2bucket(m, &bucket);
+	if (ret) {
+		/* Actually hash the packet. */
+		eh = mtod(m, struct ether_header *);
+
+		switch (ntohs(eh->ether_type)) {
+		case ETHERTYPE_IP:
+			rss_soft_m2cpuid_v4(m, 0, &bucket);
+			break;
+		case ETHERTYPE_IPV6:
+			rss_soft_m2cpuid_v6(m, 0, &bucket);
+			break;
+		default:
+			bucket = 0;
+			break;
+		}
+	}
+	bucket %= osc->num_queues;
+#else
+	bucket = 0;
+#endif
+	q = &osc->queues[bucket];
+
+	ridx = atomic_load_int(&q->ridx);
+	was_empty = buf_ring_empty(q->rxring[ridx]);
+	ret = buf_ring_enqueue(q->rxring[ridx], m);
 	if (ret != 0) {
 		/* Ring is full. */
+		if_inc_counter(ifp, IFCOUNTER_OQDROPS, 1);
 		m_freem(m);
-		return (0);
+		goto done;
 	}
 
 	if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
@@ -223,9 +256,9 @@ epair_menq(struct mbuf *m, struct epair_softc *osc)
 	/* Someone else received the packet. */
 	if_inc_counter(oifp, IFCOUNTER_IPACKETS, 1);
 
-	/* Kick the interrupt handler for the first packet. */
-	if (was_empty && osc->swi_cookie != NULL)
-		swi_sched(osc->swi_cookie, 0);
+done:
+	if (was_empty)
+		taskqueue_enqueue(epair_tasks.tq[bucket], &q->tx_task);
 
 	return (0);
 }
@@ -491,16 +524,27 @@ epair_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
 	/* Allocate memory for both [ab] interfaces */
 	sca = malloc(sizeof(struct epair_softc), M_EPAIR, M_WAITOK | M_ZERO);
 	sca->ifp = if_alloc(IFT_ETHER);
+	sca->num_queues = epair_tasks.tasks;
 	if (sca->ifp == NULL) {
 		free(sca, M_EPAIR);
 		ifc_free_unit(ifc, unit);
 		return (ENOSPC);
 	}
-	sca->rxring[0] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK,NULL);
-	sca->rxring[1] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
+	sca->queues = mallocarray(sca->num_queues, sizeof(struct epair_queue),
+	    M_EPAIR, M_WAITOK);
+	for (int i = 0; i < sca->num_queues; i++) {
+		struct epair_queue *q = &sca->queues[i];
+		q->id = i;
+		q->rxring[0] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
+		q->rxring[1] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
+		q->ridx = 0;
+		q->sc = sca;
+		NET_TASK_INIT(&q->tx_task, 0, epair_tx_start_deferred, q);
+	}
 
 	scb = malloc(sizeof(struct epair_softc), M_EPAIR, M_WAITOK | M_ZERO);
 	scb->ifp = if_alloc(IFT_ETHER);
+	scb->num_queues = epair_tasks.tasks;
 	if (scb->ifp == NULL) {
 		free(scb, M_EPAIR);
 		if_free(sca->ifp);
@@ -508,8 +552,17 @@ epair_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
 		ifc_free_unit(ifc, unit);
 		return (ENOSPC);
 	}
-	scb->rxring[0] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
-	scb->rxring[1] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
+	scb->queues = mallocarray(scb->num_queues, sizeof(struct epair_queue),
+	    M_EPAIR, M_WAITOK);
+	for (int i = 0; i < scb->num_queues; i++) {
+		struct epair_queue *q = &scb->queues[i];
+		q->id = i;
+		q->rxring[0] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
+		q->rxring[1] = buf_ring_alloc(RXRSIZE, M_EPAIR, M_WAITOK, NULL);
+		q->ridx = 0;
+		q->sc = scb;
+		NET_TASK_INIT(&q->tx_task, 0, epair_tx_start_deferred, q);
+	}
 
 	/*
 	 * Cross-reference the interfaces so we will be able to free both.
@@ -524,41 +577,6 @@ epair_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
 #else
 	hash = 0;
 #endif
-	if (swi_cookie[hash] == NULL) {
-		void *cookie;
-
-		EPAIR_UNLOCK();
-		error = swi_add(NULL, epairname,
-		    epair_intr, (void *)(uintptr_t)hash,
-		    SWI_NET, INTR_MPSAFE, &cookie);
-		if (error) {
-			buf_ring_free(scb->rxring[0], M_EPAIR);
-			buf_ring_free(scb->rxring[1], M_EPAIR);
-			if_free(scb->ifp);
-			free(scb, M_EPAIR);
-			buf_ring_free(sca->rxring[0], M_EPAIR);
-			buf_ring_free(sca->rxring[1], M_EPAIR);
-			if_free(sca->ifp);
-			free(sca, M_EPAIR);
-			ifc_free_unit(ifc, unit);
-			return (ENOSPC);
-		}
-		EPAIR_LOCK();
-		/* Recheck under lock even though a race is very unlikely. */
-		if (swi_cookie[hash] == NULL) {
-			swi_cookie[hash] = cookie;
-		} else {
-			EPAIR_UNLOCK();
-			(void) swi_remove(cookie);
-			EPAIR_LOCK();
-		}
-	}
-	sca->cpuidx = hash;
-	STAILQ_INSERT_TAIL(&swi_sc[hash], sca, entry);
-	sca->swi_cookie = swi_cookie[hash];
-	scb->cpuidx = hash;
-	STAILQ_INSERT_TAIL(&swi_sc[hash], scb, entry);
-	scb->swi_cookie = swi_cookie[hash];
 	EPAIR_UNLOCK();
 
 	/* Initialise pseudo media types. */
@@ -665,12 +683,15 @@ epair_drain_rings(struct epair_softc *sc)
 	struct mbuf *m;
 
 	for (ridx = 0; ridx < 2; ridx++) {
-		do {
-			m = buf_ring_dequeue_sc(sc->rxring[ridx]);
-			if (m == NULL)
-				break;
-			m_freem(m);
-		} while (1);
+		for (int i = 0; i < sc->num_queues; i++) {
+			struct epair_queue *q = &sc->queues[i];
+			do {
+				m = buf_ring_dequeue_sc(q->rxring[ridx]);
+				if (m == NULL)
+					break;
+				m_freem(m);
+			} while (1);
+		}
 	}
 }
 
@@ -703,14 +724,6 @@ epair_clone_destroy(struct if_clone *ifc, struct ifnet *ifp)
 	ether_ifdetach(ifp);
 	ether_ifdetach(oifp);
 
-	/* Second stop interrupt handler. */
-	EPAIR_LOCK();
-	STAILQ_REMOVE(&swi_sc[sca->cpuidx], sca, epair_softc, entry);
-	STAILQ_REMOVE(&swi_sc[scb->cpuidx], scb, epair_softc, entry);
-	EPAIR_UNLOCK();
-	sca->swi_cookie = NULL;
-	scb->swi_cookie = NULL;
-
 	/* Third free any queued packets and all the resources. */
 	CURVNET_SET_QUIET(oifp->if_vnet);
 	epair_drain_rings(scb);
@@ -721,16 +734,24 @@ epair_clone_destroy(struct if_clone *ifc, struct ifnet *ifp)
 		    __func__, error);
 	if_free(oifp);
 	ifmedia_removeall(&scb->media);
-	buf_ring_free(scb->rxring[0], M_EPAIR);
-	buf_ring_free(scb->rxring[1], M_EPAIR);
+	for (int i = 0; i < scb->num_queues; i++) {
+		struct epair_queue *q = &scb->queues[i];
+		buf_ring_free(q->rxring[0], M_EPAIR);
+		buf_ring_free(q->rxring[1], M_EPAIR);
+	}
+	free(scb->queues, M_EPAIR);
 	free(scb, M_EPAIR);
 	CURVNET_RESTORE();
 
 	epair_drain_rings(sca);
 	if_free(ifp);
 	ifmedia_removeall(&sca->media);
-	buf_ring_free(sca->rxring[0], M_EPAIR);
-	buf_ring_free(sca->rxring[1], M_EPAIR);
+	for (int i = 0; i < sca->num_queues; i++) {
+		struct epair_queue *q = &sca->queues[i];
+		buf_ring_free(q->rxring[0], M_EPAIR);
+		buf_ring_free(q->rxring[1], M_EPAIR);
+	}
+	free(sca->queues, M_EPAIR);
 	free(sca, M_EPAIR);
 
 	/* Last free the cloner unit. */
@@ -758,34 +779,76 @@ vnet_epair_uninit(const void *unused __unused)
 VNET_SYSUNINIT(vnet_epair_uninit, SI_SUB_INIT_IF, SI_ORDER_ANY,
     vnet_epair_uninit, NULL);
 
+static int
+epair_mod_init()
+{
+	char name[32];
+	epair_tasks.tasks = 0;
+
+#ifdef RSS
+	struct pcpu *pcpu;
+	int cpu;
+
+	CPU_FOREACH(cpu) {
+		cpuset_t cpu_mask;
+
+		/* Pin to this CPU so we get appropriate NUMA allocations. */
+		pcpu = pcpu_find(cpu);
+		thread_lock(curthread);
+		sched_bind(curthread, cpu);
+		thread_unlock(curthread);
+
+		snprintf(name, sizeof(name), "epair_task_%d", cpu);
+
+		epair_tasks.tq[cpu] = taskqueue_create(name, M_WAITOK,
+		    taskqueue_thread_enqueue,
+		    &epair_tasks.tq[cpu]);
+		CPU_SETOF(cpu, &cpu_mask);
+		taskqueue_start_threads_cpuset(&epair_tasks.tq[cpu], 1, PI_NET,
+		    &cpu_mask, "%s", name);
+
+		epair_tasks.tasks++;
+	}
+#else
+	snprintf(name, sizeof(name), "epair_task");
+
+	epair_tasks.tq[0] = taskqueue_create(name, M_WAITOK,
+	    taskqueue_thread_enqueue,
+	    &epair_tasks.tq[0]);
+	taskqueue_start_threads(&epair_tasks.tq[0], 1, PI_NET, "%s", name);
+
+	epair_tasks.tasks = 1;
+#endif
+
+	return (0);
+}
+
+static void
+epair_mod_cleanup()
+{
+
+	for (int i = 0; i < epair_tasks.tasks; i++) {
+		taskqueue_drain_all(epair_tasks.tq[i]);
+		taskqueue_free(epair_tasks.tq[i]);
+	}
+}
+
 static int
 epair_modevent(module_t mod, int type, void *data)
 {
-	int i;
+	int ret;
 
 	switch (type) {
 	case MOD_LOAD:
-		for (i = 0; i < MAXCPU; i++) {
-			swi_cookie[i] = NULL;
-			STAILQ_INIT(&swi_sc[i]);
-		}
 		EPAIR_LOCK_INIT();
+		ret = epair_mod_init();
+		if (ret != 0)
+			return (ret);
 		if (bootverbose)
 			printf("%s: %s initialized.\n", __func__, epairname);
 		break;
 	case MOD_UNLOAD:
-		EPAIR_LOCK();
-		for (i = 0; i < MAXCPU; i++) {
-			if (!STAILQ_EMPTY(&swi_sc[i])) {
-				printf("%s: swi_sc[%d] active\n", __func__, i);
-				EPAIR_UNLOCK();
-				return (EBUSY);
-			}
-		}
-		EPAIR_UNLOCK();
-		for (i = 0; i < MAXCPU; i++)
-			if (swi_cookie[i] != NULL)
-				(void) swi_remove(swi_cookie[i]);
+		epair_mod_cleanup();
 		EPAIR_LOCK_DESTROY();
 		if (bootverbose)
 			printf("%s: %s unloaded.\n", __func__, epairname);