svn commit: r306525 - in head/lib: . librss

Adrian Chadd adrian at FreeBSD.org
Fri Sep 30 19:59:58 UTC 2016


Author: adrian
Date: Fri Sep 30 19:59:56 2016
New Revision: 306525
URL: https://svnweb.freebsd.org/changeset/base/306525

Log:
  Add librss, a simple wrapper around RSS APIs so applications can begin auto-tuning.
  
  I've used this in a handful of RSS test applications.  It is just some
  very simple functions to fetch the RSS configuration, query the per-bucket
  CPU set, and mark sockets as local to an RSS bucket.  It should be sufficient
  for both thread-based and process-based workloads.
  
  (Yes, I wrote a manpage.)
  
  This is based on some early RSS API and wrapper API work I did whilst
  I was at Netflix.  Thanks to Netflix for the very original work that
  spawned this; thanks to Peter Grehan for his feedback about RSS APIs
  and thanks to Jack Vogel and Navdeep Parhar for the NIC-facing side of the
  APIs.  These fed into the simple userland API I wrote up here.
  
  Reviewed by:	gallatin

Added:
  head/lib/librss/
  head/lib/librss/Makefile   (contents, props changed)
  head/lib/librss/librss.3   (contents, props changed)
  head/lib/librss/librss.c   (contents, props changed)
  head/lib/librss/librss.h   (contents, props changed)
Modified:
  head/lib/Makefile

Modified: head/lib/Makefile
==============================================================================
--- head/lib/Makefile	Fri Sep 30 19:46:50 2016	(r306524)
+++ head/lib/Makefile	Fri Sep 30 19:59:56 2016	(r306525)
@@ -89,6 +89,7 @@ SUBDIR=	${SUBDIR_BOOTSTRAP} \
 	libprocstat \
 	${_libradius} \
 	librpcsvc \
+	librss \
 	librt \
 	${_librtld_db} \
 	libsbuf \

Added: head/lib/librss/Makefile
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/lib/librss/Makefile	Fri Sep 30 19:59:56 2016	(r306525)
@@ -0,0 +1,13 @@
+# $FreeBSD$
+
+PACKAGE=	lib${LIB}
+SHLIBDIR?=	/lib
+
+.include <src.opts.mk>
+
+LIB=	rss
+SHLIB_MAJOR=	1
+
+SRCS=librss.c
+
+.include <bsd.lib.mk>

Added: head/lib/librss/librss.3
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/lib/librss/librss.3	Fri Sep 30 19:59:56 2016	(r306525)
@@ -0,0 +1,153 @@
+.\" $FreeBSD$
+.\"
+.Dd September 29, 2016
+.Dt LIBRSS 3
+.Os
+.Sh NAME
+.Nm librss
+.Nd Provide Receive-side scaling awareness to userland applications
+.Sh LIBRARY
+.Lb librss
+.Sh SYNOPSIS
+.In librss.h
+.Ft struct rss_config *
+.Fn rss_config_get "void"
+.Ft void
+.Fn rss_config_free "struct rss_config *cfg"
+.Ft int
+.Fn rss_config_get_bucket_count "struct rss_config *cfg"
+.Ft int
+.Fn rss_set_bucket_rebalance_cb "rss_bucket_rebalance_cb_t *cb" "void *cbdata"
+.Ft int
+.Fn rss_sock_set_bindmulti "int fd" "int af" "int val"
+.Ft int
+.Fn rss_sock_set_rss_bucket "int fd" "int af" "int rss_bucket"
+.Ft int
+.Fn rss_sock_set_recvrss "int fd" "int af" "int val"
+.Sh DESCRIPTION
+The
+.Nm
+library and the functions it provides are used for both fetching
+the system RSS configuration and interacting with RSS aware
+sockets.
+.Pp
+Applications will typically call
+.Fn rss_config_get
+to fetch the current RSS configuration from the system and perform
+initial setup.
+This typically involves spawning worker threads, one per RSS bucket,
+and optionally binding them to the per-bucket CPU set.
+.Pp
+The
+.Vt rss_config
+struct is defined as:
+.Bd -literal
+struct rss_config {
+	int rss_ncpus;
+	int rss_nbuckets;
+	int rss_basecpu;
+	int *rss_bucket_map;
+};
+.Ed
+.Pp
+Applications will typically use the
+.Fn rss_config_get_bucket_count
+function to fetch the number of RSS buckets, create one thread
+per RSS bucket for RSS aware work, then one RSS aware socket to receive
+UDP datagrams or TCP connections
+in each particular RSS bucket / thread.
+.Pp
+The
+.Fn rss_get_bucket_cpuset
+function sets the given cpuset up for the given
+RSS bucket and behaviour.
+Typically applications will wish to just query for
+.Vt RSS_BUCKET_TYPE_KERNEL_ALL
+unless they wish to potentially setup different
+worker threads for transmit and receive.
+.Pp
+The
+.Vt rss_bucket_type_t
+enum is defined as:
+.Bd -literal
+typedef enum {
+        RSS_BUCKET_TYPE_NONE = 0,
+        RSS_BUCKET_TYPE_KERNEL_ALL = 1,
+        RSS_BUCKET_TYPE_KERNEL_TX = 2,
+        RSS_BUCKET_TYPE_KERNEL_RX = 3,
+        RSS_BUCKET_TYPE_MAX = 3,
+} rss_bucket_type_t;
+.Ed
+.Pp
+The rebalance callback
+.Vt rss_bucket_rebalance_cb_t
+is defined as:
+.Bd -literal
+typedef void rss_bucket_rebalance_cb_t(void *arg);
+.Ed
+.Pp
+The
+.Fn rss_set_bucket_rebalance_cb
+function sets an optional callback that will be called if the kernel
+rebalances RSS buckets.
+This is intended as a future expansion to rebalance buckets rather than
+reprogram the RSS key, so typically the only work to be performed
+is to rebind worker threads to an updated cpuset.
+.Pp
+Once RSS setup is completed,
+.Fn rss_config_free
+is called to free the RSS configuration structure.
+.Pp
+To make a
+.Vt bind
+socket RSS aware, the
+.Fn rss_sock_set_bindmulti
+function is used to enable or disable per-RSS bucket
+behaviour.
+The socket filedescriptor, address family and enable flag
+.Vt val
+are passed in.
+.Pp
+If
+.Vt val
+is set to 1, the socket can be placed in an RSS bucket and will only accept
+datagrams (for UDP) or connections (for TCP) that are received for that
+RSS bucket.
+If set to 0, the socket is placed in the default PCB and will see
+datagrams/connections that are not initially consumed by a PCB aware
+socket.
+.Pp
+The
+.Fn rss_sock_set_rss_bucket
+function configures the RSS bucket which a socket belongs in.
+Note that TCP sockets created by
+.Xr accept 2
+will automatically be assigned to the RSS bucket.
+.Pp
+The
+.Fn rss_sock_set_recvrss
+function enables or disables receiving RSS related information
+as socket options in.
+.2 recvmsg
+calls.
+.Pp
+When enabled, UDP datagrams will have a message with the
+.Vt IP_RECVFLOWID
+option indicating the 32-bit receive flowid as a uint32_t,
+and the
+.Vt IP_RECVRSSBUCKETID
+option indicating the 32 bit RSS bucket id as a uint32_t.
+.Sh ERRORS
+The functions return either <0 or NULL as appropriate upon error.
+.Sh SEE ALSO
+.Xr PCBGROUP 9
+.Sh HISTORY
+The
+.Xr librss.3
+library first appeared in
+.Fx 11.0 .
+.Sh AUTHORS
+.An Adrian Chadd Aq Mt adrian at FreeBSD.org
+.Sh BUGS
+There is currently no kernel mechanism to rebalance the RSS bucket to CPU
+mapping, and so the callback mechanism is a no-op.

Added: head/lib/librss/librss.c
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/lib/librss/librss.c	Fri Sep 30 19:59:56 2016	(r306525)
@@ -0,0 +1,311 @@
+/*
+ * Copyright (c) 2016 Adrian Chadd <adrian at FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/cpuset.h>
+#include <sys/sysctl.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <strings.h>
+#include <err.h>
+#include <fcntl.h>
+#include <string.h>
+#include <errno.h>
+
+#include <netinet/in.h>
+
+#include "librss.h"
+
+int
+rss_sock_set_bindmulti(int fd, int af, int val)
+{
+	int opt;
+	socklen_t optlen;
+	int retval;
+
+	/* Set bindmulti */
+	opt = val;
+	optlen = sizeof(opt);
+	retval = setsockopt(fd,
+	    af == AF_INET ? IPPROTO_IP : IPPROTO_IPV6,
+	    af == AF_INET ? IP_BINDMULTI : IPV6_BINDMULTI,
+	    &opt,
+	    optlen);
+	if (retval < 0) {
+		warn("%s: setsockopt(IP_BINDMULTI)", __func__);
+		return (-1);
+	}
+	return (0);
+}
+
+int
+rss_sock_set_rss_bucket(int fd, int af, int rss_bucket)
+{
+	int opt;
+	socklen_t optlen;
+	int retval;
+	int f, p;
+
+	switch (af) {
+	case AF_INET:
+		p = IPPROTO_IP;
+		f = IP_RSS_LISTEN_BUCKET;
+		break;
+	case AF_INET6:
+		p = IPPROTO_IPV6;
+		f = IPV6_RSS_LISTEN_BUCKET;
+		break;
+	default:
+		return (-1);
+	}
+
+	/* Set RSS bucket */
+	opt = rss_bucket;
+	optlen = sizeof(opt);
+	retval = setsockopt(fd, p, f, &opt, optlen);
+	if (retval < 0) {
+		warn("%s: setsockopt(IP_RSS_LISTEN_BUCKET)", __func__);
+		return (-1);
+	}
+	return (0);
+}
+
+int
+rss_sock_set_recvrss(int fd, int af, int val)
+{
+	int opt, retval;
+	socklen_t optlen;
+	int f1, f2, p;
+
+	switch (af) {
+	case AF_INET:
+		p = IPPROTO_IP;
+		f1 = IP_RECVFLOWID;
+		f2 = IP_RECVRSSBUCKETID;
+		break;
+	case AF_INET6:
+		p = IPPROTO_IPV6;
+		f1 = IPV6_RECVFLOWID;
+		f2 = IPV6_RECVRSSBUCKETID;
+		break;
+	default:
+		return (-1);
+	}
+
+	/* Enable/disable flowid */
+	opt = val;
+	optlen = sizeof(opt);
+	retval = setsockopt(fd, p, f1, &opt, optlen);
+	if (retval < 0) {
+		warn("%s: setsockopt(IP_RECVFLOWID)", __func__);
+		return (-1);
+	}
+
+	/* Enable/disable RSS bucket reception */
+	opt = val;
+	optlen = sizeof(opt);
+	retval = setsockopt(fd, p, f2, &opt, optlen);
+	if (retval < 0) {
+		warn("%s: setsockopt(IP_RECVRSSBUCKETID)", __func__);
+		return (-1);
+	}
+
+	return (0);
+}
+
+static int
+rss_getsysctlint(const char *s)
+{
+	int val, retval;
+	size_t rlen;
+
+	rlen = sizeof(int);
+	retval = sysctlbyname(s, &val, &rlen, NULL, 0);
+	if (retval < 0) {
+		warn("sysctlbyname (%s)", s);
+		return (-1);
+	}
+
+	return (val);
+}
+
+static int
+rss_getbucketmap(int *bucket_map, int nbuckets)
+{
+	/* XXX I'm lazy; so static string it is */
+	char bstr[2048];
+	int retval;
+	size_t rlen;
+	char *s, *ss;
+	int r, b, c;
+
+	/* Paranoia */
+	memset(bstr, '\0', sizeof(bstr));
+
+	rlen = sizeof(bstr) - 1;
+	retval = sysctlbyname("net.inet.rss.bucket_mapping", bstr, &rlen, NULL, 0);
+	if (retval < 0) {
+		warn("sysctlbyname (net.inet.rss.bucket_mapping)");
+		return (-1);
+	}
+
+	ss = bstr;
+	while ((s = strsep(&ss, " ")) != NULL) {
+		r = sscanf(s, "%d:%d", &b, &c);
+		if (r != 2) {
+			fprintf(stderr, "%s: string (%s) not parsable\n",
+			    __func__,
+			    s);
+			return (-1);
+		}
+		if (b > nbuckets) {
+			fprintf(stderr, "%s: bucket %d > nbuckets %d\n",
+			    __func__,
+			    b,
+			    nbuckets);
+			return (-1);
+		}
+		/* XXX no maxcpu check */
+		bucket_map[b] = c;
+	}
+	return (0);
+}
+
+struct rss_config *
+rss_config_get(void)
+{
+	struct rss_config *rc = NULL;
+
+	rc = calloc(1, sizeof(*rc));
+	if (rc == NULL) {
+		warn("%s: calloc", __func__);
+		goto error;
+	}
+
+	rc->rss_ncpus = rss_getsysctlint("net.inet.rss.ncpus");
+	if (rc->rss_ncpus < 0) {
+		fprintf(stderr, "%s: couldn't fetch net.inet.rss.ncpus\n", __func__);
+		goto error;
+	}
+
+	rc->rss_nbuckets = rss_getsysctlint("net.inet.rss.buckets");
+	if (rc->rss_nbuckets < 0) {
+		fprintf(stderr, "%s: couldn't fetch net.inet.rss.nbuckets\n", __func__);
+		goto error;
+	}
+
+	rc->rss_basecpu = rss_getsysctlint("net.inet.rss.basecpu");
+	if (rc->rss_basecpu< 0) {
+		fprintf(stderr, "%s: couldn't fetch net.inet.rss.basecpu\n", __func__);
+		goto error;
+	}
+
+	rc->rss_bucket_map = calloc(rc->rss_nbuckets, sizeof(int));
+	if (rc->rss_bucket_map == NULL) {
+		warn("%s: calloc (rss buckets; %d entries)", __func__, rc->rss_nbuckets);
+		goto error;
+	}
+
+	if (rss_getbucketmap(rc->rss_bucket_map, rc->rss_nbuckets) != 0) {
+		fprintf(stderr, "%s: rss_getbucketmap failed\n", __func__);
+		goto error;
+	}
+
+	return (rc);
+
+error:
+	if ((rc != NULL) && rc->rss_bucket_map)
+		free(rc->rss_bucket_map);
+	if (rc != NULL)
+		free(rc);
+	return (NULL);
+}
+
+void
+rss_config_free(struct rss_config *rc)
+{
+
+	if ((rc != NULL) && rc->rss_bucket_map)
+		free(rc->rss_bucket_map);
+	if (rc != NULL)
+		free(rc);
+}
+
+int
+rss_config_get_bucket_count(struct rss_config *rc)
+{
+
+	if (rc == NULL)
+		return (-1);
+	return (rc->rss_nbuckets);
+}
+
+int
+rss_get_bucket_cpuset(struct rss_config *rc, rss_bucket_type_t btype,
+    int bucket, cpuset_t *cs)
+{
+
+	if (bucket < 0 || bucket >= rc->rss_nbuckets) {
+		errno = EINVAL;
+		return (-1);
+	}
+
+	/*
+	 * For now all buckets are the same, but eventually we'll want
+	 * to allow administrators to set separate RSS cpusets for
+	 * {kernel,user} {tx, rx} combinations.
+	 */
+	if (btype <= RSS_BUCKET_TYPE_NONE || btype > RSS_BUCKET_TYPE_MAX) {
+		errno = ENOTSUP;
+		return (-1);
+	}
+
+	CPU_ZERO(cs);
+	CPU_SET(rc->rss_bucket_map[bucket], cs);
+
+	return (0);
+}
+
+int
+rss_set_bucket_rebalance_cb(rss_bucket_rebalance_cb_t *cb, void *cbdata)
+{
+
+	(void) cb;
+	(void) cbdata;
+
+	/*
+	 * For now there's no rebalance callback, so
+	 * just return 0 and ignore it.
+	 */
+	return (0);
+}

Added: head/lib/librss/librss.h
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/lib/librss/librss.h	Fri Sep 30 19:59:56 2016	(r306525)
@@ -0,0 +1,101 @@
+/*
+ * Copyright (c) 2016 Adrian Chadd <adrian at FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+#ifndef	__LIBRSS_H__
+#define	__LIBRSS_H__
+
+struct rss_config {
+	int rss_ncpus;
+	int rss_nbuckets;
+	int rss_basecpu;
+	int *rss_bucket_map;
+};
+
+typedef enum {
+	RSS_BUCKET_TYPE_NONE = 0,
+	RSS_BUCKET_TYPE_KERNEL_ALL = 1,
+	RSS_BUCKET_TYPE_KERNEL_TX = 2,
+	RSS_BUCKET_TYPE_KERNEL_RX = 3,
+	RSS_BUCKET_TYPE_MAX = 3,
+} rss_bucket_type_t;
+
+typedef void rss_bucket_rebalance_cb_t(void *arg);
+
+/*
+ * Enable/disable whether to allow for multiple bind()s to the
+ * given PCB entry.
+ *
+ * This must be done before bind().
+ */
+extern	int rss_sock_set_bindmulti(int fd, int af, int val);
+
+/*
+ * Set the RSS bucket for the given file descriptor.
+ *
+ * This must be done before bind().
+ */
+extern	int rss_sock_set_rss_bucket(int fd, int af, int rss_bucket);
+
+/*
+ * Enable or disable receiving RSS/flowid information on
+ * received UDP frames.
+ */
+extern	int rss_sock_set_recvrss(int fd, int af, int val);
+
+/*
+ * Fetch RSS configuration information.
+ */
+extern	struct rss_config * rss_config_get(void);
+
+/*
+ * Free an RSS configuration structure.
+ */
+extern	void rss_config_free(struct rss_config *rc);
+
+/*
+ * Return how many RSS buckets there are.
+ */
+extern	int rss_config_get_bucket_count(struct rss_config *rc);
+
+/*
+ * Fetch the cpuset configuration for the given RSS bucket and
+ * type.
+ */
+extern	int rss_get_bucket_cpuset(struct rss_config *rc,
+    rss_bucket_type_t btype, int bucket, cpuset_t *cs);
+
+/*
+ * Set a callback for bucket rebalancing.
+ *
+ * This will occur in a separate thread context rather than
+ * a signal handler.
+ */
+extern	int rss_set_bucket_rebalance_cb(rss_bucket_rebalance_cb_t *cb,
+	    void *cbdata);
+
+#endif /* __LIBRSS_H__ */


More information about the svn-src-head mailing list