svn commit: r331508 - in head: lib/libc/sys share/man/man9 usr.bin/cpuset

Jeff Roberson jeff at FreeBSD.org
Sat Mar 24 23:58:45 UTC 2018


Author: jeff
Date: Sat Mar 24 23:58:44 2018
New Revision: 331508
URL: https://svnweb.freebsd.org/changeset/base/331508

Log:
  Document new NUMA related syscalls and utility options.
  
  Sponsored by:	Netflix, Dell/EMC Isilon

Modified:
  head/lib/libc/sys/Makefile.inc
  head/lib/libc/sys/cpuset.2
  head/lib/libc/sys/cpuset_getaffinity.2
  head/share/man/man9/Makefile
  head/share/man/man9/malloc.9
  head/share/man/man9/zone.9
  head/usr.bin/cpuset/cpuset.1

Modified: head/lib/libc/sys/Makefile.inc
==============================================================================
--- head/lib/libc/sys/Makefile.inc	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/lib/libc/sys/Makefile.inc	Sat Mar 24 23:58:44 2018	(r331508)
@@ -174,6 +174,7 @@ MAN+=	abort2.2 \
 	connectat.2 \
 	cpuset.2 \
 	cpuset_getaffinity.2 \
+	cpuset_getdomain.2 \
 	dup.2 \
 	execve.2 \
 	_exit.2 \
@@ -371,6 +372,7 @@ MLINKS+=nanosleep.2 clock_nanosleep.2
 MLINKS+=cpuset.2 cpuset_getid.2 \
 	cpuset.2 cpuset_setid.2
 MLINKS+=cpuset_getaffinity.2 cpuset_setaffinity.2
+MLINKS+=cpuset_getdomain.2 cpuset_setdomain.2
 MLINKS+=dup.2 dup2.2
 MLINKS+=execve.2 fexecve.2
 MLINKS+=extattr_get_file.2 extattr.2 \

Modified: head/lib/libc/sys/cpuset.2
==============================================================================
--- head/lib/libc/sys/cpuset.2	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/lib/libc/sys/cpuset.2	Sat Mar 24 23:58:44 2018	(r331508)
@@ -48,21 +48,21 @@
 The
 .Nm
 family of system calls allow applications to control sets of processors and
-assign processes and threads to these sets.
-Processor sets contain lists of CPUs that members may run on and exist only
-as long as some process is a member of the set.
+memory domains and assign processes and threads to these sets.
+Processor sets contain lists of CPUs and domains that members may run on
+and exist only as long as some process is a member of the set.
 All processes in the system have an assigned set.
 The default set for all processes in the system is the set numbered 1.
 Threads belong to the same set as the process which contains them,
 however, they may further restrict their set with the anonymous
-per-thread mask.
+per-thread mask to bind to a specific CPU or subset of CPUs and memory domains.
 .Pp
 Sets are referenced by a number of type
 .Ft cpuset_id_t .
 Each thread has a root set, an assigned set, and an anonymous mask.
 Only the root and assigned sets are numbered.
-The root set is the set of all CPUs available in the system or in the
-system partition the thread is running in.
+The root set is the set of all CPUs and memory domains available in the system
+or in the system partition the thread is running in.
 The assigned set is a subset of the root set and is administratively
 assignable on a per-process basis.
 Many processes and threads may be members of a numbered set.
@@ -72,7 +72,8 @@ set.
 It is intended that administrators will manipulate numbered sets using
 .Xr cpuset 1
 while application developers will manipulate anonymous sets using
-.Xr cpuset_setaffinity 2 .
+.Xr cpuset_setaffinity 2 and
+.Xr cpuset_setdomain 2 .
 .Pp
 To select the correct set a value of type
 .Ft cpulevel_t
@@ -175,9 +176,10 @@ with a process or thread is unsupported since
 this references the unnumbered anonymous mask.
 .Pp
 The actual contents of the sets may be retrieved or manipulated using
-.Xr cpuset_getaffinity 2
-and
-.Xr cpuset_setaffinity 2 .
+.Xr cpuset_getaffinity 2 ,
+.Xr cpuset_setaffinity 2 ,
+.Xr cpuset_getdomain 2 , and
+.Xr cpuset_setdomain 2 .
 See those manual pages for more detail.
 .Sh RETURN VALUES
 .Rv -std
@@ -220,6 +222,8 @@ for allocation.
 .Xr cpuset 1 ,
 .Xr cpuset_getaffinity 2 ,
 .Xr cpuset_setaffinity 2 ,
+.Xr cpuset_getdomain 2 ,
+.Xr cpuset_setdomain 2 ,
 .Xr pthread_affinity_np 3 ,
 .Xr pthread_attr_affinity_np 3 ,
 .Xr cpuset 9

Modified: head/lib/libc/sys/cpuset_getaffinity.2
==============================================================================
--- head/lib/libc/sys/cpuset_getaffinity.2	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/lib/libc/sys/cpuset_getaffinity.2	Sat Mar 24 23:58:44 2018	(r331508)
@@ -160,6 +160,8 @@ See
 .Xr cpuset 2 ,
 .Xr cpuset_getid 2 ,
 .Xr cpuset_setid 2 ,
+.Xr cpuset_getdomain 2 ,
+.Xr cpuset_setdomain 2 ,
 .Xr pthread_affinity_np 3 ,
 .Xr pthread_attr_affinity_np 3 ,
 .Xr cpuset 9

Modified: head/share/man/man9/Makefile
==============================================================================
--- head/share/man/man9/Makefile	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/share/man/man9/Makefile	Sat Mar 24 23:58:44 2018	(r331508)
@@ -1271,6 +1271,8 @@ MLINKS+=make_dev.9 destroy_dev.9 \
 	make_dev.9 make_dev_p.9 \
 	make_dev.9 make_dev_s.9
 MLINKS+=malloc.9 free.9 \
+	malloc.9 malloc_domain.9 \
+	malloc.9 free_domain.9 \
 	malloc.9 mallocarray.9 \
 	malloc.9 MALLOC_DECLARE.9 \
 	malloc.9 MALLOC_DEFINE.9 \
@@ -2213,10 +2215,12 @@ MLINKS+=vslock.9 vsunlock.9
 MLINKS+=zone.9 uma.9 \
 	zone.9 uma_zalloc.9 \
 	zone.9 uma_zalloc_arg.9 \
+	zone.9 uma_zalloc_domain.9 \
 	zone.9 uma_zcreate.9 \
 	zone.9 uma_zdestroy.9 \
 	zone.9 uma_zfree.9 \
 	zone.9 uma_zfree_arg.9 \
+	zone.9 uma_zfree_domain.9 \
 	zone.9 uma_zone_get_cur.9 \
 	zone.9 uma_zone_get_max.9 \
 	zone.9 uma_zone_set_max.9 \

Modified: head/share/man/man9/malloc.9
==============================================================================
--- head/share/man/man9/malloc.9	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/share/man/man9/malloc.9	Sat Mar 24 23:58:44 2018	(r331508)
@@ -46,9 +46,13 @@
 .Ft void *
 .Fn malloc "size_t size" "struct malloc_type *type" "int flags"
 .Ft void *
+.Fn malloc_domain "size_t size" "struct malloc_type *type" "int domain" "int flags"
+.Ft void *
 .Fn mallocarray "size_t nmemb" "size_t size" "struct malloc_type *type" "int flags"
 .Ft void
 .Fn free "void *addr" "struct malloc_type *type"
+.Ft void
+.Fn free_domain "void *addr" "struct malloc_type *type"
 .Ft void *
 .Fn realloc "void *addr" "size_t size" "struct malloc_type *type" "int flags"
 .Ft void *
@@ -64,6 +68,14 @@ The
 function allocates uninitialized memory in kernel address space for an
 object whose size is specified by
 .Fa size .
+.Pp
+The
+.Fn malloc_domain
+variant allocates the object from the specified memory domain.  Memory allocated
+with this function should be returned with
+.Fn free_domain .
+See
+.Xr numa 9 for more details.
 .Pp
 The
 .Fn mallocarray

Modified: head/share/man/man9/zone.9
==============================================================================
--- head/share/man/man9/zone.9	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/share/man/man9/zone.9	Sat Mar 24 23:58:44 2018	(r331508)
@@ -32,8 +32,10 @@
 .Nm uma_zcreate ,
 .Nm uma_zalloc ,
 .Nm uma_zalloc_arg ,
+.Nm uma_zalloc_domain ,
 .Nm uma_zfree ,
 .Nm uma_zfree_arg ,
+.Nm uma_zfree_domain ,
 .Nm uma_zdestroy ,
 .Nm uma_zone_set_max ,
 .Nm uma_zone_get_max ,
@@ -55,11 +57,15 @@
 .Fn uma_zalloc "uma_zone_t zone" "int flags"
 .Ft "void *"
 .Fn uma_zalloc_arg "uma_zone_t zone" "void *arg" "int flags"
+.Ft "void *"
+.Fn uma_zalloc_domain "uma_zone_t zone" "void *arg" "int domain" "int flags"
 .Ft void
 .Fn uma_zfree "uma_zone_t zone" "void *item"
 .Ft void
 .Fn uma_zfree_arg "uma_zone_t zone" "void *item" "void *arg"
 .Ft void
+.Fn uma_zfree_domain "uma_zone_t zone" "void *item" "void *arg"
+.Ft void
 .Fn uma_zdestroy "uma_zone_t zone"
 .Ft int
 .Fn uma_zone_set_max "uma_zone_t zone" "int nitems"
@@ -78,10 +84,13 @@
 .Fn SYSCTL_ADD_UMA_CUR ctx parent nbr name access zone descr
 .Sh DESCRIPTION
 The zone allocator provides an efficient interface for managing
-dynamically-sized collections of items of similar size.
+dynamically-sized collections of items of identical size.
 The zone allocator can work with preallocated zones as well as with
 runtime-allocated ones, and is therefore available much earlier in the
-boot process than other memory management routines.
+boot process than other memory management routines.  The zone allocator
+provides per-cpu allocation caches with linear scalability on SMP
+systems as well as round-robin and first-touch policies for NUMA
+systems.
 .Pp
 A zone is an extensible collection of items of identical size.
 The zone allocator keeps track of which items are in use and which
@@ -209,6 +218,11 @@ The zone is for the
 subsystem.
 .It Dv UMA_ZONE_VM
 The zone is for the VM subsystem.
+.It Dv UMA_ZONE_NUMA
+The zone should use a first-touch NUMA policy rather than the round-robin
+default. Callers that do not free memory on the same domain it is allocated
+from will cause mixing in per-cpu caches.  See
+.Xr numa 9 for more details.
 .El
 .Pp
 To allocate an item from a zone, simply call
@@ -243,12 +257,21 @@ The variations
 .Fn uma_zalloc_arg
 and
 .Fn uma_zfree_arg
-allow to
+allow callers to
 specify an argument for the
 .Dv ctor
 and
 .Dv dtor
 functions, respectively.
+The 
+.Fn uma_zalloc_domain
+function allows callers to specify a fixed
+.Xr numa 9 domain to allocate from.  This uses a guaranteed but slow path in
+the allocator which reduces concurrency.  The 
+.Fn uma_zfree_domain
+function should be used to return memory allocated in this fashion.  This
+function infers the domain from the pointer and does not require it as an
+argument.
 .Pp
 Created zones,
 which are empty,

Modified: head/usr.bin/cpuset/cpuset.1
==============================================================================
--- head/usr.bin/cpuset/cpuset.1	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/usr.bin/cpuset/cpuset.1	Sat Mar 24 23:58:44 2018	(r331508)
@@ -34,20 +34,24 @@
 .Sh SYNOPSIS
 .Nm
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list 
 .Op Fl s Ar setid
 .Ar cmd ...
 .Nm
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list 
 .Op Fl s Ar setid
 .Fl p Ar pid
 .Nm
 .Op Fl c
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list 
 .Fl C
 .Fl p Ar pid
 .Nm
 .Op Fl c
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list 
 .Op Fl j Ar jailid | Fl p Ar pid | Fl t Ar tid | Fl s Ar setid | Fl x Ar irq
 .Nm
 .Fl g
@@ -57,8 +61,9 @@
 The
 .Nm
 command can be used to assign processor sets to processes, run commands
-constrained to a given set or list of processors, and query information
-about processor binding, sets, and available processors in the system.
+constrained to a given set or list of processors and memory domains, and query
+information about processor binding, memory binding and policy, sets, and
+available processors and memory domains in the system.
 .Pp
 .Nm
 requires a target to modify or query.
@@ -92,6 +97,15 @@ This last set is the list of all possible CPUs in the 
 queried using
 .Fl r .
 .Pp
+Most sets include NUMA memory domain and policy information.  This can be
+inspected with
+.Fl g
+and set with
+.Fl n .
+This will specify which NUMA domains are visible to the process and
+affect where anonymous memory and file pages will be stored on first access.
+Files accessed first by other processes may specify conflicting policy.
+.Pp
 When running a command it may join a set specified with
 .Fl s
 otherwise a new set is created.
@@ -110,7 +124,8 @@ Create a new cpuset and assign the target process to t
 The requested operation should reference the cpuset available via the
 target specifier.
 .It Fl d Ar domain
-Specifies a NUMA domain id as the target of the operation.
+Specifies a NUMA domain id as the target of the operation.  This can only
+be used to query the CPUs visible in each numbered domain.
 .It Fl g
 Causes
 .Nm
@@ -130,6 +145,13 @@ numbers separated by '-' for ranges and commas separat
 A special list of
 .Dq all
 may be specified in which case the list includes all CPUs from the root set.
+.It Fl n Ar policy:domain-list
+Specifies a list of domains and allocation policy to apply to a target.  Ranges
+may be specified as in
+.Fl l .
+Valid policies include first-touch, ft, round-robin, rr, and prefer.  The prefer
+policy accepts only a single domain in the set.  The parent of the set is
+consulted if the preferred domain is unavailable.
 .It Fl p Ar pid
 Specifies a pid as the target of the operation.
 .It Fl s Ar setid
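
The -n option text above describes an argument made of an allocation policy and a list of domains, where ranges use '-' and commas separate entries as in -l, and the prefer policy accepts only a single domain. The following is a minimal sketch of those rules, not the parser used by cpuset(1). It assumes the policy:domain-list order shown in the SYNOPSIS, and the names POLICY_ALIASES, parse_domain_list, and parse_numa_arg are hypothetical. Treating ft and rr as abbreviations of first-touch and round-robin is likewise an assumption drawn from the list of valid policies.

```python
# Illustrative only: every name here is hypothetical, not taken from cpuset(1).
# Assumption: "ft" and "rr" abbreviate "first-touch" and "round-robin".
POLICY_ALIASES = {
    "first-touch": "first-touch",
    "ft": "first-touch",
    "round-robin": "round-robin",
    "rr": "round-robin",
    "prefer": "prefer",
}


def parse_domain_list(text):
    """Expand a list such as "0-1,3" into a sorted list of domain ids."""
    domains = set()
    for part in text.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            low, high = int(low), int(high)
            if low > high:
                raise ValueError(f"bad range: {part!r}")
            domains.update(range(low, high + 1))
        else:
            # int() raises ValueError on empty or non-numeric entries.
            domains.add(int(part))
    return sorted(domains)


def parse_numa_arg(arg):
    """Split "policy:domain-list" and enforce the single-domain rule for prefer."""
    policy_text, sep, list_text = arg.partition(":")
    if not sep:
        raise ValueError("expected policy:domain-list")
    if policy_text not in POLICY_ALIASES:
        raise ValueError(f"unknown policy: {policy_text!r}")
    policy = POLICY_ALIASES[policy_text]
    domains = parse_domain_list(list_text)
    if policy == "prefer" and len(domains) != 1:
        raise ValueError("prefer accepts only a single domain")
    return policy, domains
```

Under these assumptions, parse_numa_arg("rr:0-1,3") yields the round-robin policy with domains 0, 1, and 3, while parse_numa_arg("prefer:0-1") is rejected because prefer is documented as taking a single domain.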
