kern.maxswzone causing serious problems

Curtis Villamizar curtis at orleans.occnc.com
Thu Mar 29 03:37:06 UTC 2018


I'm starting to upgrade a set of servers from FreeBSD 11.0-STABLE #0
r308356 (OK, rather old) to FreeBSD 11.1-STABLE #0 r331152 without
much success.  I'm getting the occasionally discussed kern.maxswzone
problems but I really do need to configure that swap space.

On an upgraded server I'm getting (line continuation added for
readability):

   warning: total configured swap (5242880 pages) \
      exceeds maximum recommended amount (112288 pages).
   warning: increase kern.maxswzone or reduce amount of swap.
   warning: total configured swap (10485760 pages) \
      exceeds maximum recommended amount (112288 pages).
   warning: increase kern.maxswzone or reduce amount of swap.

The kern.maxswzone value I used previously no longer works.  I also
ended up temporarily cutting swap in half to get rid of the warning.
This is only a symptom of a greater problem.

This machine has for a long time run multiple VMs totaling 17 GB of
memory on a server with 8 GB of physical memory.  Normally things are
fine because most of these VMs are idle most of the time and all but a
small memory footprint can be paged out.  For years (going back at
least to FreeBSD 9) I have had 40 GB of swap, 20 GB on each of two
spindles, on identically partitioned drives with mirrored partitions.

\begin{talesofwoe}  % ignore if busy

When just trying to install base system software using md devices one
at a time (no VMs running yet), I had been getting "killed: out of
swap space" messages.  I got rid of this by reducing the size of the VM
root partitions (most VMs run ZFS on another partition with not much on
root, so this was OK).  The next problem was running the VMs.  I worked
around this by cutting some VM memory (-m in bhyve) in half.  I can get
the whole set of VMs to boot, but now installing additional software
(which involves just moving files with scp and running tar) doesn't
work.  At this point I have 9 GB of VMs running on 8 GB of physical
memory and still couldn't even install software.  Once the maxswzone
message went away, so did all of these problems, except that my VMs
now have half as much RAM each.  Now each of the VMs is reporting swap
problems.  And this is just upgrading one server to 11.1.

\end{talesofwoe}

This is a major regression for FreeBSD 11.1.  The same value used in
FreeBSD 11.0 should just work, or there should be some documentation on
how to set this (preferably in the error message).  If nothing else, some
advice on converting a kern.maxswzone value from 11.0 to a working
value for 11.1 would be nice.  The entry in the loader(8) man page is
not very helpful.

btw- Reporting the swap size in MB or KB in the error message would be
helpful; doing so in addition to pages would be fine.  Mentioning the
highest value kern.maxswzone could be set to would also be helpful.
Changing "warning: increase kern.maxswzone" to "warning: increase
kern.maxswzone to %d" would be very helpful.

\begin{naiveananlysis}

The magic (or mess, depending on perspective) is mostly in "void
swap_pager_swap_init(void)" in vm/swap_pager.c, between lines 484 and
563 (in current, which is the same as stable/11 for this function).
The diff from the known-working r308356 to current has a hunk at "@@
-538,21 +518,25 @@" where the sizes of swpctrie_zone and swblk_zone are
computed based on maxswzone; it then runs uma_zone_reserve_kva with
that size and potentially reduces it.

In the older code, a "Swap zone entries reduced" message would be
produced if uma_zone_reserve_kva cut back (the new code moves this and
changes the message a bit).  But I didn't see this message, so
uma_zone_reserve_kva ran fine the first time without reducing "n".  In
the new code, swap_maxpages and swzone are then set.
swapon_check_swzone gives the warning but, as far as I can tell, does
nothing other than two printfs.  It doesn't appear to do any harm to
have too much swap and ignore this warning (you just can't use the
excess).

There is a multiplier by SWAP_META_PAGES which is defined to be
PCTRIE_COUNT which in sys/pctrie.h is defined as (1 << PCTRIE_WIDTH)
and PCTRIE_WIDTH is 4 or 3 depending on __LP64__.

swap_pager_swap_init calculates swap_maxpages, but swapon_check_swzone
doesn't use it; it calculates a local variable maxpages (the same way)
instead.  It seems that npages / SWAP_META_PAGES is related to what
you'd want to set kern.maxswzone to.  If so, a better set of printfs
might at least give better information.

Note that VM_SWZONE_SIZE_MAX defaults to (276 * 128 * 1024), which
would seem to be sized for 128K entries of 276 bytes each, covering
128K * SWAP_META_PAGES * PAGE_SIZE = 4 GB or 8 GB of swap (with 4 KB
pages) depending on whether __LP64__ is defined, but that is for i386
only.  There is no definition of VM_SWZONE_SIZE_MAX for amd64 unless it
picks this up from i386, which apparently it does.

One problem is that the conditional for using maxswzone only allows the
swzone size to be reduced, not increased.  Those people frobbing
kern.maxswzone (including me for a time) were hopelessly wasting their
time.

Based on my math the old max swap was about 4 * available RAM.
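A worked version of that estimate for the default path (kern.maxswzone
unset and no reduction by uma_zone_reserve_kva), taking
SWAP_META_PAGES = 16 as on amd64 and using the n and maxpages
definitions from the code discussed above:

```latex
\begin{aligned}
n &= \left\lfloor \tfrac{\texttt{v\_page\_count}}{2} \right\rfloor \\
\texttt{maxpages} &= n \cdot \texttt{SWAP\_META\_PAGES} \\
\text{recommended} &= \tfrac{\texttt{maxpages}}{2}
  \approx \tfrac{\texttt{v\_page\_count} \cdot \texttt{SWAP\_META\_PAGES}}{4}
  = 4 \cdot \texttt{v\_page\_count}
  \quad (\texttt{SWAP\_META\_PAGES} = 16)
\end{aligned}
```

As a check, the patched warning further down reports 8100744
recommended pages on the 8 GB host, which corresponds to n = 1012593
entries and about 2025186 physical pages (roughly 7.7 GiB), consistent
with the 4x estimate.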

Naive or horribly naive?  I haven't tried this yet ... (compiles)

\end{naiveananlysis}

After some playing around I ended up with the diffs below.

On the host with 8 GB RAM and back to 40 GB swap I got (indent and
line continuation added):

  warning: total configured swap (10485760 pages, 40960 MB) \
    exceeds maximum recommended amount (8100744 pages, 31643 MB).
  warning: increase kern.maxswzone from 0 to 275425296 \
    or reduce amount of swap.

After setting kern.maxswzone to 275425296 no complaint.

Still needs more testing.  Currently reinstalling the full set of VMs.
Later I'll try this on some of the VMs that have been configured with
various memory and swap sizes.  Also will revert to the conditions
that worked fine in prior versions and stopped working with 11.1.

Curtis


ps - about the diffs:

On amd64 VM_SWZONE_SIZE_MAX is not defined.  On i386 it is defined
based on a guess of sizeof(struct swblk).  On amd64 that size is 136
and the guess on i386 is 276, so I made the guess a #define and added a
comparison in the code that complains if the guess is off.

The original code lets maxswzone decrease the swap zone but not
increase it, unlike prior code.  I put in a limit on how much it can be
increased (4x the default, though I'm not sure even that is legitimate;
why not more?).  The replacement code does a printf on an attempt to
set it too high and reduces the value.

The swapon_check_swzone check gives much more useful information than
it did before, including exactly what to set kern.maxswzone to in order
to get twice the current swzone space.


Index: i386/include/param.h
===================================================================
--- i386/include/param.h	(revision 331152)
+++ i386/include/param.h	(working copy)
@@ -133,7 +133,8 @@
  * lower due to fragmentation.
  */
 #ifndef VM_SWZONE_SIZE_MAX
-#define VM_SWZONE_SIZE_MAX	(276 * 128 * 1024)
+#define SIZEOF_SWBLK_GUESS 276
+#define VM_SWZONE_SIZE_MAX	(SIZEOF_SWBLK_GUESS * 128 * 1024)
 #endif
 
 /*
Index: vm/swap_pager.c
===================================================================
--- vm/swap_pager.c	(revision 331152)
+++ vm/swap_pager.c	(working copy)
@@ -520,8 +520,16 @@
 	 * on the number of pages in the system.
 	 */
 	n = vm_cnt.v_page_count / 2;
-	if (maxswzone && n > maxswzone / sizeof(struct swblk))
+	/* reduce size or make larger within limits */
+	if (maxswzone && (n != maxswzone / sizeof(struct swblk))) {
+		if (4 * n < maxswzone / sizeof(struct swblk)) {
+			printf("kern.maxswzone (%lu) set too high: "
+			       "limit is %zu\n", maxswzone,
+			       4 * n * sizeof(struct swblk));
+			maxswzone = 4 * n * sizeof(struct swblk);
+		}
 		n = maxswzone / sizeof(struct swblk);
+	}
 	swpctrie_zone = uma_zcreate("swpctrie", pctrie_node_size(), NULL, NULL,
 	    pctrie_zone_init, NULL, UMA_ALIGN_PTR,
 	    UMA_ZONE_NOFREE | UMA_ZONE_VM);
@@ -2141,11 +2149,20 @@
 
 	/* recommend using no more than half that amount */
 	if (npages > maxpages / 2) {
-		printf("warning: total configured swap (%lu pages) "
-		    "exceeds maximum recommended amount (%lu pages).\n",
-		    npages, maxpages / 2);
-		printf("warning: increase kern.maxswzone "
-		    "or reduce amount of swap.\n");
+		printf("warning: total configured swap (%lu pages, %lu MB) "
+		       "exceeds maximum recommended amount (%lu pages, %lu MB).\n",
+		       npages, swap_total / (1024*1024),
+		       maxpages / 2, (maxpages / 2) * PAGE_SIZE / (1024*1024));
+		printf("warning: increase kern.maxswzone from %lu to %lu "
+		       "or reduce amount of swap.\n", maxswzone,
+		       (maxpages / SWAP_META_PAGES) * 2
+		       * sizeof(struct swblk));
+#ifdef SIZEOF_SWBLK_GUESS
+		if (SIZEOF_SWBLK_GUESS != sizeof(struct swblk))
+			printf("warning: bad guess on swblk size: "
+			       "%d != %zu\n",
+			       SIZEOF_SWBLK_GUESS, sizeof(struct swblk));
+#endif
 	}
 }
 

