Tuning kern.maxswzone

Matthew Rezny matthew at reztek.cz
Sat Feb 1 21:16:17 UTC 2014


On Sat, 01 Feb 2014 13:03:15 -0600
Richard Todd <rmtodd at servalan.servalan.com> wrote:

> Matthew Rezny <matthew at reztek.cz> writes:
> 
> > So, as best I can tell, the actual effective number used for
> > kern.maxswzone is indeed approximately 8x available RAM. If there is
> > some need to turn it down (using substantially less swap) then that
> > is possible, but turning it up (as suggested by the warning
> > message) is not possible. Setting any value higher does not appear
> > to actually
> 
> Yeah, IIRC I ran into that when configuring some VMs with
> really big swap space for the benefit of tmpfs. This is the quick hack
> I used to get around that, you might give it a try.
> 
> # HG changeset patch
> # Parent e4dd7df011139e2b224835aa6e330c90afcf9a55
> patch swap_pager to unconditionally use maxswzone tunable if it is
> set -- helpful for our tbvm VMs with large swap space for tmpfs
> 
> diff -r e4dd7df01113 sys/vm/swap_pager.c
> --- a/sys/vm/swap_pager.c	Wed Feb 20 00:15:49 2013 -0600
> +++ b/sys/vm/swap_pager.c	Sat Feb 23 13:08:54 2013 -0600
> @@ -546,7 +546,7 @@
>  	 * is typically limited to around 32MB by default.
>  	 */
>  	n = cnt.v_page_count / 2;
> -	if (maxswzone && n > maxswzone / sizeof(struct swblock))
> +	if (maxswzone)
>  		n = maxswzone / sizeof(struct swblock);
>  	n2 = n;
>  	swap_zone = uma_zcreate("SWAPMETA", sizeof(struct swblock),
> NULL, NULL,
> 
Thank you for pointing me in the right direction. Now that I'm
looking at the right file, I don't know how I failed to find it
myself with grep. The logic here is rather obviously flawed: the
maxswzone value is only used if it is less than the calculated default.
If there is some reason to not allow adjusting this up, then the warning
message is incorrect. Either way, something should change and I guess I
should file a PR on this one. At least I can see the warning is taking
the doubling for safety into account, so for my particular case with an
overrun under 10% it should probably be ok to put fixing this on the
back burner.
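
To make the flaw concrete, here is a minimal sketch of the two versions
of the check as standalone functions. The function names and the
swblock_size parameter (standing in for sizeof(struct swblock)) are
made up for illustration and do not appear in swap_pager.c; only the
two if conditions are taken from the patch quoted above.

```c
/*
 * Sketch only: hypothetical helpers that mirror the quoted hunk.
 * swblock_size is an assumed stand-in for sizeof(struct swblock).
 */

/* Unpatched logic: the tunable is honored only when it lowers n. */
unsigned long
swzone_entries_original(unsigned long page_count, unsigned long maxswzone,
    unsigned long swblock_size)
{
	unsigned long n = page_count / 2;	/* calculated default */

	if (maxswzone && n > maxswzone / swblock_size)
		n = maxswzone / swblock_size;
	return (n);
}

/* Patched logic: the tunable is used unconditionally whenever it is set. */
unsigned long
swzone_entries_patched(unsigned long page_count, unsigned long maxswzone,
    unsigned long swblock_size)
{
	unsigned long n = page_count / 2;	/* calculated default */

	if (maxswzone)
		n = maxswzone / swblock_size;
	return (n);
}
```

With an assumed 131072 pages and a 64-byte entry, the default is 65536
entries. Requesting twice that through maxswzone (131072 * 64 bytes)
leaves the original version at 65536, while the patched version
returns 131072. Requesting less than the default lowers the result in
both versions.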

> Also,
> 
> > With /usr/src cleared (and after running fsck) I booted back into
> > 10-PRERELEASE to try to fetch the 10-STABLE sources again. I started
> > svnlite co and find it hung shortly thereafter. I tried a few times
> > but each time I see it does a couple hundred files at best and just
> > stops. When it stops, I can't login to another terminal. If I have
> > a spare console logged in, I can't run anything. After a few tries,
> > I managed to catch it where I had top running in one VT, started the
> > checkout, and then switched back just in time. I never could even
> > get top up with rm running (it probably blows over some limit
> > faster). When the checkout hangs, the state of svnlite is "kmem a"
> > according to top. I can only guess that is short for kmem alloc, I
> > guess some syscall is waiting on an allocation that will never
> > happen because something already is using or has leaked everything
> > that could satisfy that allocation. It looks like activity on too
> > many files within a short period runs something out.
> 
> No, it's just a new bit of debugging code that causes the system
> to spend lots of CPU time verifying integrity of some of its internal
> data structures, especially on wimpy hardware (e.g., my dual PII/400
> box, which is where I noticed this recently.)  You'll find if you're
> patient that it isn't a complete hang, it will actually get work done
> in between the debug passes. Set sysctl debug.vmem_check=0 to disable
> the check.  This is I think completely independent from the maxswzone
> stuff, it's just you were seeing it for the first time since the
> debug code in question was only recently added to 10-STABLE.
> 
If only it were that simple. I'm not yet on 10-STABLE, I'm struggling
to get the sources to reach that point. These C3 boxes are
10-PRERELEASE so I don't yet have this debug.vmem_check to tweak.
sysctl says unknown oid when I query it. I tried setting it from the
loader prompt, but sysctl still says unknown oid and I see no change in
behavior.

Also, I'm not seeing anything using lots of CPU time. If I start off
top on another VT before I start svnlite, then I have a decent chance
of seeing what goes on until the situation becomes dire. svnlite starts
off moving quickly and using lots of CPU time (>50%) for the first
hundred files or so, then it halts in the "kmem a" state and CPU is
completely idle. It sits there for a while, and eventually does
something more. Each time the stop is longer and less work is done in
the interval between stops. Eventually the process appears to hang
completely in that I can leave it for half a day and no more progress
is made. The rm process could go longer in this state with still some
visible progress since its operation is sufficiently simple to
actually observe that it managed to do something, whereas svnlite might
do something occasionally but it's not enough to get to the next file
in the list.

With each burst and wait cycle, I see a spray in dmesg saying
calcru: runtime went backwards [lots] for {all processes}. If I hit
ctrl-C and wait, it'll interrupt svnlite the next time it would do
something other than wait. It can easily take 10 min for that to
happen, meaning the wait period is long enough to consider the process
as not effectively running. If I let it go long enough, it gets to a
state where it never exits on ctrl-c (or at least it'd take more hours
than I've been willing to wait). In this last attempt, I let it go for
5 min, pushed ctrl-c, waited another 10 min, then tried ctrl-alt-del,
which just beeps. So the attempt to interrupt the process resulted in a
total I/O lockup, to the point that key presses are not handled. That last
one was enough to finally require manual fsck on reboot (which should
be a testament to the resiliency of UFS as I've pushed the reset button
a hundred times in the past day and a half).

>                         Richard
> 
> 

Thank you for taking the time to respond. It's now clear the maxswzone
issue is just a red herring; the real issue is the apparent hangs, which
I'm seeing on several more boxes. The mystery is why the one box with
slightly more RAM seems ok, but a couple boxes with far more RAM are
not ok. That will probably be answered when I figure out what the cause
is.

