freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone

Philip M. Gollucci pgollucci at gmail.com
Wed Mar 28 03:09:18 UTC 2012


On 03/27/12 02:32, Philip M. Gollucci wrote:
> Some other tuning updates
> 
> $ zfs set zfs:zfs_nocacheflush = 1
> $ sysctl vfs.zfs.prefetch_disable=1
> 
> $ cat /etc/my.cnf
> skip-innodb-doublewrite
> innodb_flush_log_at_trx_commit=2
> 
> 
> $ zfs set primarycache=metadata zmysqlD
> $ zfs set atime=off zmysqlD
> $ zfs set recordsize=16k zmysqlD
> 
> but not on zmysqlL
> 
> My next plan is to turn off tmpfs and use ZVOL swaps, then to simply use
> zroot/tmp as a normal dir.
> 
> After that I'll drastically increase maxswzone.
> 
> Still hoping someone has already done this.

None of that made a difference; however, I haven't tried the ZVOL swaps
yet because they're quite new and this box is, after all, going to be
production eventually.

So I've been reading up on maxswzone.  It seems to me that nobody
really understands it.

Fortunately, it isn't used in very many places in the kernel source.

It works out to roughly 7.7GB of swap from 32MB of maxswzone; okay, fine.
If I double it, that should give me 15.4GB from 64MB (still not enough).
If I 16x it, that should give me about 123GB from 512MB (it would take
32x, i.e. 1GB, to reach ~246GB).  That's closer to the size of my
physical RAM + swap.  Oh well.


I've seen John Baldwin write on the lists that:
o) you have another problem if the default isn't enough
o) when it panics, pick up the swap info from the crash dump and compute
   #blocks in use * totalswblocks / maxswzone
o) setting it higher claims wired memory that can't be reused.

tuning(7) is from the 4.x days and is useless here.

Something that's really confusing me is whether the output of
 $ vmstat -z | grep solaris
is relevant, or just the size of my swap itself, or whether by upping
maxswzone I'm taking away too much from ZFS in the long run.

So, tracing this through the code below:
kern.maxswzone="536870912" # = 16*(32*1024*1024)
vm.stats.vm.v_page_count: 24411488

n=12205744  ###    n = cnt.v_page_count / 2;

if (maxswzone && n > maxswzone / sizeof(struct swblock))
  n = maxswzone / sizeof(struct swblock);

struct swblock {
        struct swblock  *swb_hnext;
        vm_object_t     swb_object;
        vm_pindex_t     swb_index;
        int             swb_count;
        daddr_t         swb_pages[SWAP_META_PAGES];
};
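sizeof(struct swblock) is 288 bytes here (the SIZE column in the vmstat -z
output below), which is well over the ~43.98-byte threshold
(536870912 / 12205744), so the conditional is true and n gets clamped to
536870912 / 288 = 1864135.  The missing printf() message doesn't
contradict that: n2 is set after the clamp, so the message only fires if
uma_zone_set_obj() fails and the retry loop has to shrink n further.
	if (n2 != n)
		printf("Swap zone entries reduced from %d to %d.\n", n2, n);

which means the initial allocation succeeded with the clamped n=1864135,
not 12205744, and that is exactly the LIMIT vmstat -z reports: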

ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
SWAPMETA:               288, 1864135,       0,       0,       0,   0,   0

So I'm more than a little perplexed by these sizes/limits, and by the
fact that none of it is used on a system that's running out of it.

subr_param.c:
---------------
long	maxswzone;			/* max swmeta KVA storage */
SYSCTL_LONG(_kern, OID_AUTO, maxswzone, CTLFLAG_RDTUN, &maxswzone, 0,
    "Maximum memory for swap metadata");
#ifdef VM_SWZONE_SIZE_MAX
	maxswzone = VM_SWZONE_SIZE_MAX;
#endif
TUNABLE_LONG_FETCH("kern.maxswzone", &maxswzone);

param.h:
--------
/*
 * Ceiling on amount of swblock kva space, can be changed via
 * the kern.maxswzone /boot/loader.conf variable.
 */
#ifndef VM_SWZONE_SIZE_MAX
#define	VM_SWZONE_SIZE_MAX	(32 * 1024 * 1024)
#endif

swap_pager.c:
--------------
void
swap_pager_swap_init(void)
{
	int n, n2;
//comments skipped
	nsw_cluster_max = min((MAXPHYS/PAGE_SIZE), MAX_PAGEOUT_CLUSTER);

	mtx_lock(&pbuf_mtx);
	nsw_rcount = (nswbuf + 1) / 2;
	nsw_wcount_sync = (nswbuf + 3) / 4;
	nsw_wcount_async = 4;
	nsw_wcount_async_max = nsw_wcount_async;
	mtx_unlock(&pbuf_mtx);
	/*
	 * Initialize our zone.  Right now I'm just guessing on the number
	 * we need based on the number of pages in the system.  Each swblock
	 * can hold 16 pages, so this is probably overkill.  This reservation
	 * is typically limited to around 32MB by default.
	 */
	n = cnt.v_page_count / 2;
	if (maxswzone && n > maxswzone / sizeof(struct swblock))
		n = maxswzone / sizeof(struct swblock);
	n2 = n;
	swap_zone = uma_zcreate("SWAPMETA", sizeof(struct swblock), NULL, NULL,
	    NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE | UMA_ZONE_VM);
	if (swap_zone == NULL)
		panic("failed to create swap_zone.");
	do {
		if (uma_zone_set_obj(swap_zone, &swap_zone_obj, n))
			break;
		/*
		 * if the allocation failed, try a zone two thirds the
		 * size of the previous attempt.
		 */
		n -= ((n + 2) / 3);
	} while (n > 0);
	if (n2 != n)
		printf("Swap zone entries reduced from %d to %d.\n", n2, n);
	n2 = n;

	/*
	 * Initialize our meta-data hash table.  The swapper does not need to
	 * be quite as efficient as the VM system, so we do not use an
	 * oversized hash table.
	 *
	 * 	n: 		size of hash table, must be power of 2
	 *	swhash_mask:	hash table index mask
	 */
	for (n = 1; n < n2 / 8; n *= 2)
		;
	swhash = malloc(sizeof(struct swblock *) * n, M_VMPGDATA,
	    M_WAITOK | M_ZERO);
	swhash_mask = n - 1;
	mtx_init(&swhash_mtx, "swap_pager swhash", NULL, MTX_DEF);
}


-- 
------------------------------------------------------------------------
1024D/DB9B8C1C B90B FBC3 A3A1 C71A 8E70  3F8C 75B8 8FFB DB9B 8C1C
Philip M. Gollucci (pgollucci at p6m7g8.com) c: 703.336.9354
Member,                           Apache Software Foundation
Committer,                        FreeBSD Foundation
Consultant,                       P6M7G8 Inc.
Director Operations,              Ridecharge Inc.

Work like you don't need the money,
love like you'll never get hurt,
and dance like nobody's watching.



More information about the freebsd-questions mailing list