ZFS stats in "top" -- ZFS performance started being crappy in spurts

Chad Leigh - Pengar LLC chad at pengar.com
Sat Aug 11 23:33:29 UTC 2012


Hi

I have a FreeBSD 9 system with ZFS root.  It is actually a VM under Xen on a beefy piece of hardware (4-core Sandy Bridge 3 GHz Xeon, 32 GB total host memory -- the VM has 4 vCPUs and 6 GB RAM), with mirrored gpart partitions.  I am looking for data integrity more than performance, as long as performance is reasonable (which it has been, more than adequately, for the last 3 months).

The other "servers" on the same HW, the other VMs on the same, don't have this problem but are set up the same way.  There are 4 other FreeBSD VMs, one running email for a one man company and a few of his friends, as well as some static web pages and stuff for him, one runs a few low use web apps for various customers, and one runs about 30 websites with apache and nginx, mostly just static sites.  None are heavily used.  There is also one VM with linux running a couple low use FrontBase databases.   Not high use database -- low use ones.

The troublesome VM has been running fine for over 3 months, since I installed it.  The level of use has been pretty much constant.  The server runs 4 jails, each dedicated to a different bit of email processing for a small number of users.  One is a secondary DNS.  One runs clamav and spamassassin.  One runs exim for incoming and outgoing mail.  One runs dovecot for imap and pop.  There is no web server or database or anything else running.

The total number of mail users on the system is approximately 50.  Total mail traffic is very low compared to "real" mail servers.

Earlier this week things started "freezing up".  Processes become unresponsive; an episode might last a few minutes, or it might last half an hour or more.  It eventually resolves itself and things are good for anywhere from another 10 minutes to 3 hours, until it happens again.  When it happens, lots of processes are listed in "top" in one of the following states:

zfs
zio->i
zfs
tx->tx
db->db

These processes only get listed in these states when there are problems.  What are these states indicative of?

Eventually things get going again, these states drop off and the system hums along.
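
Next time it hangs I will try to capture more detail on exactly where things are blocked.  My rough plan (the <pid> below is just a placeholder for one of the stuck processes) is something like:

top -SH

to see system/kernel threads and the per-thread wait channels, and

procstat -kk <pid>

to dump the kernel stack of one of the unresponsive processes, in case that helps pin down what these states mean.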

Based on some things I found via Google (from someone who had a different but somewhat similar problem), I tried setting

zfs set primarycache=metadata zroot

and

zfs set primarycache=none zroot

but the problem still happened with approximately the same severity and frequency.  (I wanted to see whether the system was "churning" on cache upkeep.)
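
(If it matters: as I understand the zfs commands, the current setting can be checked, and put back to its default, with something like

zfs get primarycache zroot
zfs inherit primarycache zroot

so the experiments above should be easy to undo.)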


What is strange is that this server ran fine for 3 months straight, without interruption, under the same level of work.

Thanks for any hints or clues
Chad



Some data points are below.

---

# uname -a
FreeBSD newbagend 9.0-STABLE FreeBSD 9.0-STABLE #1: Wed Mar 21 15:22:14 MDT 2012     chad at underhill:/usr/obj/usr/src/sys/UNDERHILL-XEN  amd64
# 

---

# zpool status
 pool: zroot
state: ONLINE
 scan: scrub repaired 0 in 6h13m with 0 errors on Fri Aug 10 19:33:23 2012
config:

	NAME                                            STATE     READ WRITE CKSUM
	zroot                                           ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    gptid/f0da8263-8a52-11e1-b3ae-aa00003efccd  ONLINE       0     0     0
	    gptid/0f24ab58-8a53-11e1-b3ae-aa00003efccd  ONLINE       0     0     0

errors: No known data errors
#

---

Representative data from running zfs-stats during a trouble period:

zfs-stats  -a


------------------------------------------------------------------------
ZFS Subsystem Report				Sat Aug 11 13:40:07 2012
------------------------------------------------------------------------

System Information:

	Kernel Version:				900505 (osreldate)
	Hardware Platform:			amd64
	Processor Architecture:			amd64

	ZFS Storage pool Version:		28
	ZFS Filesystem Version:			5

FreeBSD 9.0-STABLE #1: Wed Mar 21 15:22:14 MDT 2012 chad
1:40PM  up  2:54, 3 users, load averages: 0.23, 0.19, 0.14

------------------------------------------------------------------------

System Memory:

	11.49%	681.92	MiB Active,	4.03%	238.97	MiB Inact
	33.37%	1.93	GiB Wired,	0.05%	3.04	MiB Cache
	51.04%	2.96	GiB Free,	0.01%	808.00	KiB Gap

	Real Installed:				6.00	GiB
	Real Available:			99.65%	5.98	GiB
	Real Managed:			96.93%	5.80	GiB

	Logical Total:				6.00	GiB
	Logical Used:			46.76%	2.81	GiB
	Logical Free:			53.24%	3.19	GiB

Kernel Memory:					1.25	GiB
	Data:				98.38%	1.23	GiB
	Text:				1.62%	20.75	MiB

Kernel Memory Map:				5.68	GiB
	Size:				17.27%	1003.75	MiB
	Free:				82.73%	4.70	GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
	Memory Throttle Count:			0

ARC Misc:
	Deleted:				9
	Recycle Misses:				64.30k
	Mutex Misses:				10
	Evict Skips:				58.80k

ARC Size:				39.98%	1.20	GiB
	Target Size: (Adaptive)		100.00%	3.00	GiB
	Min Size (Hard Limit):		12.50%	384.00	MiB
	Max Size (High Water):		8:1	3.00	GiB

ARC Size Breakdown:
	Recently Used Cache Size:	25.56%	785.15	MiB
	Frequently Used Cache Size:	74.44%	2.23	GiB

ARC Hash Breakdown:
	Elements Max:				223.30k
	Elements Current:		99.93%	223.15k
	Collisions:				418.23k
	Chain Max:				9
	Chains:					66.67k

------------------------------------------------------------------------

ARC Efficiency:					3.17m
	Cache Hit Ratio:		89.07%	2.82m
	Cache Miss Ratio:		10.93%	346.27k
	Actual Hit Ratio:		86.49%	2.74m

	Data Demand Efficiency:		99.50%	1.09m
	Data Prefetch Efficiency:	60.54%	1.78k

	CACHE HITS BY CACHE LIST:
	  Most Recently Used:		23.72%	669.34k
	  Most Frequently Used:		73.38%	2.07m
	  Most Recently Used Ghost:	1.92%	54.33k
	  Most Frequently Used Ghost:	3.30%	93.02k

	CACHE HITS BY DATA TYPE:
	  Demand Data:			38.35%	1.08m
	  Prefetch Data:		0.04%	1.08k
	  Demand Metadata:		58.75%	1.66m
	  Prefetch Metadata:		2.87%	80.97k

	CACHE MISSES BY DATA TYPE:
	  Demand Data:			1.56%	5.39k
	  Prefetch Data:		0.20%	704
	  Demand Metadata:		55.46%	192.02k
	  Prefetch Metadata:		42.78%	148.15k

------------------------------------------------------------------------

L2ARC is disabled

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:					6.05m
	Hit Ratio:			66.59%	4.03m
	Miss Ratio:			33.41%	2.02m

	Colinear:				2.02m
	  Hit Ratio:			0.04%	725
	  Miss Ratio:			99.96%	2.02m

	Stride:					3.90m
	  Hit Ratio:			99.98%	3.90m
	  Miss Ratio:			0.02%	826

DMU Misc:
	Reclaim:				2.02m
	  Successes:			2.02%	40.86k
	  Failures:			97.98%	1.98m

	Streams:				125.81k
	  +Resets:			0.36%	453
	  -Resets:			99.64%	125.36k
	  Bogus:				0

------------------------------------------------------------------------

VDEV Cache Summary:				530.68k
	Hit Ratio:			15.30%	81.21k
	Miss Ratio:			70.40%	373.57k
	Delegations:			14.30%	75.89k

------------------------------------------------------------------------

ZFS Tunables (sysctl):
	kern.maxusers                           512
	vm.kmem_size                            6222712832
	vm.kmem_size_scale                      1
	vm.kmem_size_min                        0
	vm.kmem_size_max                        329853485875
	vfs.zfs.l2c_only_size                   0
	vfs.zfs.mfu_ghost_data_lsize            91367424
	vfs.zfs.mfu_ghost_metadata_lsize        128350208
	vfs.zfs.mfu_ghost_size                  219717632
	vfs.zfs.mfu_data_lsize                  132299264
	vfs.zfs.mfu_metadata_lsize              20034048
	vfs.zfs.mfu_size                        160949760
	vfs.zfs.mru_ghost_data_lsize            45155328
	vfs.zfs.mru_ghost_metadata_lsize        642998784
	vfs.zfs.mru_ghost_size                  688154112
	vfs.zfs.mru_data_lsize                  347115520
	vfs.zfs.mru_metadata_lsize              10907136
	vfs.zfs.mru_size                        794174976
	vfs.zfs.anon_data_lsize                 0
	vfs.zfs.anon_metadata_lsize             0
	vfs.zfs.anon_size                       29469696
	vfs.zfs.l2arc_norw                      1
	vfs.zfs.l2arc_feed_again                1
	vfs.zfs.l2arc_noprefetch                1
	vfs.zfs.l2arc_feed_min_ms               200
	vfs.zfs.l2arc_feed_secs                 1
	vfs.zfs.l2arc_headroom                  2
	vfs.zfs.l2arc_write_boost               8388608
	vfs.zfs.l2arc_write_max                 8388608
	vfs.zfs.arc_meta_limit                  805306368
	vfs.zfs.arc_meta_used                   805310296
	vfs.zfs.arc_min                         402653184
	vfs.zfs.arc_max                         3221225472
	vfs.zfs.dedup.prefetch                  1
	vfs.zfs.mdcomp_disable                  0
	vfs.zfs.write_limit_override            0
	vfs.zfs.write_limit_inflated            19260174336
	vfs.zfs.write_limit_max                 802507264
	vfs.zfs.write_limit_min                 33554432
	vfs.zfs.write_limit_shift               3
	vfs.zfs.no_write_throttle               0
	vfs.zfs.zfetch.array_rd_sz              1048576
	vfs.zfs.zfetch.block_cap                256
	vfs.zfs.zfetch.min_sec_reap             2
	vfs.zfs.zfetch.max_streams              8
	vfs.zfs.prefetch_disable                0
	vfs.zfs.mg_alloc_failures               8
	vfs.zfs.check_hostid                    1
	vfs.zfs.recover                         0
	vfs.zfs.txg.synctime_ms                 1000
	vfs.zfs.txg.timeout                     5
	vfs.zfs.scrub_limit                     10
	vfs.zfs.vdev.cache.bshift               16
	vfs.zfs.vdev.cache.size                 10485760
	vfs.zfs.vdev.cache.max                  16384
	vfs.zfs.vdev.write_gap_limit            4096
	vfs.zfs.vdev.read_gap_limit             32768
	vfs.zfs.vdev.aggregation_limit          131072
	vfs.zfs.vdev.ramp_rate                  2
	vfs.zfs.vdev.time_shift                 6
	vfs.zfs.vdev.min_pending                4
	vfs.zfs.vdev.max_pending                10
	vfs.zfs.vdev.bio_flush_disable          0
	vfs.zfs.cache_flush_disable             0
	vfs.zfs.zil_replay_disable              0
	vfs.zfs.zio.use_uma                     0
	vfs.zfs.snapshot_list_prefetch          0
	vfs.zfs.version.zpl                     5
	vfs.zfs.version.spa                     28
	vfs.zfs.version.acl                     1
	vfs.zfs.debug                           0
	vfs.zfs.super_owner                     0

------------------------


Representative output from a trouble period -- you can see that not much is going on, just low load -- and the iostat output during a calm, good period looks about the same:

zpool iostat zroot 1


              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----

zroot        107G  41.9G      7    261  23.8K  1.52M
zroot        107G  41.9G     10    140  7.42K   272K
zroot        107G  41.9G      8    176  14.4K   547K
zroot        107G  41.9G      0     59      0   188K
zroot        107G  41.9G      5    171  6.44K  1.73M
zroot        107G  41.9G      4    284  8.42K  1006K
zroot        107G  41.9G      5    118  2.97K   260K
zroot        107G  41.9G     25    194  27.7K   623K
zroot        107G  41.9G      0    132      0   764K
zroot        107G  41.9G      1     95  6.44K  1.16M
zroot        107G  41.9G      8    272  16.3K   829K
zroot        107G  41.9G     56    212   103K   213K
zroot        107G  41.9G     22    221  27.7K   204K
zroot        107G  41.9G      2    455  1.48K   509K
zroot        107G  41.9G     14    198  7.42K   132K
zroot        107G  41.9G     14    270  7.42K   306K
zroot        107G  41.9G      6    273  3.46K   670K
zroot        107G  41.9G     21    175  10.9K   570K
zroot        107G  41.9G     17    179  8.91K   591K
zroot        107G  41.9G     11    289  17.3K   902K
zroot        107G  41.9G     13    121  6.93K   230K
zroot        107G  41.9G     18    238  9.41K   734K
zroot        107G  41.9G     99     61  50.5K   188K
zroot        107G  41.9G      0    222      0   862K
zroot        107G  41.9G     11    149  13.4K  1.12M
zroot        107G  41.9G     15    319  10.9K  1.05M
zroot        107G  41.9G      0    127      0   392K
zroot        107G  41.9G      0    159      0  1.70M
zroot        107G  41.9G     68    196   212K   601K
zroot        107G  41.9G     17    144  18.8K   295K
zroot        107G  41.9G     12    187  17.3K   588K
zroot        107G  41.9G      0    136      0  1.23M
zroot        107G  41.9G      6    209  23.8K   564K
zroot        107G  41.9G     11    199  12.4K   422K
zroot        107G  41.9G     12    178  9.41K   553K
zroot        107G  41.9G      0    140  1.48K  1.17M
zroot        107G  41.9G     48    200   128K   411K
zroot        107G  41.9G      8    191  16.8K   121K
zroot        107G  41.9G      1    397   1013   375K
zroot        107G  41.9G      0    263      0   132K
zroot        107G  41.9G     14    228  13.4K   235K
zroot        107G  41.9G      7     21  4.46K  10.9K
zroot        107G  41.9G      2    161  1.48K   156K



