advice needed: zpool of 10 x (raidz2 on (4+2) x 2T HDD)

InterNetX - Juergen Gotteswinter jg at internetx.com
Wed Dec 2 11:45:42 UTC 2015


Hi,

Two things I would consider suspicious, probably three:

SATA disks on a SAS controller, dedup, and probably the HBA firmware version.
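
A minimal sketch of what I would check first, straight from the shell
(pool and device names are examples, adjust to your setup):

  # disks behind the SAS HBAs; SATA drives show up as ATA devices here
  camcontrol devlist -v

  # mps(4) reports the controller firmware version at attach time
  dmesg | grep -i 'mps[0-9].*firmware'

  # rough size and shape of the dedup table (DDT); with ~33T allocated
  # and dedup on, this table competes with everything else for ARC/RAM
  zdb -DD storage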




On 02.12.2015 at 12:34, Zeus Panchenko wrote:
> greetings,
> 
> we deployed this storage, and now that it has been filling up, I see I
> need advice on its configuration and optimization ...
> 
> the main reason I decided to ask for advice is this:
> 
> once per month (or even more frequently, depending on the load, I
> suspect) the host hangs and only a power reset helps; there is nothing
> helpful in the log files ... just the restart itself and the usual ctld
> activity are logged
> 
> after a reboot, `zpool import' takes 40 minutes or more, and during
> this time no host resource is used much ... neither CPU nor memory ...
> top and systat show no load (I have to export the pool first because I
> need to attach geli first; if I attach geli with the zpool still
> imported, I end up with a lot of "absent/damaged" disks in the pool,
> which disappear after an export/import)
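> 
> (roughly the sequence involved, for reference; keyfile handling here is
> illustrative, and -N imports the pool without mounting the datasets:)
> 
>   # attach every geli provider first, only then import the pool
>   for d in /dev/da*; do
>       k=/root/geli.keys/${d##*/}.key
>       [ -f "$k" ] && geli attach -p -k "$k" "$d"
>   done
>   zpool import -N storage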
> 
> 
> so, I'm wondering: what can I do to trace the cause of the hangs? what
> should I monitor to understand what to expect and how to prevent them ...
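> 
> (the kind of data I can think of collecting from the console while the
> box is still responsive; additions welcome:)
> 
>   # per-disk latency and queue depth; a single slow disk often shows up here
>   gstat -p
>   # per-vdev I/O, e.g. while the long import is running
>   zpool iostat -v storage 5
>   # kernel stacks of the import process, to see where it spends its time
>   procstat -kk $(pgrep -f 'zpool import')
>   # if the kernel has a debugger compiled in, break into it during a hang
>   sysctl debug.kdb.enter=1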
> 
> 
> so, please, advise
> 
> 
> 
> ----------------------------------------------------------------------------------
> the details are below:
> ----------------------------------------------------------------------------------
> 
> the box is Supermicro X9DRD-7LN4F with:
> 
>   CPU: Intel(R) Xeon(R) CPU E5-2630L (2 package(s) x 6 core(s) x 2 SMT threads)
>   RAM: 128 GB
>  STOR: 3 x LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (jbod)
>        60 x HDD 2T (ATA WDC WD20EFRX-68A 0A80, Fixed Direct Access SCSI-6 device 600.000MB/s)
> 
> OS: FreeBSD 10.1-RELEASE #0 r274401 amd64
> 
> to avoid an OS memory shortage, the sysctl vfs.zfs.arc_max is set to
> 120275861504 (~112 GiB)
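> 
> (for reference, this corresponds to the following /boot/loader.conf
> entry, assuming it is set there at boot; with dedup on, the dedup table
> is cached in that ARC space as well:)
> 
>   # /boot/loader.conf
>   vfs.zfs.arc_max="120275861504"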
> 
> storage is provided to clients via iSCSI by ctld (each target is file-backed)
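> 
> (a trimmed-down example of one such file-backed target in /etc/ctl.conf;
> the IQN, portal-group name and backing path are illustrative:)
> 
>   target iqn.2013-10.bar.foo:target0 {
>       portal-group pg0
>       lun 0 {
>           path /storage/targets/target0.img
>       }
>   }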
> 
> the zpool is built from 10 x raidz2, each raidz2 consisting of 6 geli
> devices, and it now looks like this (yes, deduplication is on):
> 
>> zpool list storage
> NAME            SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
> storage         109T  33.5T  75.2T      -         -    30%  1.57x  ONLINE  -
> 
> 
>> zpool history storage
> 2013-10-21.01:31:14 zpool create storage 
>   raidz2 gpt/c0s00 gpt/c0s01 gpt/c1s00 gpt/c1s01 gpt/c2s00 gpt/c2s01
>   raidz2 gpt/c0s02 gpt/c0s03 gpt/c1s02 gpt/c1s03 gpt/c2s02 gpt/c2s03
>   ...
>   raidz2 gpt/c0s18 gpt/c0s19 gpt/c1s18 gpt/c1s19 gpt/c2s18 gpt/c2s19
>  log mirror gpt/log0 gpt/log1
>  cache gpt/cache0 gpt/cache1
> 
> 
>> zdb storage
> Cached configuration:
>         version: 5000
>         name: 'storage'
>         state: 0
>         txg: 13340514
>         pool_guid: 11994995707440773547
>         hostid: 1519855013
>         hostname: 'storage.foo.bar'
>         vdev_children: 11
>         vdev_tree:
>             type: 'root'
>             id: 0
>             guid: 11994995707440773547
>             children[0]:
>                 type: 'raidz'
>                 id: 0
>                 guid: 12290021428260525074
>                 nparity: 2
>                 metaslab_array: 46
>                 metaslab_shift: 36
>                 ashift: 12
>                 asize: 12002364751872
>                 is_log: 0
>                 create_txg: 4
>                 children[0]:
>                     type: 'disk'
>                     id: 0
>                     guid: 3897093815971447961
>                     path: '/dev/gpt/c0s00'
>                     phys_path: '/dev/gpt/c0s00'
>                     whole_disk: 1
>                     DTL: 9133
>                     create_txg: 4
>                 children[1]:
>                     type: 'disk'
>                     id: 1
>                     guid: 1036685341766239763
>                     path: '/dev/gpt/c0s01'
>                     phys_path: '/dev/gpt/c0s01'
>                     whole_disk: 1
>                     DTL: 9132
>                     create_txg: 4
> 		    ...
> 
> 
> each geli provider is created on a single HDD:
>> geli list da50.eli
> Geom name: da50.eli
> State: ACTIVE
> EncryptionAlgorithm: AES-XTS
> KeyLength: 256
> Crypto: hardware
> Version: 6
> UsedKey: 0
> Flags: (null)
> KeysAllocated: 466
> KeysTotal: 466
> Providers:
> 1. Name: da50.eli
>    Mediasize: 2000398929920 (1.8T)
>    Sectorsize: 4096
>    Mode: r1w1e3
> Consumers:
> 1. Name: da50
>    Mediasize: 2000398934016 (1.8T)
>    Sectorsize: 512
>    Stripesize: 4096
>    Stripeoffset: 0
>    Mode: r1w1e1
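> 
> (for reference, each provider was initialized with something along these
> lines; the keyfile path is illustrative, the flags match the listing
> above: AES-XTS, 256-bit key, 4k sectors:)
> 
>   geli init -e AES-XTS -l 256 -s 4096 -P -K /root/geli.keys/da50.key /dev/da50
>   geli attach -p -k /root/geli.keys/da50.key da50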
> 
> 
> 
> each raidz2 member disk is partitioned as:
>> gpart show da50.eli     
> =>        6  488378634  da50.eli  GPT  (1.8T)
>           6  488378634         1  freebsd-zfs  (1.8T)
> 
> 
>> zfs-stats -a
> --------------------------------------------------------------------------
> ZFS Subsystem Report				Wed Dec  2 09:59:27 2015
> --------------------------------------------------------------------------
> System Information:
> 
> 	Kernel Version:				1001000 (osreldate)
> 	Hardware Platform:			amd64
> 	Processor Architecture:			amd64
> 
> FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014     root
>  9:59AM  up 1 day, 46 mins, 10 users, load averages: 1.03, 0.46, 0.75
> --------------------------------------------------------------------------
> System Memory Statistics:
> 	Physical Memory:			131012.88M
> 	Kernel Memory:				1915.37M
> 	DATA:				98.62%	1888.90M
> 	TEXT:				1.38%	26.47M
> --------------------------------------------------------------------------
> ZFS pool information:
> 	Storage pool Version (spa):		5000
> 	Filesystem Version (zpl):		5
> --------------------------------------------------------------------------
> ARC Misc:
> 	Deleted:				1961248
> 	Recycle Misses:				127014
> 	Mutex Misses:				5973
> 	Evict Skips:				5973
> 
> ARC Size:
> 	Current Size (arcsize):		100.00%	114703.88M
> 	Target Size (Adaptive, c):	100.00%	114704.00M
> 	Min Size (Hard Limit, c_min):	12.50%	14338.00M
> 	Max Size (High Water, c_max):	~8:1	114704.00M
> 
> ARC Size Breakdown:
> 	Recently Used Cache Size (p):	93.75%	107535.69M
> 	Freq. Used Cache Size (c-p):	6.25%	7168.31M
> 
> ARC Hash Breakdown:
> 	Elements Max:				6746532
> 	Elements Current:		100.00%	6746313
> 	Collisions:				9651654
> 	Chain Max:				0
> 	Chains:					1050203
> 
> ARC Eviction Statistics:
> 	Evicts Total:				194298918912
> 	Evicts Eligible for L2:		81.00%	157373345280
> 	Evicts Ineligible for L2:	19.00%	36925573632
> 	Evicts Cached to L2:			97939090944
> 
> ARC Efficiency
> 	Cache Access Total:			109810376
> 	Cache Hit Ratio:		91.57%	100555148
> 	Cache Miss Ratio:		8.43%	9255228
> 	Actual Hit Ratio:		90.54%	99423922
> 
> 	Data Demand Efficiency:		76.64%
> 	Data Prefetch Efficiency:	48.46%
> 
> 	CACHE HITS BY CACHE LIST:
> 	  Anonymously Used:		0.88%	881966
> 	  Most Recently Used (mru):	23.11%	23236902
> 	  Most Frequently Used (mfu):	75.77%	76187020
> 	  MRU Ghost (mru_ghost):	0.03%	26449
> 	  MFU Ghost (mfu_ghost):	0.22%	222811
> 
> 	CACHE HITS BY DATA TYPE:
> 	  Demand Data:			10.17%	10227867
> 	  Prefetch Data:		0.45%	455126
> 	  Demand Metadata:		88.69%	89184329
> 	  Prefetch Metadata:		0.68%	687826
> 
> 	CACHE MISSES BY DATA TYPE:
> 	  Demand Data:			33.69%	3117808
> 	  Prefetch Data:		5.23%	484140
> 	  Demand Metadata:		56.55%	5233984
> 	  Prefetch Metadata:		4.53%	419296
> --------------------------------------------------------------------------
> L2 ARC Summary:
> 	Low Memory Aborts:			77
> 	R/W Clashes:				13
> 	Free on Write:				523
> 
> L2 ARC Size:
> 	Current Size: (Adaptive)		91988.13M
> 	Header Size:			0.13%	120.08M
> 
> L2 ARC Read/Write Activity:
> 	Bytes Written:				97783.99M
> 	Bytes Read:				2464.81M
> 
> L2 ARC Breakdown:
> 	Access Total:				8110124
> 	Hit Ratio:			2.89%	234616
> 	Miss Ratio:			97.11%	7875508
> 	Feeds:					85129
> 
> 	WRITES:
> 	  Sent Total:			100.00%	18448
> --------------------------------------------------------------------------
> VDEV Cache Summary:
> 	Access Total:				0
> 	Hits Ratio:			0.00%	0
> 	Miss Ratio:			0.00%	0
> 	Delegations:				0
> --------------------------------------------------------------------------
> File-Level Prefetch Stats (DMU):
> 
> DMU Efficiency:
> 	Access Total:				162279162
> 	Hit Ratio:			91.69%	148788486
> 	Miss Ratio:			8.31%	13490676
> 
> 	Colinear Access Total:			13490676
> 	Colinear Hit Ratio:		0.06%	8166
> 	Colinear Miss Ratio:		99.94%	13482510
> 
> 	Stride Access Total:			146863482
> 	Stride Hit Ratio:		99.31%	145846806
> 	Stride Miss Ratio:		0.69%	1016676
> 
> DMU misc:
> 	Reclaim successes:			124372
> 	Reclaim failures:			13358138
> 	Stream resets:				618
> 	Stream noresets:			2938602
> 	Bogus streams:				0
> --------------------------------------------------------------------------
> ZFS Tunable (sysctl):
> 	kern.maxusers=8524
> 	vfs.zfs.arc_max=120275861504
> 	vfs.zfs.arc_min=15034482688
> 	vfs.zfs.arc_average_blocksize=8192
> 	vfs.zfs.arc_meta_used=24838283936
> 	vfs.zfs.arc_meta_limit=30068965376
> 	vfs.zfs.l2arc_write_max=8388608
> 	vfs.zfs.l2arc_write_boost=8388608
> 	vfs.zfs.l2arc_headroom=2
> 	vfs.zfs.l2arc_feed_secs=1
> 	vfs.zfs.l2arc_feed_min_ms=200
> 	vfs.zfs.l2arc_noprefetch=1
> 	vfs.zfs.l2arc_feed_again=1
> 	vfs.zfs.l2arc_norw=1
> 	vfs.zfs.anon_size=27974656
> 	vfs.zfs.anon_metadata_lsize=0
> 	vfs.zfs.anon_data_lsize=0
> 	vfs.zfs.mru_size=112732930560
> 	vfs.zfs.mru_metadata_lsize=18147921408
> 	vfs.zfs.mru_data_lsize=92690379776
> 	vfs.zfs.mru_ghost_size=7542758400
> 	vfs.zfs.mru_ghost_metadata_lsize=1262705664
> 	vfs.zfs.mru_ghost_data_lsize=6280052736
> 	vfs.zfs.mfu_size=3748620800
> 	vfs.zfs.mfu_metadata_lsize=1014886912
> 	vfs.zfs.mfu_data_lsize=2723481600
> 	vfs.zfs.mfu_ghost_size=24582345728
> 	vfs.zfs.mfu_ghost_metadata_lsize=682512384
> 	vfs.zfs.mfu_ghost_data_lsize=23899833344
> 	vfs.zfs.l2c_only_size=66548531200
> 	vfs.zfs.dedup.prefetch=1
> 	vfs.zfs.nopwrite_enabled=1
> 	vfs.zfs.mdcomp_disable=0
> 	vfs.zfs.dirty_data_max=4294967296
> 	vfs.zfs.dirty_data_max_max=4294967296
> 	vfs.zfs.dirty_data_max_percent=10
> 	vfs.zfs.dirty_data_sync=67108864
> 	vfs.zfs.delay_min_dirty_percent=60
> 	vfs.zfs.delay_scale=500000
> 	vfs.zfs.prefetch_disable=0
> 	vfs.zfs.zfetch.max_streams=8
> 	vfs.zfs.zfetch.min_sec_reap=2
> 	vfs.zfs.zfetch.block_cap=256
> 	vfs.zfs.zfetch.array_rd_sz=1048576
> 	vfs.zfs.top_maxinflight=32
> 	vfs.zfs.resilver_delay=2
> 	vfs.zfs.scrub_delay=4
> 	vfs.zfs.scan_idle=50
> 	vfs.zfs.scan_min_time_ms=1000
> 	vfs.zfs.free_min_time_ms=1000
> 	vfs.zfs.resilver_min_time_ms=3000
> 	vfs.zfs.no_scrub_io=0
> 	vfs.zfs.no_scrub_prefetch=0
> 	vfs.zfs.metaslab.gang_bang=131073
> 	vfs.zfs.metaslab.fragmentation_threshold=70
> 	vfs.zfs.metaslab.debug_load=0
> 	vfs.zfs.metaslab.debug_unload=0
> 	vfs.zfs.metaslab.df_alloc_threshold=131072
> 	vfs.zfs.metaslab.df_free_pct=4
> 	vfs.zfs.metaslab.min_alloc_size=10485760
> 	vfs.zfs.metaslab.load_pct=50
> 	vfs.zfs.metaslab.unload_delay=8
> 	vfs.zfs.metaslab.preload_limit=3
> 	vfs.zfs.metaslab.preload_enabled=1
> 	vfs.zfs.metaslab.fragmentation_factor_enabled=1
> 	vfs.zfs.metaslab.lba_weighting_enabled=1
> 	vfs.zfs.metaslab.bias_enabled=1
> 	vfs.zfs.condense_pct=200
> 	vfs.zfs.mg_noalloc_threshold=0
> 	vfs.zfs.mg_fragmentation_threshold=85
> 	vfs.zfs.check_hostid=1
> 	vfs.zfs.spa_load_verify_maxinflight=10000
> 	vfs.zfs.spa_load_verify_metadata=1
> 	vfs.zfs.spa_load_verify_data=1
> 	vfs.zfs.recover=0
> 	vfs.zfs.deadman_synctime_ms=1000000
> 	vfs.zfs.deadman_checktime_ms=5000
> 	vfs.zfs.deadman_enabled=1
> 	vfs.zfs.spa_asize_inflation=24
> 	vfs.zfs.txg.timeout=5
> 	vfs.zfs.vdev.cache.max=16384
> 	vfs.zfs.vdev.cache.size=0
> 	vfs.zfs.vdev.cache.bshift=16
> 	vfs.zfs.vdev.trim_on_init=1
> 	vfs.zfs.vdev.mirror.rotating_inc=0
> 	vfs.zfs.vdev.mirror.rotating_seek_inc=5
> 	vfs.zfs.vdev.mirror.rotating_seek_offset=1048576
> 	vfs.zfs.vdev.mirror.non_rotating_inc=0
> 	vfs.zfs.vdev.mirror.non_rotating_seek_inc=1
> 	vfs.zfs.vdev.max_active=1000
> 	vfs.zfs.vdev.sync_read_min_active=10
> 	vfs.zfs.vdev.sync_read_max_active=10
> 	vfs.zfs.vdev.sync_write_min_active=10
> 	vfs.zfs.vdev.sync_write_max_active=10
> 	vfs.zfs.vdev.async_read_min_active=1
> 	vfs.zfs.vdev.async_read_max_active=3
> 	vfs.zfs.vdev.async_write_min_active=1
> 	vfs.zfs.vdev.async_write_max_active=10
> 	vfs.zfs.vdev.scrub_min_active=1
> 	vfs.zfs.vdev.scrub_max_active=2
> 	vfs.zfs.vdev.trim_min_active=1
> 	vfs.zfs.vdev.trim_max_active=64
> 	vfs.zfs.vdev.aggregation_limit=131072
> 	vfs.zfs.vdev.read_gap_limit=32768
> 	vfs.zfs.vdev.write_gap_limit=4096
> 	vfs.zfs.vdev.bio_flush_disable=0
> 	vfs.zfs.vdev.bio_delete_disable=0
> 	vfs.zfs.vdev.trim_max_bytes=2147483648
> 	vfs.zfs.vdev.trim_max_pending=64
> 	vfs.zfs.max_auto_ashift=13
> 	vfs.zfs.min_auto_ashift=9
> 	vfs.zfs.zil_replay_disable=0
> 	vfs.zfs.cache_flush_disable=0
> 	vfs.zfs.zio.use_uma=1
> 	vfs.zfs.zio.exclude_metadata=0
> 	vfs.zfs.sync_pass_deferred_free=2
> 	vfs.zfs.sync_pass_dont_compress=5
> 	vfs.zfs.sync_pass_rewrite=2
> 	vfs.zfs.snapshot_list_prefetch=0
> 	vfs.zfs.super_owner=0
> 	vfs.zfs.debug=0
> 	vfs.zfs.version.ioctl=4
> 	vfs.zfs.version.acl=1
> 	vfs.zfs.version.spa=5000
> 	vfs.zfs.version.zpl=5
> 	vfs.zfs.vol.mode=1
> 	vfs.zfs.trim.enabled=1
> 	vfs.zfs.trim.txg_delay=32
> 	vfs.zfs.trim.timeout=30
> 	vfs.zfs.trim.max_interval=1
> 	vm.kmem_size=133823901696
> 	vm.kmem_size_scale=1
> 	vm.kmem_size_min=0
> 	vm.kmem_size_max=1319413950874
> 
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
> 

