advice needed: zpool of 10 x (raidz2 on (4+2) x 2T HDD)
InterNetX - Juergen Gotteswinter
jg at internetx.com
Wed Dec 2 11:45:42 UTC 2015
Hi,
Two things I would consider suspicious, probably three:
SATA disks behind a SAS controller, dedup, and probably the HBA firmware version.
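A quick way to check the last two points (just a sketch, only the pool name is
taken from your mail):

  # how big the dedup table is and how much core it needs
  zpool status -D storage
  zdb -DD storage

  # firmware/driver versions as seen by the mps(4) driver
  dmesg | grep mps
  sysctl dev.mps | grep -i version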
On 02.12.2015 at 12:34, Zeus Panchenko wrote:
> greetings,
>
> we deployed this storage, and now that it has been filling up, I see I need
> some advice regarding its configuration and optimization ...
>
> the main reason I decided to ask for advice is this:
>
> once a month (or even more frequently, depending on the load, I suspect)
> the host hangs and only a power reset helps; nothing helpful in the log
> files though ... just the fact of the restart and the usual ctld activity
>
> after a reboot, `zpool import' takes 40 minutes or more, and during this time
> no host resource is used much ... neither CPU nor memory ... top
> and systat show no load (I have to export the pool first, since I need to
> attach the geli providers first; if I attach geli with the zpool still
> imported, I end up with a lot of "absent/damaged" disks in the zpool, which
> disappear after an export/import)
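>
> (for reference, the post-reboot sequence is roughly this, the key file path
> being just an example:
>
>   zpool export storage                               # drop the partial import left from boot
>   geli attach -p -k /root/keys/da50.key /dev/da50    # repeated for every data disk
>   zpool import storage                               # the step that takes 40+ minutes
> )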
>
>
> so, I'm wondering: what can I do to trace the cause of the hangs? what should
> I monitor to understand what to expect and how to prevent them ...
>
>
> so, please, advise
>
>
>
> ----------------------------------------------------------------------------------
> below are the details:
> ----------------------------------------------------------------------------------
>
> the box is Supermicro X9DRD-7LN4F with:
>
> CPU: Intel(R) Xeon(R) CPU E5-2630L (2 package(s) x 6 core(s) x 2 SMT threads)
> RAM: 128 GB
> STOR: 3 x LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (jbod)
> 60 x HDD 2T (ATA WDC WD20EFRX-68A 0A80, Fixed Direct Access SCSI-6 device 600.000MB/s)
>
> OS: FreeBSD 10.1-RELEASE #0 r274401 amd64
>
> to avoid an OS memory shortage, the sysctl vfs.zfs.arc_max is set to 120275861504
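>
> (this is set as a loader tunable, i.e. roughly the following in /boot/loader.conf:
>
>   # keep the ARC below physical RAM so the kernel and ctld have headroom
>   vfs.zfs.arc_max="120275861504"
> )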
>
> storage is provided to clients via iSCSI by ctld (each target is file-backed)
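>
> (a typical target definition in /etc/ctl.conf looks roughly like this; the
> target name and backing file path are just examples:
>
>   portal-group pg0 {
>           listen 0.0.0.0
>   }
>
>   target iqn.2013-10.bar.foo.storage:client0 {
>           auth-group no-authentication
>           portal-group pg0
>           lun 0 {
>                   path /storage/targets/client0.img
>                   size 4T
>           }
>   }
> )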
>
> the zpool is built from 10 x raidz2 vdevs, each raidz2 consisting of 6 geli
> devices, and now looks like this (yes, deduplication is on):
>
>> zpool list storage
> NAME      SIZE  ALLOC   FREE  FRAG  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
> storage   109T  33.5T  75.2T     -         -   30%  1.57x  ONLINE  -
>
>
>> zpool history storage
> 2013-10-21.01:31:14 zpool create storage
> raidz2 gpt/c0s00 gpt/c0s01 gpt/c1s00 gpt/c1s01 gpt/c2s00 gpt/c2s01
> raidz2 gpt/c0s02 gpt/c0s03 gpt/c1s02 gpt/c1s03 gpt/c2s02 gpt/c2s03
> ...
> raidz2 gpt/c0s18 gpt/c0s19 gpt/c1s18 gpt/c1s19 gpt/c2s18 gpt/c2s19
> log mirror gpt/log0 gpt/log1
> cache gpt/cache0 gpt/cache1
>
>
>> zdb storage
> Cached configuration:
>         version: 5000
>         name: 'storage'
>         state: 0
>         txg: 13340514
>         pool_guid: 11994995707440773547
>         hostid: 1519855013
>         hostname: 'storage.foo.bar'
>         vdev_children: 11
>         vdev_tree:
>             type: 'root'
>             id: 0
>             guid: 11994995707440773547
>             children[0]:
>                 type: 'raidz'
>                 id: 0
>                 guid: 12290021428260525074
>                 nparity: 2
>                 metaslab_array: 46
>                 metaslab_shift: 36
>                 ashift: 12
>                 asize: 12002364751872
>                 is_log: 0
>                 create_txg: 4
>                 children[0]:
>                     type: 'disk'
>                     id: 0
>                     guid: 3897093815971447961
>                     path: '/dev/gpt/c0s00'
>                     phys_path: '/dev/gpt/c0s00'
>                     whole_disk: 1
>                     DTL: 9133
>                     create_txg: 4
>                 children[1]:
>                     type: 'disk'
>                     id: 1
>                     guid: 1036685341766239763
>                     path: '/dev/gpt/c0s01'
>                     phys_path: '/dev/gpt/c0s01'
>                     whole_disk: 1
>                     DTL: 9132
>                     create_txg: 4
>                 ...
>
>
> each geli is created on one HDD
>> geli list da50.eli
> Geom name: da50.eli
> State: ACTIVE
> EncryptionAlgorithm: AES-XTS
> KeyLength: 256
> Crypto: hardware
> Version: 6
> UsedKey: 0
> Flags: (null)
> KeysAllocated: 466
> KeysTotal: 466
> Providers:
> 1. Name: da50.eli
> Mediasize: 2000398929920 (1.8T)
> Sectorsize: 4096
> Mode: r1w1e3
> Consumers:
> 1. Name: da50
> Mediasize: 2000398934016 (1.8T)
> Sectorsize: 512
> Stripesize: 4096
> Stripeoffset: 0
> Mode: r1w1e1
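>
> (the per-disk initialization was along these lines, with the key file path
> being just an example:
>
>   geli init -P -K /root/keys/da50.key -e AES-XTS -l 256 -s 4096 /dev/da50
>   geli attach -p -k /root/keys/da50.key /dev/da50
> )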
>
>
>
> each raidz2 disk is configured as:
>> gpart show da50.eli
> => 6 488378634 da50.eli GPT (1.8T)
> 6 488378634 1 freebsd-zfs (1.8T)
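>
> (i.e. a GPT on top of each .eli provider, created roughly like this, the
> label following the c<controller>s<slot> naming above:
>
>   gpart create -s GPT da50.eli
>   gpart add -t freebsd-zfs -l c0s00 da50.eli
> )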
>
>
>> zfs-stats -a
> --------------------------------------------------------------------------
> ZFS Subsystem Report Wed Dec 2 09:59:27 2015
> --------------------------------------------------------------------------
> System Information:
>
> Kernel Version: 1001000 (osreldate)
> Hardware Platform: amd64
> Processor Architecture: amd64
>
> FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014 root
> 9:59AM up 1 day, 46 mins, 10 users, load averages: 1.03, 0.46, 0.75
> --------------------------------------------------------------------------
> System Memory Statistics:
> Physical Memory: 131012.88M
> Kernel Memory: 1915.37M
> DATA: 98.62% 1888.90M
> TEXT: 1.38% 26.47M
> --------------------------------------------------------------------------
> ZFS pool information:
> Storage pool Version (spa): 5000
> Filesystem Version (zpl): 5
> --------------------------------------------------------------------------
> ARC Misc:
> Deleted: 1961248
> Recycle Misses: 127014
> Mutex Misses: 5973
> Evict Skips: 5973
>
> ARC Size:
> Current Size (arcsize): 100.00% 114703.88M
> Target Size (Adaptive, c): 100.00% 114704.00M
> Min Size (Hard Limit, c_min): 12.50% 14338.00M
> Max Size (High Water, c_max): ~8:1 114704.00M
>
> ARC Size Breakdown:
> Recently Used Cache Size (p): 93.75% 107535.69M
> Freq. Used Cache Size (c-p): 6.25% 7168.31M
>
> ARC Hash Breakdown:
> Elements Max: 6746532
> Elements Current: 100.00% 6746313
> Collisions: 9651654
> Chain Max: 0
> Chains: 1050203
>
> ARC Eviction Statistics:
> Evicts Total: 194298918912
> Evicts Eligible for L2: 81.00% 157373345280
> Evicts Ineligible for L2: 19.00% 36925573632
> Evicts Cached to L2: 97939090944
>
> ARC Efficiency
> Cache Access Total: 109810376
> Cache Hit Ratio: 91.57% 100555148
> Cache Miss Ratio: 8.43% 9255228
> Actual Hit Ratio: 90.54% 99423922
>
> Data Demand Efficiency: 76.64%
> Data Prefetch Efficiency: 48.46%
>
> CACHE HITS BY CACHE LIST:
> Anonymously Used: 0.88% 881966
> Most Recently Used (mru): 23.11% 23236902
> Most Frequently Used (mfu): 75.77% 76187020
> MRU Ghost (mru_ghost): 0.03% 26449
> MFU Ghost (mfu_ghost): 0.22% 222811
>
> CACHE HITS BY DATA TYPE:
> Demand Data: 10.17% 10227867
> Prefetch Data: 0.45% 455126
> Demand Metadata: 88.69% 89184329
> Prefetch Metadata: 0.68% 687826
>
> CACHE MISSES BY DATA TYPE:
> Demand Data: 33.69% 3117808
> Prefetch Data: 5.23% 484140
> Demand Metadata: 56.55% 5233984
> Prefetch Metadata: 4.53% 419296
> --------------------------------------------------------------------------
> L2 ARC Summary:
> Low Memory Aborts: 77
> R/W Clashes: 13
> Free on Write: 523
>
> L2 ARC Size:
> Current Size: (Adaptive) 91988.13M
> Header Size: 0.13% 120.08M
>
> L2 ARC Read/Write Activity:
> Bytes Written: 97783.99M
> Bytes Read: 2464.81M
>
> L2 ARC Breakdown:
> Access Total: 8110124
> Hit Ratio: 2.89% 234616
> Miss Ratio: 97.11% 7875508
> Feeds: 85129
>
> WRITES:
> Sent Total: 100.00% 18448
> --------------------------------------------------------------------------
> VDEV Cache Summary:
> Access Total: 0
> Hits Ratio: 0.00% 0
> Miss Ratio: 0.00% 0
> Delegations: 0
> --------------------------------------------------------------------------
> File-Level Prefetch Stats (DMU):
>
> DMU Efficiency:
> Access Total: 162279162
> Hit Ratio: 91.69% 148788486
> Miss Ratio: 8.31% 13490676
>
> Colinear Access Total: 13490676
> Colinear Hit Ratio: 0.06% 8166
> Colinear Miss Ratio: 99.94% 13482510
>
> Stride Access Total: 146863482
> Stride Hit Ratio: 99.31% 145846806
> Stride Miss Ratio: 0.69% 1016676
>
> DMU misc:
> Reclaim successes: 124372
> Reclaim failures: 13358138
> Stream resets: 618
> Stream noresets: 2938602
> Bogus streams: 0
> --------------------------------------------------------------------------
> ZFS Tunable (sysctl):
> kern.maxusers=8524
> vfs.zfs.arc_max=120275861504
> vfs.zfs.arc_min=15034482688
> vfs.zfs.arc_average_blocksize=8192
> vfs.zfs.arc_meta_used=24838283936
> vfs.zfs.arc_meta_limit=30068965376
> vfs.zfs.l2arc_write_max=8388608
> vfs.zfs.l2arc_write_boost=8388608
> vfs.zfs.l2arc_headroom=2
> vfs.zfs.l2arc_feed_secs=1
> vfs.zfs.l2arc_feed_min_ms=200
> vfs.zfs.l2arc_noprefetch=1
> vfs.zfs.l2arc_feed_again=1
> vfs.zfs.l2arc_norw=1
> vfs.zfs.anon_size=27974656
> vfs.zfs.anon_metadata_lsize=0
> vfs.zfs.anon_data_lsize=0
> vfs.zfs.mru_size=112732930560
> vfs.zfs.mru_metadata_lsize=18147921408
> vfs.zfs.mru_data_lsize=92690379776
> vfs.zfs.mru_ghost_size=7542758400
> vfs.zfs.mru_ghost_metadata_lsize=1262705664
> vfs.zfs.mru_ghost_data_lsize=6280052736
> vfs.zfs.mfu_size=3748620800
> vfs.zfs.mfu_metadata_lsize=1014886912
> vfs.zfs.mfu_data_lsize=2723481600
> vfs.zfs.mfu_ghost_size=24582345728
> vfs.zfs.mfu_ghost_metadata_lsize=682512384
> vfs.zfs.mfu_ghost_data_lsize=23899833344
> vfs.zfs.l2c_only_size=66548531200
> vfs.zfs.dedup.prefetch=1
> vfs.zfs.nopwrite_enabled=1
> vfs.zfs.mdcomp_disable=0
> vfs.zfs.dirty_data_max=4294967296
> vfs.zfs.dirty_data_max_max=4294967296
> vfs.zfs.dirty_data_max_percent=10
> vfs.zfs.dirty_data_sync=67108864
> vfs.zfs.delay_min_dirty_percent=60
> vfs.zfs.delay_scale=500000
> vfs.zfs.prefetch_disable=0
> vfs.zfs.zfetch.max_streams=8
> vfs.zfs.zfetch.min_sec_reap=2
> vfs.zfs.zfetch.block_cap=256
> vfs.zfs.zfetch.array_rd_sz=1048576
> vfs.zfs.top_maxinflight=32
> vfs.zfs.resilver_delay=2
> vfs.zfs.scrub_delay=4
> vfs.zfs.scan_idle=50
> vfs.zfs.scan_min_time_ms=1000
> vfs.zfs.free_min_time_ms=1000
> vfs.zfs.resilver_min_time_ms=3000
> vfs.zfs.no_scrub_io=0
> vfs.zfs.no_scrub_prefetch=0
> vfs.zfs.metaslab.gang_bang=131073
> vfs.zfs.metaslab.fragmentation_threshold=70
> vfs.zfs.metaslab.debug_load=0
> vfs.zfs.metaslab.debug_unload=0
> vfs.zfs.metaslab.df_alloc_threshold=131072
> vfs.zfs.metaslab.df_free_pct=4
> vfs.zfs.metaslab.min_alloc_size=10485760
> vfs.zfs.metaslab.load_pct=50
> vfs.zfs.metaslab.unload_delay=8
> vfs.zfs.metaslab.preload_limit=3
> vfs.zfs.metaslab.preload_enabled=1
> vfs.zfs.metaslab.fragmentation_factor_enabled=1
> vfs.zfs.metaslab.lba_weighting_enabled=1
> vfs.zfs.metaslab.bias_enabled=1
> vfs.zfs.condense_pct=200
> vfs.zfs.mg_noalloc_threshold=0
> vfs.zfs.mg_fragmentation_threshold=85
> vfs.zfs.check_hostid=1
> vfs.zfs.spa_load_verify_maxinflight=10000
> vfs.zfs.spa_load_verify_metadata=1
> vfs.zfs.spa_load_verify_data=1
> vfs.zfs.recover=0
> vfs.zfs.deadman_synctime_ms=1000000
> vfs.zfs.deadman_checktime_ms=5000
> vfs.zfs.deadman_enabled=1
> vfs.zfs.spa_asize_inflation=24
> vfs.zfs.txg.timeout=5
> vfs.zfs.vdev.cache.max=16384
> vfs.zfs.vdev.cache.size=0
> vfs.zfs.vdev.cache.bshift=16
> vfs.zfs.vdev.trim_on_init=1
> vfs.zfs.vdev.mirror.rotating_inc=0
> vfs.zfs.vdev.mirror.rotating_seek_inc=5
> vfs.zfs.vdev.mirror.rotating_seek_offset=1048576
> vfs.zfs.vdev.mirror.non_rotating_inc=0
> vfs.zfs.vdev.mirror.non_rotating_seek_inc=1
> vfs.zfs.vdev.max_active=1000
> vfs.zfs.vdev.sync_read_min_active=10
> vfs.zfs.vdev.sync_read_max_active=10
> vfs.zfs.vdev.sync_write_min_active=10
> vfs.zfs.vdev.sync_write_max_active=10
> vfs.zfs.vdev.async_read_min_active=1
> vfs.zfs.vdev.async_read_max_active=3
> vfs.zfs.vdev.async_write_min_active=1
> vfs.zfs.vdev.async_write_max_active=10
> vfs.zfs.vdev.scrub_min_active=1
> vfs.zfs.vdev.scrub_max_active=2
> vfs.zfs.vdev.trim_min_active=1
> vfs.zfs.vdev.trim_max_active=64
> vfs.zfs.vdev.aggregation_limit=131072
> vfs.zfs.vdev.read_gap_limit=32768
> vfs.zfs.vdev.write_gap_limit=4096
> vfs.zfs.vdev.bio_flush_disable=0
> vfs.zfs.vdev.bio_delete_disable=0
> vfs.zfs.vdev.trim_max_bytes=2147483648
> vfs.zfs.vdev.trim_max_pending=64
> vfs.zfs.max_auto_ashift=13
> vfs.zfs.min_auto_ashift=9
> vfs.zfs.zil_replay_disable=0
> vfs.zfs.cache_flush_disable=0
> vfs.zfs.zio.use_uma=1
> vfs.zfs.zio.exclude_metadata=0
> vfs.zfs.sync_pass_deferred_free=2
> vfs.zfs.sync_pass_dont_compress=5
> vfs.zfs.sync_pass_rewrite=2
> vfs.zfs.snapshot_list_prefetch=0
> vfs.zfs.super_owner=0
> vfs.zfs.debug=0
> vfs.zfs.version.ioctl=4
> vfs.zfs.version.acl=1
> vfs.zfs.version.spa=5000
> vfs.zfs.version.zpl=5
> vfs.zfs.vol.mode=1
> vfs.zfs.trim.enabled=1
> vfs.zfs.trim.txg_delay=32
> vfs.zfs.trim.timeout=30
> vfs.zfs.trim.max_interval=1
> vm.kmem_size=133823901696
> vm.kmem_size_scale=1
> vm.kmem_size_min=0
> vm.kmem_size_max=1319413950874
>
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>