Next Steps to Debug ZFS Hang?
Nick Sivo
nick at ycombinator.com
Wed Oct 8 05:08:52 UTC 2014
Thanks Chad. I looked through 20 pages of Google search results restricted to this list's archives and didn't see anything I could tell was related to the problem I'm having. It's not a complete I/O stall, or even a pool-wide stall. It only affects listing directory contents on a single filesystem, which is why I'm baffled.
The pool is less than 40% utilized, the ARC isn't under memory pressure, and as far as I can tell everything should be fine.
I did find https://wiki.freebsd.org/AvgZfsDeadlockDebug, which confirms I want kernel stack traces. I was hoping for some guidance on collecting them, especially on a remote system with only SSH access. I'm not sure I can enter DDB over SSH. Maybe tricks with dtrace and stack()?
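For what it's worth, here's the kind of thing I had in mind; FreeBSD can print a thread's kernel stack without entering DDB, which should work over plain SSH. This is just a sketch, and the PID below is a placeholder for the hung ls:

```shell
# See what state the hung process is in and what it is sleeping on
# (the MWCHAN column).
ps -axl | grep ls

# Print the kernel stack of every thread in the process
# (no DDB needed; 12345 is a placeholder PID).
procstat -kk 12345

# Or sample kernel stacks with DTrace, as the wiki hints:
dtrace -n 'profile-997 /execname == "ls"/ { stack(); }'
```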
[nsivo@hn3 sysutils]$ zpool get all ssd | grep -v default
NAME  PROPERTY                                      VALUE     SOURCE
ssd   size                                          182G      -
ssd   capacity                                      38%       -
ssd   health                                        ONLINE    -
ssd   failmode                                      panic     local
ssd   dedupratio                                    1.00x     -
ssd   free                                          111G      -
ssd   allocated                                     70.7G     -
ssd   readonly                                      off       -
ssd   expandsize                                    0         -
ssd   feature@async_destroy                         enabled   local
ssd   feature@empty_bpobj                           active    local
ssd   feature@lz4_compress                          enabled   local
ssd   unsupported@com.joyent:multi_vdev_crash_dump  inactive  local
[nsivo@hn3 sysutils]$ zfs get all | grep ssd | grep -v default | grep -v 'arc@2'
ssd      type                  filesystem             -
ssd      creation              Thu Aug 28 16:33 2014  -
ssd      used                  70.7G                  -
ssd      available             108G                   -
ssd      referenced            144K                   -
ssd      compressratio         1.00x                  -
ssd      mounted               no                     -
ssd      mountpoint            none                   local
ssd      checksum              sha256                 local
ssd      atime                 off                    local
ssd      canmount              off                    local
ssd      version               5                      -
ssd      utf8only              on                     -
ssd      normalization         formKC                 -
ssd      casesensitivity       sensitive              -
ssd      usedbysnapshots       0                      -
ssd      usedbydataset         144K                   -
ssd      usedbychildren        70.7G                  -
ssd      usedbyrefreservation  0                      -
ssd      mlslabel                                     -
ssd      refcompressratio      1.00x                  -
ssd      written               144K                   -
ssd      logicalused           31.1G                  -
ssd      logicalreferenced     43.5K                  -
ssd/arc  type                  filesystem             -
ssd/arc  creation              Wed Sep 17 17:07 2014  -
ssd/arc  used                  70.5G                  -
ssd/arc  available             108G                   -
ssd/arc  referenced            47.8G                  -
ssd/arc  compressratio         1.00x                  -
ssd/arc  mounted               yes                    -
ssd/arc  mountpoint            /usr/arc               received
ssd/arc  checksum              sha256                 inherited from ssd
ssd/arc  atime                 off                    inherited from ssd
ssd/arc  setuid                off                    received
ssd/arc  snapdir               visible                received
ssd/arc  xattr                 off                    temporary
ssd/arc  version               5                      -
ssd/arc  utf8only              on                     -
ssd/arc  normalization         formKC                 -
ssd/arc  casesensitivity       sensitive              -
ssd/arc  usedbysnapshots       22.7G                  -
ssd/arc  usedbydataset         47.8G                  -
ssd/arc  usedbychildren        0                      -
ssd/arc  usedbyrefreservation  0                      -
ssd/arc  mlslabel                                     -
ssd/arc  sync                  always                 local
ssd/arc  refcompressratio      1.00x                  -
ssd/arc  written               262M                   -
ssd/arc  logicalused           31.0G                  -
ssd/arc  logicalreferenced     15.3G                  -
[nsivo@hn3 ~]$ zfs-stats -a
------------------------------------------------------------------------
ZFS Subsystem Report Tue Oct 7 21:58:37 2014
------------------------------------------------------------------------
System Information:
Kernel Version: 902001 (osreldate)
Hardware Platform: amd64
Processor Architecture: amd64
ZFS Storage pool Version: 5000
ZFS Filesystem Version: 5
FreeBSD 9.2-RELEASE-p12 #0: Mon Sep 15 18:46:46 UTC 2014 root
9:58PM up 16 days, 3:39, 2 users, load averages: 0.24, 0.33, 0.35
------------------------------------------------------------------------
System Memory:
16.80% 10.42 GiB Active, 0.15% 94.19 MiB Inact
72.82% 45.15 GiB Wired, 0.12% 74.04 MiB Cache
10.11% 6.27 GiB Free, 0.00% 2.46 MiB Gap
Real Installed: 64.00 GiB
Real Available: 99.91% 63.94 GiB
Real Managed: 96.97% 62.00 GiB
Logical Total: 64.00 GiB
Logical Used: 89.95% 57.56 GiB
Logical Free: 10.05% 6.44 GiB
Kernel Memory: 15.70 GiB
Data: 99.83% 15.67 GiB
Text: 0.17% 27.37 MiB
Kernel Memory Map: 52.92 GiB
Size: 21.21% 11.22 GiB
Free: 78.79% 41.69 GiB
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 500.41m
Recycle Misses: 270.90m
Mutex Misses: 27.63m
Evict Skips: 5.12b
ARC Size: 25.95% 15.83 GiB
Target Size: (Adaptive) 45.64% 27.84 GiB
Min Size (Hard Limit): 12.50% 7.63 GiB
Max Size (High Water): 8:1 61.00 GiB
ARC Size Breakdown:
Recently Used Cache Size: 10.56% 2.94 GiB
Frequently Used Cache Size: 89.44% 24.91 GiB
ARC Hash Breakdown:
Elements Max: 16.33m
Elements Current: 20.84% 3.40m
Collisions: 571.54m
Chain Max: 41
Chains: 750.10k
------------------------------------------------------------------------
ARC Efficiency: 4.77b
Cache Hit Ratio: 81.44% 3.88b
Cache Miss Ratio: 18.56% 884.45m
Actual Hit Ratio: 80.36% 3.83b
Data Demand Efficiency: 29.50% 966.53m
Data Prefetch Efficiency: 29.08% 21.22m
CACHE HITS BY CACHE LIST:
Most Recently Used: 3.47% 134.64m
Most Frequently Used: 95.20% 3.69b
Most Recently Used Ghost: 5.18% 200.94m
Most Frequently Used Ghost: 5.58% 216.52m
CACHE HITS BY DATA TYPE:
Demand Data: 7.35% 285.16m
Prefetch Data: 0.16% 6.17m
Demand Metadata: 90.91% 3.53b
Prefetch Metadata: 1.58% 61.40m
CACHE MISSES BY DATA TYPE:
Demand Data: 77.04% 681.37m
Prefetch Data: 1.70% 15.05m
Demand Metadata: 15.86% 140.24m
Prefetch Metadata: 5.40% 47.79m
------------------------------------------------------------------------
L2ARC is disabled
------------------------------------------------------------------------
File-Level Prefetch: (HEALTHY)
DMU Efficiency: 11.05b
Hit Ratio: 59.82% 6.61b
Miss Ratio: 40.18% 4.44b
Colinear: 4.44b
Hit Ratio: 0.01% 317.87k
Miss Ratio: 99.99% 4.44b
Stride: 6.62b
Hit Ratio: 99.57% 6.59b
Miss Ratio: 0.43% 28.41m
DMU Misc:
Reclaim: 4.44b
Successes: 0.81% 35.91m
Failures: 99.19% 4.40b
Streams: 16.54m
+Resets: 0.25% 41.20k
-Resets: 99.75% 16.50m
Bogus: 0
------------------------------------------------------------------------
VDEV cache is disabled
------------------------------------------------------------------------
ZFS Tunables (sysctl):
kern.maxusers 384
vm.kmem_size 66575511552
vm.kmem_size_scale 1
vm.kmem_size_min 0
vm.kmem_size_max 329853485875
vfs.zfs.l2c_only_size 0
vfs.zfs.mfu_ghost_data_lsize 483016704
vfs.zfs.mfu_ghost_metadata_lsize 1192360960
vfs.zfs.mfu_ghost_size 1675377664
vfs.zfs.mfu_data_lsize 3821800448
vfs.zfs.mfu_metadata_lsize 1144714240
vfs.zfs.mfu_size 9304926208
vfs.zfs.mru_ghost_data_lsize 7731420672
vfs.zfs.mru_ghost_metadata_lsize 19021883392
vfs.zfs.mru_ghost_size 26753304064
vfs.zfs.mru_data_lsize 1530433536
vfs.zfs.mru_metadata_lsize 390488064
vfs.zfs.mru_size 3144110080
vfs.zfs.anon_data_lsize 0
vfs.zfs.anon_metadata_lsize 0
vfs.zfs.anon_size 12001792
vfs.zfs.l2arc_norw 1
vfs.zfs.l2arc_feed_again 1
vfs.zfs.l2arc_noprefetch 1
vfs.zfs.l2arc_feed_min_ms 200
vfs.zfs.l2arc_feed_secs 1
vfs.zfs.l2arc_headroom 2
vfs.zfs.l2arc_write_boost 8388608
vfs.zfs.l2arc_write_max 8388608
vfs.zfs.arc_meta_limit 16375442432
vfs.zfs.arc_meta_used 11641238168
vfs.zfs.arc_min 8187721216
vfs.zfs.arc_max 65501769728
vfs.zfs.dedup.prefetch 1
vfs.zfs.mdcomp_disable 0
vfs.zfs.nopwrite_enabled 1
vfs.zfs.write_limit_override 0
vfs.zfs.write_limit_inflated 205969035264
vfs.zfs.write_limit_max 8582043136
vfs.zfs.write_limit_min 33554432
vfs.zfs.write_limit_shift 3
vfs.zfs.no_write_throttle 0
vfs.zfs.zfetch.array_rd_sz 1048576
vfs.zfs.zfetch.block_cap 256
vfs.zfs.zfetch.min_sec_reap 2
vfs.zfs.zfetch.max_streams 8
vfs.zfs.prefetch_disable 0
vfs.zfs.no_scrub_prefetch 0
vfs.zfs.no_scrub_io 0
vfs.zfs.resilver_min_time_ms 3000
vfs.zfs.free_min_time_ms 1000
vfs.zfs.scan_min_time_ms 1000
vfs.zfs.scan_idle 50
vfs.zfs.scrub_delay 4
vfs.zfs.resilver_delay 2
vfs.zfs.top_maxinflight 32
vfs.zfs.write_to_degraded 0
vfs.zfs.mg_alloc_failures 8
vfs.zfs.check_hostid 1
vfs.zfs.deadman_enabled 1
vfs.zfs.deadman_synctime 1000
vfs.zfs.recover 0
vfs.zfs.txg.synctime_ms 1000
vfs.zfs.txg.timeout 5
vfs.zfs.vdev.cache.bshift 16
vfs.zfs.vdev.cache.size 0
vfs.zfs.vdev.cache.max 16384
vfs.zfs.vdev.trim_on_init 1
vfs.zfs.vdev.write_gap_limit 4096
vfs.zfs.vdev.read_gap_limit 32768
vfs.zfs.vdev.aggregation_limit 131072
vfs.zfs.vdev.ramp_rate 2
vfs.zfs.vdev.time_shift 29
vfs.zfs.vdev.min_pending 4
vfs.zfs.vdev.max_pending 10
vfs.zfs.vdev.bio_delete_disable 0
vfs.zfs.vdev.bio_flush_disable 0
vfs.zfs.vdev.trim_max_pending 64
vfs.zfs.vdev.trim_max_bytes 2147483648
vfs.zfs.cache_flush_disable 0
vfs.zfs.zil_replay_disable 0
vfs.zfs.sync_pass_rewrite 2
vfs.zfs.sync_pass_dont_compress 5
vfs.zfs.sync_pass_deferred_free 2
vfs.zfs.zio.use_uma 0
vfs.zfs.snapshot_list_prefetch 0
vfs.zfs.version.ioctl 3
vfs.zfs.version.zpl 5
vfs.zfs.version.spa 5000
vfs.zfs.version.acl 1
vfs.zfs.debug 0
vfs.zfs.super_owner 0
vfs.zfs.trim.enabled 1
vfs.zfs.trim.max_interval 1
vfs.zfs.trim.timeout 30
vfs.zfs.trim.txg_delay 32
------------------------------------------------------------------------
-Nick
On Tue, Oct 7, 2014 at 9:15 PM, Chad Leigh Shire.Net LLC <chad at shire.net>
wrote:
> On Oct 7, 2014, at 7:48 PM, Nick Sivo <nick at ycombinator.com> wrote:
>> Hello,
>>
>>
>> I've been having trouble with ZFS on my server. For the most part it works splendidly, but occasionally I'll experience permanent hangs.
>>
>>
>> For example, right now on one of my ZFS filesystems (the others are fine), I can read, write, and stat files, but if I run ls in any directory, ls and the terminal will hang. CTRL-C, and kill -9 can't kill it:
>>
> How much free space do you have, percentage-wise? I found out (and have had others confirm) that when you get below a certain amount of free space you can get these symptoms; the exact percentage may vary per system and with how the ZFS config is set up (kernel parameters).
> Also, depending on what you are doing, various parameters may need tweaking. Look in the archives for similar posts (including mine).
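For anyone following along, a quick way to check each pool against that free-space threshold. The 80% cutoff below is only a commonly cited rule of thumb, not a hard number from this thread:

```shell
# Show capacity per pool.
zpool list -o name,size,alloc,free,cap

# Flag any pool above 80% used (placeholder threshold; tune to taste).
# -H gives script-friendly, tab-separated output with no header.
zpool list -H -o name,cap | \
  awk '{ c = $2; sub("%", "", c); if (c + 0 > 80) print $1 " is " $2 " full" }'
```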
More information about the freebsd-questions mailing list