ZFS Write Lockup

Dave Cundiff syshackmin at gmail.com
Tue Oct 4 21:45:38 UTC 2011


Hi,

Decided to cross-post this over here as well, since it seems like it
could be something actually wrong and not just me being an idiot. Feel
free to let me know if I'm an idiot. :)

gstat is showing almost no IO hitting da1, which is my zpool (a 21-disk
RAID50, 3x7), yet the zvols are all backed up:

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    272      7     71   11.6    264   4439    0.2    4.8| da1
    0      4      2     26    6.7      2     23  557.0  126.7| zvol/san/a2s61
    0     14      0      0    0.0     14     86   91.5  132.5| zvol/san/a2s66
    1     14      0      1   13.8     14    154  100.6  140.2| zvol/san/solsemi1
    1     19      1      5    0.1     18    156   76.6  139.8| zvol/san/solman1
    1      6      1     26    8.4      5    112  275.3  140.9| zvol/san/a2s62
    1     16      1      5    9.1     16    317   88.1  139.7| zvol/san/solman2
    1     29      1      2    6.6     29    214   48.8  139.8| zvol/san/solsemi2
    1      7      1      2    8.5      6     50  232.5  140.4| zvol/san/solman4
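
If anyone wants to reproduce that view, gstat can be narrowed to just the
pool device and the zvols with its regex filter (assuming your gstat has
the -f option; the pattern below is only an example):

    gstat -I 1s -f 'da1$|zvol/san'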

I've tweaked only a few settings from default.

[root at san2 ~]# cat /boot/loader.conf
console="comconsole,vidconsole"
comconsole_speed="115200"
vm.kmem_size="30G"
vfs.zfs.arc_max="22G"
kern.hz=100
loader_logo="beastie"

[root at san2 ~]# cat /etc/sysctl.conf
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=131072
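
The running values can be sanity-checked with sysctl, e.g.:

    sysctl vm.kmem_size vfs.zfs.arc_max kern.hz
    sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max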

[root at san2 ~]# zpool status
  pool: san
 state: ONLINE
 scrub: none requested
config:

	NAME         STATE     READ WRITE CKSUM
	san          ONLINE       0     0     0
	  da1        ONLINE       0     0     0
	logs
	  mirror     ONLINE       0     0     0
	    ad6s1b   ONLINE       0     0     0
	    ad14s1b  ONLINE       0     0     0
	cache
	  ad6s1d     ONLINE       0     0     0
	  ad14s1d    ONLINE       0     0     0

errors: No known data errors


All my volumes are the same. I don't manually adjust any properties.

[root at san2 ~]# zfs get all san
NAME  PROPERTY              VALUE                  SOURCE
san   type                  filesystem             -
san   creation              Tue Feb  8  9:58 2011  -
san   used                  9.10T                  -
san   available             3.33T                  -
san   referenced            221M                   -
san   compressratio         1.00x                  -
san   mounted               yes                    -
san   quota                 none                   default
san   reservation           none                   default
san   recordsize            128K                   default
san   mountpoint            /san                   default
san   sharenfs              off                    default
san   checksum              off                    local
san   compression           off                    default
san   atime                 on                     default
san   devices               on                     default
san   exec                  on                     default
san   setuid                on                     default
san   readonly              off                    default
san   jailed                off                    default
san   snapdir               hidden                 default
san   aclmode               groupmask              default
san   aclinherit            restricted             default
san   canmount              on                     default
san   shareiscsi            off                    default
san   xattr                 off                    temporary
san   copies                1                      default
san   version               4                      -
san   utf8only              off                    -
san   normalization         none                   -
san   casesensitivity       sensitive              -
san   vscan                 off                    default
san   nbmand                off                    default
san   sharesmb              off                    default
san   refquota              none                   default
san   refreservation        none                   default
san   primarycache          all                    default
san   secondarycache        all                    default
san   usedbysnapshots       0                      -
san   usedbydataset         221M                   -
san   usedbychildren        9.10T                  -
san   usedbyrefreservation  0                      -
[root at san2 ~]# zfs get all san/a2s66
NAME       PROPERTY              VALUE                  SOURCE
san/a2s66  type                  volume                 -
san/a2s66  creation              Wed Sep 21 16:25 2011  -
san/a2s66  used                  770G                   -
san/a2s66  available             3.33T                  -
san/a2s66  referenced            753G                   -
san/a2s66  compressratio         1.00x                  -
san/a2s66  reservation           none                   default
san/a2s66  volsize               750G                   -
san/a2s66  volblocksize          4K                     -
san/a2s66  checksum              off                    inherited from san
san/a2s66  compression           off                    default
san/a2s66  readonly              off                    default
san/a2s66  shareiscsi            off                    default
san/a2s66  copies                1                      default
san/a2s66  refreservation        none                   default
san/a2s66  primarycache          all                    default
san/a2s66  secondarycache        all                    default
san/a2s66  usedbysnapshots       17.3G                  -
san/a2s66  usedbydataset         753G                   -
san/a2s66  usedbychildren        0                      -
san/a2s66  usedbyrefreservation  0                      -
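
Anything set by hand would show up with a "local" source, so a quick way
to audit the whole pool for manually-set properties is something like:

    zfs get -r -s local all san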

last pid: 60292;  load averages:  0.96,  0.67,  0.80
    up 9+17:31:59  17:41:52
63 processes:  2 running, 61 sleeping
CPU:  1.3% user,  0.0% nice, 46.4% system,  1.1% interrupt, 51.2% idle
Mem: 37M Active, 32M Inact, 22G Wired, 15M Cache, 1940M Buf, 1075M Free
Swap: 28G Total, 13M Used, 28G Free

[root at san2 ~]# sysctl -a | grep zfs
vfs.zfs.l2c_only_size: 81245392896
vfs.zfs.mfu_ghost_data_lsize: 51142656
vfs.zfs.mfu_ghost_metadata_lsize: 10687021568
vfs.zfs.mfu_ghost_size: 10738164224
vfs.zfs.mfu_data_lsize: 757547008
vfs.zfs.mfu_metadata_lsize: 954693120
vfs.zfs.mfu_size: 2612401664
vfs.zfs.mru_ghost_data_lsize: 1983434752
vfs.zfs.mru_ghost_metadata_lsize: 3657913344
vfs.zfs.mru_ghost_size: 5641348096
vfs.zfs.mru_data_lsize: 9817952768
vfs.zfs.mru_metadata_lsize: 395397632
vfs.zfs.mru_size: 10833757184
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 34037760
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 5905580032
vfs.zfs.arc_meta_used: 5906093432
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 2952790016
vfs.zfs.arc_max: 23622320128
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.write_limit_override: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 10
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.version.zpl: 4
vfs.zfs.version.spa: 15
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.zfetchstats.hits: 1622932834
kstat.zfs.misc.zfetchstats.misses: 300700562
kstat.zfs.misc.zfetchstats.colinear_hits: 144156
kstat.zfs.misc.zfetchstats.colinear_misses: 300556406
kstat.zfs.misc.zfetchstats.stride_hits: 1138458507
kstat.zfs.misc.zfetchstats.stride_misses: 386271
kstat.zfs.misc.zfetchstats.reclaim_successes: 5313527
kstat.zfs.misc.zfetchstats.reclaim_failures: 295242879
kstat.zfs.misc.zfetchstats.streams_resets: 141691
kstat.zfs.misc.zfetchstats.streams_noresets: 484474231
kstat.zfs.misc.zfetchstats.bogus_streams: 0
kstat.zfs.misc.arcstats.hits: 2877951340
kstat.zfs.misc.arcstats.misses: 677132553
kstat.zfs.misc.arcstats.demand_data_hits: 1090801028
kstat.zfs.misc.arcstats.demand_data_misses: 142078773
kstat.zfs.misc.arcstats.demand_metadata_hits: 760631826
kstat.zfs.misc.arcstats.demand_metadata_misses: 15429069
kstat.zfs.misc.arcstats.prefetch_data_hits: 77566631
kstat.zfs.misc.arcstats.prefetch_data_misses: 412415335
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 948951855
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 107209376
kstat.zfs.misc.arcstats.mru_hits: 762645834
kstat.zfs.misc.arcstats.mru_ghost_hits: 25063620
kstat.zfs.misc.arcstats.mfu_hits: 1792233827
kstat.zfs.misc.arcstats.mfu_ghost_hits: 107949685
kstat.zfs.misc.arcstats.allocated: 1220368604
kstat.zfs.misc.arcstats.deleted: 881708637
kstat.zfs.misc.arcstats.stolen: 324197286
kstat.zfs.misc.arcstats.recycle_miss: 393866103
kstat.zfs.misc.arcstats.mutex_miss: 47835019
kstat.zfs.misc.arcstats.evict_skip: 16800403516
kstat.zfs.misc.arcstats.evict_l2_cached: 3404346428416
kstat.zfs.misc.arcstats.evict_l2_eligible: 906780261888
kstat.zfs.misc.arcstats.evict_l2_ineligible: 1712274098176
kstat.zfs.misc.arcstats.hash_elements: 12932367
kstat.zfs.misc.arcstats.hash_elements_max: 26675689
kstat.zfs.misc.arcstats.hash_collisions: 1195027725
kstat.zfs.misc.arcstats.hash_chains: 524288
kstat.zfs.misc.arcstats.hash_chain_max: 114
kstat.zfs.misc.arcstats.p: 13900189206
kstat.zfs.misc.arcstats.c: 16514222070
kstat.zfs.misc.arcstats.c_min: 2952790016
kstat.zfs.misc.arcstats.c_max: 23622320128
kstat.zfs.misc.arcstats.size: 16514197960
kstat.zfs.misc.arcstats.hdr_size: 698646840
kstat.zfs.misc.arcstats.data_size: 13480204800
kstat.zfs.misc.arcstats.other_size: 222586112
kstat.zfs.misc.arcstats.l2_hits: 236859220
kstat.zfs.misc.arcstats.l2_misses: 440273314
kstat.zfs.misc.arcstats.l2_feeds: 998879
kstat.zfs.misc.arcstats.l2_rw_clash: 41492
kstat.zfs.misc.arcstats.l2_read_bytes: 1523423294976
kstat.zfs.misc.arcstats.l2_write_bytes: 2108729975808
kstat.zfs.misc.arcstats.l2_writes_sent: 908755
kstat.zfs.misc.arcstats.l2_writes_done: 908755
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 125029
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 78155
kstat.zfs.misc.arcstats.l2_evict_reading: 52
kstat.zfs.misc.arcstats.l2_free_on_write: 735076
kstat.zfs.misc.arcstats.l2_abort_lowmem: 2368
kstat.zfs.misc.arcstats.l2_cksum_bad: 9
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 88680833024
kstat.zfs.misc.arcstats.l2_hdr_size: 2275280224
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 160181805
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 48073379
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 101326826532
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 3016312
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 16631379447
kstat.zfs.misc.arcstats.l2_write_full: 158541
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 998879
kstat.zfs.misc.arcstats.l2_write_pios: 908755
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 881025143301120
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 62580701
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 2040782
kstat.zfs.misc.vdev_cache_stats.delegations: 2167916
kstat.zfs.misc.vdev_cache_stats.hits: 2801310
kstat.zfs.misc.vdev_cache_stats.misses: 5448597

On Tue, Oct 4, 2011 at 2:43 AM, Dave Cundiff <syshackmin at gmail.com> wrote:
> Hi,
>
> I'm running 8.2-RELEASE and running into an IO lockup on ZFS that is
> happening pretty regularly. The system is stock except for the
> following set in loader.conf
>
> vm.kmem_size="30G"
> vfs.zfs.arc_max="22G"
> kern.hz=100
>
> I know the kmem settings aren't SUPPOSED to be necessary now, but my
> ZFS boxes were crashing until I added them. The machine has 24 gigs of
> RAM. The kern.hz=100 was to stretch out the l2arc bug that pops up at
> 28 days with it set to 1000.
>
> [root at san2 ~]# zpool status
>  pool: san
>  state: ONLINE
>  scrub: none requested
> config:
>
>        NAME         STATE     READ WRITE CKSUM
>        san          ONLINE       0     0     0
>          da1        ONLINE       0     0     0
>        logs
>          mirror     ONLINE       0     0     0
>            ad6s1b   ONLINE       0     0     0
>            ad14s1b  ONLINE       0     0     0
>        cache
>          ad6s1d     ONLINE       0     0     0
>          ad14s1d    ONLINE       0     0     0
>
> errors: No known data errors
>
>
> Here's a zpool iostat from a machine in trouble.
>
> san         9.08T  3.55T      0      0      0  7.92K
> san         9.08T  3.55T      0    447      0  5.77M
> san         9.08T  3.55T      0    309      0  2.83M
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T     62      0  2.22M      0
> san         9.08T  3.55T      0      2      0  23.5K
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T      0    254      0  6.62M
> san         9.08T  3.55T      0    249      0  3.16M
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T     34      0   491K      0
> san         9.08T  3.55T      0      6      0  62.7K
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T      0     85      0  6.59M
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T      0    452      0  4.88M
> san         9.08T  3.55T    109      0  3.12M      0
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T      0      0      0  7.84K
> san         9.08T  3.55T      0    434      0  6.41M
> san         9.08T  3.55T      0      0      0      0
> san         9.08T  3.55T      0    304      0  2.90M
> san         9.08T  3.55T     37      0   628K      0
>
> It's supposed to look like:
>
> san         9.07T  3.56T    162    167  3.75M  6.09M
> san         9.07T  3.56T      5      0  47.4K      0
> san         9.07T  3.56T     19      0   213K      0
> san         9.07T  3.56T    120      0  3.26M      0
> san         9.07T  3.56T     92      0   741K      0
> san         9.07T  3.56T    114      0  2.86M      0
> san         9.07T  3.56T     72      0   579K      0
> san         9.07T  3.56T     14      0   118K      0
> san         9.07T  3.56T     24      0   213K      0
> san         9.07T  3.56T     25      0   324K      0
> san         9.07T  3.56T      8      0   126K      0
> san         9.07T  3.56T     28      0   505K      0
> san         9.07T  3.56T     15      0   126K      0
> san         9.07T  3.56T     11      0   158K      0
> san         9.07T  3.56T     19      0   356K      0
> san         9.07T  3.56T    198      0  3.55M      0
> san         9.07T  3.56T     21      0   173K      0
> san         9.07T  3.56T     18      0   150K      0
> san         9.07T  3.56T     23      0   260K      0
> san         9.07T  3.56T      9      0  78.3K      0
> san         9.07T  3.56T     21      0   173K      0
> san         9.07T  3.56T      2  4.59K  16.8K   142M
> san         9.07T  3.56T     12      0   103K      0
> san         9.07T  3.56T     26    454   312K  4.35M
> san         9.07T  3.56T    111      0  3.34M      0
> san         9.07T  3.56T     28      0   870K      0
> san         9.07T  3.56T     75      0  3.88M      0
> san         9.07T  3.56T     43      0  1.22M      0
> san         9.07T  3.56T     26      0   270K      0
>
> I don't know what triggers the problem, but I know how to fix it. If I
> perform a couple of snapshot deletes, the IO comes back in line every
> single time. Fortunately I have LOTS of snapshots to delete.
>
> [root at san2 ~]# zfs list -r -t snapshot | wc -l
>    5236
> [root at san2 ~]# zfs list -r -t volume | wc -l
>      17
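
(To be concrete about the workaround mentioned above: it is literally just
a couple of zfs destroys on existing snapshots, along the lines of the
sketch below. The snapshot name is made up, and the listing is just one
way to pick a candidate.)

    zfs list -r -t snapshot -o name -s creation san | tail -n 5
    zfs destroy san/a2s66@example-snap
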
>
> Being fairly new to FreeBSD and ZFS, I'm pretty clueless about where to
> begin tracking this down. I've been staring at gstat trying to see if
> a zvol is getting a big burst of writes that may be flooding the drive
> controller, but I haven't caught anything yet. top -S -H shows the
> zio_write_issue threads consuming massive amounts of CPU during the
> lockup; normally they sit around 5-10%. Any suggestions on where I
> could start tracking this down would be greatly appreciated.
>


-- 
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com

