kern/169480: [zfs] ZFS stalls on heavy I/O

Harry Coin hgcoin at gmail.com
Sat Jan 26 03:10:01 UTC 2013


The following reply was made to PR kern/169480; it has been noted by GNATS.

From: Harry Coin <hgcoin at gmail.com>
To: bug-followup at FreeBSD.org, levent.serinol at mynet.com, 
 Jeremy Chadwick <jdc at koitsu.org>
Cc:  
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 21:04:05 -0600

 Jeremy, Thanks.
 
 re: 1.  Not only have I removed the zil but I removed the cache before 
 running this test as well, thinking as you did the whole l2arc thing was 
 adding too much complexity.   The pool was nothing more than the single 
 raidz when doing the test to cause the failure noted.
 
 re 2.  The raw benchmarks you saw for ada2 were identically duplicated 
 for 3, 4 and 5.  They are all exactly the same drive make and model 
 purchased at the same time.
 
 re 3.  These are 512 byte/sector drives.  v6.0 smartctl -a reports 512 
 bytes/sector logical / physical.
 
 re 4.  The drives are partitioned so that each of them can hold boot 
 information should I decide someday to boot off the array, to make sure 
 sector 0 of the partition starts on the afore-noticed 4K boundary,  and 
 there is a couple of gig at the end for swap if I decide I want that.  
 Partition 1 is just big enough for the boot code, 2 is the bulk of the 
 storage, and 3 is the last 2 gig.
 
 So, no zil or cache used in the test, all formatted zfs v28, scrub 
 reported zero errors on the array two days ago (though the scrub speed 
 was 14M/s or so).
 
 It's something about max speed writes that get interrupted by another 
 read that kills zfs/nas4free somehow.
 
 re 5:  The ata drives indeed are on the ICH7, as the 300gb isn't an 
 issue.  The ssd's are on the higher speed controller, and are unused as 
 noted before.
 
 re 6:  You'll note that 'ls -l' didn't return, only the 'top' program 
 run before the dd was started was launched.  Here are the procstat -kk's 
 for the dd and ls during their run (the ls not returning)
 nas4free:~# ps axu | grep dd
 root  4006  47.1  0.1  9916 1752 v0  R+    2:56AM   0:07.92 dd 
 if=/dev/urandom of=foo bs=512 count=20000000
 root  4009   0.0  0.1 16280 2092  0  S+    2:56AM   0:00.00 grep dd
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 
 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 
 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 
 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 
 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 
 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 
 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 
 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 
 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 
 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137 
 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 
 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 
 vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# ps axu | grep ls
 root  4016   0.0  0.1 14420 2196 v2  D+    2:57AM   0:00.00 ls -l
 root  4018   0.0  0.1 16280 2084  0  RL+   2:57AM   0:00.00 grep ls
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4006
    PID    TID COMM             TDNAME           KSTACK
   4006 100114 dd               -                mi_switch+0x186 
 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 
 zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c 
 dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 
 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~# procstat -kk 4016
    PID    TID COMM             TDNAME           KSTACK
   4016 100085 ls               -                mi_switch+0x186 
 sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39 
 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a 
 vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 
 kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a 
 amd64_syscall+0x546 Xfast_syscall+0xf7
 nas4free:~#
 
 
 re 7:  The 'submit a problem' form doesn't list 'fs' as an option, just 
 kern.  If there's another place for fs bugs I trust the freebsd folks 
 will know what to do with it.  Anyhow I do think it's about write 
 completions getting lost.   And, 'nas4free' is a pretty popular project 
 having the major virtue that anyone, anywhere looking to solve this 
 problem can download a livecd iso and reproduce it.  This is not my 
 first rodeo, there are several bug reports with various black magic 
 tunable tweaks that generally I think accomplish avoiding the bug only 
 by their usage patterns and the tweaks not hitting it.
 
 This particular iso freebsd download has the virtue of being completely 
 reproducible.
 
 Does any of the above help?   Also here's an update zpool :
 
 nas4free:~# zpool status pool1
    pool: pool1
   state: ONLINE
    scan: scrub canceled on Fri Jan 25 22:07:51 2013
 config:
 
      NAME        STATE     READ WRITE CKSUM
      pool1       ONLINE       0     0     0
        raidz1-0  ONLINE       0     0     0
          ada2p2  ONLINE       0     0     0
          ada3p2  ONLINE       0     0     0
          ada4p2  ONLINE       0     0     0
          ada5p2  ONLINE       0     0     0
 
 errors: No known data errors
 nas4free:~#
 
 as4free:~# zpool iostat -v pool1
                 capacity     operations    bandwidth
 pool        alloc   free   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 pool1       3.01T  2.43T    168      0   360K  3.42K
    raidz1    3.01T  2.43T    168      0   360K  3.42K
      ada2p2      -      -    124      0   104K  2.34K
      ada3p2      -      -    123      0   104K  2.28K
      ada4p2      -      -    124      0   103K  2.34K
      ada5p2      -      -    126      0  98.1K  2.29K
 ----------  -----  -----  -----  -----  -----  -----
 
 nas4free:~#
 
 nas4free:~# zpool get all pool1
 NAME   PROPERTY       VALUE       SOURCE
 pool1  size           5.44T       -
 pool1  capacity       55%         -
 pool1  altroot        -           default
 pool1  health         ONLINE      -
 pool1  guid           1701438519865110975  default
 pool1  version        28          default
 pool1  bootfs         -           default
 pool1  delegation     on          default
 pool1  autoreplace    off         default
 pool1  cachefile      -           default
 pool1  failmode       wait        default
 pool1  listsnapshots  off         default
 pool1  autoexpand     off         default
 pool1  dedupditto     0           default
 pool1  dedupratio     1.46x       -
 pool1  free           2.43T       -
 pool1  allocated      3.01T       -
 pool1  readonly       off         -
 pool1  comment        -           default
 pool1  expandsize     0           -
 nas4free:~#
 nas4free:~# zfs get all pool1/videos
 NAME          PROPERTY              VALUE                  SOURCE
 pool1/videos  type                  filesystem             -
 pool1/videos  creation              Sat Jan 19  5:12 2013  -
 pool1/videos  used                  526G                   -
 pool1/videos  available             1.74T                  -
 pool1/videos  referenced            526G                   -
 pool1/videos  compressratio         1.00x                  -
 pool1/videos  mounted               yes                    -
 pool1/videos  quota                 none                   local
 pool1/videos  reservation           none                   local
 pool1/videos  recordsize            128K                   default
 pool1/videos  mountpoint            /mnt/pool1/videos      inherited 
 from pool1
 pool1/videos  sharenfs              off                    default
 pool1/videos  checksum              on                     default
 pool1/videos  compression           off                    local
 pool1/videos  atime                 off                    local
 pool1/videos  devices               on                     default
 pool1/videos  exec                  on                     default
 pool1/videos  setuid                on                     default
 pool1/videos  readonly              off                    local
 pool1/videos  jailed                off                    default
 pool1/videos  snapdir               hidden                 local
 pool1/videos  aclmode               discard                default
 pool1/videos  aclinherit            restricted             default
 pool1/videos  canmount              on                     local
 pool1/videos  xattr                 off                    temporary
 pool1/videos  copies                1                      default
 pool1/videos  version               5                      -
 pool1/videos  utf8only              off                    -
 pool1/videos  normalization         none                   -
 pool1/videos  casesensitivity       sensitive              -
 pool1/videos  vscan                 off                    default
 pool1/videos  nbmand                off                    default
 pool1/videos  sharesmb              off                    default
 pool1/videos  refquota              none                   default
 pool1/videos  refreservation        none                   default
 pool1/videos  primarycache          all                    default
 pool1/videos  secondarycache        all                    default
 pool1/videos  usedbysnapshots       0                      -
 pool1/videos  usedbydataset         526G                   -
 pool1/videos  usedbychildren        0                      -
 pool1/videos  usedbyrefreservation  0                      -
 pool1/videos  logbias               latency                default
 pool1/videos  dedup                 off                    local
 pool1/videos  mlslabel                                     -
 pool1/videos  sync                  standard               local
 pool1/videos  refcompressratio      1.00x                  -
 pool1/videos  written               526G                   -
 nas4free:~#
 
 
 
 
 
 
 
 
 
 
 
 


More information about the freebsd-fs mailing list