kern/169480: [zfs] ZFS stalls on heavy I/O
Harry Coin
hgcoin at gmail.com
Sat Jan 26 03:10:01 UTC 2013
The following reply was made to PR kern/169480; it has been noted by GNATS.
From: Harry Coin <hgcoin at gmail.com>
To: bug-followup at FreeBSD.org, levent.serinol at mynet.com,
Jeremy Chadwick <jdc at koitsu.org>
Cc:
Subject: Re: kern/169480: [zfs] ZFS stalls on heavy I/O
Date: Fri, 25 Jan 2013 21:04:05 -0600
Jeremy, Thanks.
re 1: Not only did I remove the ZIL, I also removed the cache device before
running this test, agreeing with you that the whole L2ARC arrangement was
adding too much complexity. The pool was nothing more than the single
raidz when I ran the test that caused the failure noted.
re 2: The raw benchmarks you saw for ada2 were duplicated identically
for ada3, ada4 and ada5. They are all exactly the same drive make and
model, purchased at the same time.
re 3: These are 512-byte/sector drives. smartctl -a (v6.0) reports 512
bytes/sector, both logical and physical.
re 4: Each drive is partitioned so that it can hold boot information
should I someday decide to boot off the array, so that sector 0 of the
storage partition starts on the afore-noticed 4K boundary, and so that
there are a couple of gigabytes at the end for swap should I decide I
want that. Partition 1 is just big enough for the boot code, partition 2
is the bulk of the storage, and partition 3 is the last 2 GB.
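For reference, the layout described comes down to something like the
following gpart sequence (sizes and the device name are illustrative,
not the exact commands used on this system):

```shell
# Hypothetical sketch of the per-drive layout: boot code first, a
# 4K-aligned bulk-storage partition, and ~2 GB left at the end for swap.
gpart create -s gpt ada2
gpart add -t freebsd-boot -a 4k -s 512k ada2    # p1: just the boot code
gpart add -t freebsd-zfs  -a 4k -s 296g ada2    # p2: bulk of the storage
gpart add -t freebsd-swap -a 4k ada2            # p3: last ~2 GB, for swap
```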
So: no ZIL or cache was used in the test, everything is formatted ZFS
v28, and a scrub two days ago reported zero errors on the array (though
the scrub speed was only about 14 MB/s).
It's something about maximum-speed writes being interrupted by another
read that kills zfs/nas4free somehow.
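The trigger boils down to two concurrent commands against the same pool;
a sketch of the reproduction (the target directory is just an example):

```shell
# Saturate the pool with a maximum-speed stream of small writes...
cd /mnt/pool1/videos
dd if=/dev/urandom of=foo bs=512 count=20000000 &

# ...then, while the dd is running, issue a read/metadata lookup in the
# same dataset.  On the affected system this ls never returns.
ls -l /mnt/pool1/videos
```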
re 5: The ATA drives are indeed on the ICH7, as the 300gb isn't an
issue. The SSDs are on the higher-speed controller and, as noted before,
are unused.
re 6: You'll note that 'ls -l' never returned; only the 'top' program,
launched before the dd was started, kept running. Here are the
procstat -kk outputs for the dd and the ls during their run (the ls
never returning):
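The snapshots below were taken by rerunning procstat by hand; a small
loop like this (hypothetical, not what I actually ran) would capture the
same thing automatically, once a second, until the watched process exits:

```shell
#!/bin/sh
# Usage: watchstack <pid>
# Sample the kernel stack of a process repeatedly, so the sleep point
# (e.g. txg_delay vs. txg_wait_open) can be seen changing over time.
pid=$1
while ps -p "$pid" >/dev/null 2>&1; do
    procstat -kk "$pid"
    sleep 1
done
```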
nas4free:~# ps axu | grep dd
root 4006 47.1 0.1 9916 1752 v0 R+ 2:56AM 0:07.92 dd
if=/dev/urandom of=foo bs=512 count=20000000
root 4009 0.0 0.1 16280 2092 0 S+ 2:56AM 0:00.00 grep dd
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137
dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154
dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2
vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137
dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154
dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2
vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137
dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154
dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2
vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_delay+0x137
dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154
dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2
vn_write+0x38c dofilewrite+0x8b kern_writev+0x6c sys_write+0x64
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# ps axu | grep ls
root 4016 0.0 0.1 14420 2196 v2 D+ 2:57AM 0:00.00 ls -l
root 4018 0.0 0.1 16280 2084 0 RL+ 2:57AM 0:00.00 grep ls
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4006
PID TID COMM TDNAME KSTACK
4006 100114 dd - mi_switch+0x186
sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85
zfs_freebsd_write+0x47b VOP_WRITE_APV+0xb2 vn_write+0x38c
dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546
Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~# procstat -kk 4016
PID TID COMM TDNAME KSTACK
4016 100085 ls - mi_switch+0x186
sleepq_wait+0x42 __lockmgr_args+0xe40 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup_times+0x55a
vfs_cache_lookup+0xc8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9
kern_statat_vnhook+0xb3 kern_statat+0x15 sys_lstat+0x2a
amd64_syscall+0x546 Xfast_syscall+0xf7
nas4free:~#
re 7: The 'submit a problem' form doesn't list 'fs' as a category, just
kern. If there's a better place for fs bugs, I trust the FreeBSD folks
will know where to route this. Anyhow, I do think it's about write
completions getting lost. And nas4free is a pretty popular project with
the major virtue that anyone, anywhere looking to work on this problem
can download a livecd ISO and reproduce it. This is not my first rodeo:
there are several bug reports with various black-magic tunable tweaks,
and I think those tweaks generally avoid the bug only because the
reporters' usage patterns stop hitting it, not because they fix it.
This particular FreeBSD ISO download has the virtue of making the
failure completely reproducible.
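For context, these are the kinds of write-throttle knobs those other
reports fiddle with. The names below are from the FreeBSD 9.x-era ZFS
code and may differ on other versions; reading them is harmless:

```shell
# Inspect (not change) the ZFS write-throttle state that the "black
# magic" tweaks usually target:
sysctl vfs.zfs.txg.timeout            # seconds between txg syncs
sysctl vfs.zfs.write_limit_override   # hard cap on dirty data per txg
sysctl kstat.zfs.misc.arcstats.size   # current ARC size in bytes
```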
Does any of the above help? Also, here's an updated zpool status:
nas4free:~# zpool status pool1
  pool: pool1
 state: ONLINE
  scan: scrub canceled on Fri Jan 25 22:07:51 2013
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
            ada5p2  ONLINE       0     0     0

errors: No known data errors
nas4free:~#
nas4free:~# zpool iostat -v pool1
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       3.01T  2.43T    168      0   360K  3.42K
  raidz1    3.01T  2.43T    168      0   360K  3.42K
    ada2p2      -      -    124      0   104K  2.34K
    ada3p2      -      -    123      0   104K  2.28K
    ada4p2      -      -    124      0   103K  2.34K
    ada5p2      -      -    126      0  98.1K  2.29K
----------  -----  -----  -----  -----  -----  -----
nas4free:~#
nas4free:~# zpool get all pool1
NAME PROPERTY VALUE SOURCE
pool1 size 5.44T -
pool1 capacity 55% -
pool1 altroot - default
pool1 health ONLINE -
pool1 guid 1701438519865110975 default
pool1 version 28 default
pool1 bootfs - default
pool1 delegation on default
pool1 autoreplace off default
pool1 cachefile - default
pool1 failmode wait default
pool1 listsnapshots off default
pool1 autoexpand off default
pool1 dedupditto 0 default
pool1 dedupratio 1.46x -
pool1 free 2.43T -
pool1 allocated 3.01T -
pool1 readonly off -
pool1 comment - default
pool1 expandsize 0 -
nas4free:~#
nas4free:~# zfs get all pool1/videos
NAME PROPERTY VALUE SOURCE
pool1/videos type filesystem -
pool1/videos creation Sat Jan 19 5:12 2013 -
pool1/videos used 526G -
pool1/videos available 1.74T -
pool1/videos referenced 526G -
pool1/videos compressratio 1.00x -
pool1/videos mounted yes -
pool1/videos quota none local
pool1/videos reservation none local
pool1/videos recordsize 128K default
pool1/videos mountpoint /mnt/pool1/videos inherited from pool1
pool1/videos sharenfs off default
pool1/videos checksum on default
pool1/videos compression off local
pool1/videos atime off local
pool1/videos devices on default
pool1/videos exec on default
pool1/videos setuid on default
pool1/videos readonly off local
pool1/videos jailed off default
pool1/videos snapdir hidden local
pool1/videos aclmode discard default
pool1/videos aclinherit restricted default
pool1/videos canmount on local
pool1/videos xattr off temporary
pool1/videos copies 1 default
pool1/videos version 5 -
pool1/videos utf8only off -
pool1/videos normalization none -
pool1/videos casesensitivity sensitive -
pool1/videos vscan off default
pool1/videos nbmand off default
pool1/videos sharesmb off default
pool1/videos refquota none default
pool1/videos refreservation none default
pool1/videos primarycache all default
pool1/videos secondarycache all default
pool1/videos usedbysnapshots 0 -
pool1/videos usedbydataset 526G -
pool1/videos usedbychildren 0 -
pool1/videos usedbyrefreservation 0 -
pool1/videos logbias latency default
pool1/videos dedup off local
pool1/videos mlslabel -
pool1/videos sync standard local
pool1/videos refcompressratio 1.00x -
pool1/videos written 526G -
nas4free:~#