Poor ZFS+NFSv3 read/write performance and panic
David Adam
zanchey at ucc.gu.uwa.edu.au
Fri Jan 29 14:35:17 UTC 2016
Hi all,
We have a FreeBSD 10.2 server sharing some ZFS datasets over NFSv3. It's
worked well until recently, but has started to routinely perform
exceptionally poorly, eventually panicking in vdev_deadman() (which I
understand is a feature).
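For what it's worth, I haven't touched any of the deadman tunables, so I
assume they are still at the defaults - if I have the sysctl names right,
those are:

  sysctl vfs.zfs.deadman_enabled
  sysctl vfs.zfs.deadman_synctime_ms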
Initially after booting, things are fine, but performance rapidly begins
to degrade. Both reads and writes become terribly slow, with many
operations either hanging indefinitely or timing out.
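As a crude illustration, even a single sequential write from a client
stalls once the machine is in this state (the mount point and file name
here are just examples):

  # on an NFS client; /mnt/space is a stand-in for one of the mounts
  dd if=/dev/zero of=/mnt/space/ddtest bs=1m count=1024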
When this happens, I can break into DDB and see many nfsd threads stuck
waiting on a lock:
Process 784 (nfsd) thread 0xfffff80234795000 (100455)
shared lockmgr zfs (zfs) r = 0 (0xfffff8000b91f548) locked @
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2196
and the backtrace looks like this:
sched_switch() at sched_switch+0x495/frame 0xfffffe04677740b0
mi_switch() at mi_switch+0x179/frame 0xfffffe04677740f0
turnstile_wait() at turnstile_wait+0x3b2/frame 0xfffffe0467774140
__mtx_lock_sleep() at __mtx_lock_sleep+0x2c0/frame 0xfffffe04677741c0
__mtx_lock_flags() at __mtx_lock_flags+0x102/frame 0xfffffe0467774210
vmem_size() at vmem_size+0x5a/frame 0xfffffe0467774240
arc_reclaim_needed() at arc_reclaim_needed+0xd2/frame 0xfffffe0467774260
arc_get_data_buf() at arc_get_data_buf+0x157/frame 0xfffffe04677742a0
arc_read() at arc_read+0x68b/frame 0xfffffe0467774350
dbuf_read() at dbuf_read+0x7ed/frame 0xfffffe04677743f0
dmu_tx_check_ioerr() at dmu_tx_check_ioerr+0x8b/frame 0xfffffe0467774420
dmu_tx_count_write() at dmu_tx_count_write+0x17e/frame 0xfffffe0467774540
dmu_tx_hold_write() at dmu_tx_hold_write+0xba/frame 0xfffffe0467774580
zfs_freebsd_write() at zfs_freebsd_write+0x55d/frame 0xfffffe04677747b0
VOP_WRITE_APV() at VOP_WRITE_APV+0x193/frame 0xfffffe04677748c0
nfsvno_write() at nfsvno_write+0x13e/frame 0xfffffe0467774970
nfsrvd_write() at nfsrvd_write+0x496/frame 0xfffffe0467774c80
nfsrvd_dorpc() at nfsrvd_dorpc+0x66b/frame 0xfffffe0467774e40
nfssvc_program() at nfssvc_program+0x4e6/frame 0xfffffe0467774ff0
svc_run_internal() at svc_run_internal+0xbb7/frame 0xfffffe0467775180
svc_run() at svc_run+0x1db/frame 0xfffffe04677751f0
nfsrvd_nfsd() at nfsrvd_nfsd+0x1f0/frame 0xfffffe0467775350
nfssvc_nfsd() at nfssvc_nfsd+0x124/frame 0xfffffe0467775970
sys_nfssvc() at sys_nfssvc+0xb7/frame 0xfffffe04677759a0
amd64_syscall() at amd64_syscall+0x278/frame 0xfffffe0467775ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0467775ab0
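All of the stuck threads seem to bottom out in arc_reclaim_needed() /
vmem_size(), so I wonder whether this is contention around ARC reclaim
rather than the disks themselves. I haven't tuned the ARC at all; the
counters I have been watching (sysctl names from memory, so apologies if
they are slightly off) are:

  sysctl vfs.zfs.arc_max vfs.zfs.arc_min
  sysctl kstat.zfs.misc.arcstats.size
  sysctl kstat.zfs.misc.arcstats.memory_throttle_count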
Is this likely to be due to bad hardware? I can't see any problems in
the SMART data, and `camcontrol tags da0 -v` and friends do not reveal
any particularly long queues. Are there other useful things to check?
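For completeness, this is roughly what I have been running to check the
disks (smartctl is from the sysutils/smartmontools port):

  # per-disk I/O latency and queue depth, refreshed every second
  gstat -I 1s
  iostat -x -w 1

  # SMART health for each member disk (da0 shown; same for the rest)
  smartctl -a /dev/da0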
If it doesn't look like hardware, do you have any other ideas? I can make
the full DDB information available if that would be helpful.
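(The lock and backtrace output above was captured with the usual DDB
commands, roughly:

  db> show alllocks
  db> show lockedvnods
  db> trace 100455

where 100455 is the thread ID from the lock listing.)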
The pool is configured thus:
NAME                  STATE     READ WRITE CKSUM
space                 ONLINE       0     0     0
  mirror-0            ONLINE       0     0     0
    da0               ONLINE       0     0     0
    da1               ONLINE       0     0     0
  mirror-1            ONLINE       0     0     0
    da2               ONLINE       0     0     0
    da3               ONLINE       0     0     0
  mirror-2            ONLINE       0     0     0
    da4               ONLINE       0     0     0
    da6               ONLINE       0     0     0
  mirror-3            ONLINE       0     0     0
    da7               ONLINE       0     0     0
    da8               ONLINE       0     0     0
logs
  mirror-4            ONLINE       0     0     0
    gpt/molmol-slog   ONLINE       0     0     0
    gpt/molmol-slog0  ONLINE       0     0     0
where the da? devices are WD Reds and the SLOG partitions are on Samsung
840s.
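If per-vdev numbers (including for the SLOG mirror) would be useful, I can
capture them with:

  zpool iostat -v space 5

and post the output.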
Many thanks,
David Adam
zanchey at ucc.gu.uwa.edu.au